Audio Analytics’ article “Beyond Voice Recognition: Giving Products a Sense of Hearing,” which described a novel method for performing sound recognition in embedded applications, drew a lot of attention when we posted it last week. It also drew several comments from readers, including one from George Kafantaris. His alternative perspective on AI was informative enough to deserve more visibility, so we’ve popped it out as a short posting of its own:
Voice recognition technology can revolutionize our lives in areas that have nothing to do with human speech. We can now have sound systems that recognize footsteps from afar and alert us to approaching intruders, thus eliminating the need for tall fences, maybe even that notorious Border Wall. Systems that can alert us to noises from malfunctioning machinery in factories, ships, trains or airplanes. Systems that monitor tell-tale noises from the electric grid, or enemy troop movements on the battlefield. Systems that can complement building maintenance and security, or listen for specific noises, even from the weather, no matter how overwhelming the surrounding sounds might be.
But how can a voice recognition system do this? The man to ask is Pete Warden, who is developing cheap gadgets that he wants to put everywhere, “listening for noises rather than voices: hundreds of sensors spotting tell-tale audio signatures of squeaking wheels in factory equipment, or chirping crickets in a farm field.”
Here is how he does it. He “takes an audio clip, slices it into short snippets, and then calculates the frequency content of each one. He lines up each of the frequency plots one after the other to create a 2-D image of frequency content versus time, and applies visual-recognition algorithms to identify the distinctive signature of someone saying a single word [or the distinctive signature of a single sound].”
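The recipe in that quote is essentially a spectrogram followed by an image classifier. The Python sketch below is a minimal, dependency-free illustration under stated assumptions, not Warden's implementation: it assumes mono audio as a plain list of samples at 8 kHz, and the frame length (256) and hop (128) are illustrative choices. It uses a naive DFT where real code would use an FFT. The `classify` and `tone` helpers are hypothetical; `classify` is a deliberately simple nearest-template matcher standing in for the visual-recognition algorithms the article mentions.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Slice a list of audio samples into short, overlapping snippets."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]


def magnitude_spectrum(frame):
    """Frequency content of one snippet: |DFT| for bins 0..N/2.

    A naive O(N^2) DFT keeps the sketch dependency-free; real code would
    use an FFT. A Hann window tapers the snippet edges to limit leakage.
    """
    n = len(frame)
    windowed = [x * 0.5 * (1 - math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)))
            for k in range(n // 2 + 1)]


def spectrogram(samples, frame_len=256, hop=128):
    """Line up per-snippet spectra into a 2-D 'image':
    one row per time step, one column per frequency bin."""
    return [magnitude_spectrum(f) for f in frame_signal(samples, frame_len, hop)]


def classify(spec, templates):
    """Hypothetical stand-in for the visual-recognition step: return the
    label of the stored template spectrogram with the smallest Euclidean
    distance. Assumes every spectrogram has the same shape."""
    def distance(a, b):
        return math.sqrt(sum((x - y) ** 2
                             for row_a, row_b in zip(a, b)
                             for x, y in zip(row_a, row_b)))
    return min(templates, key=lambda label: distance(spec, templates[label]))


def tone(freq_hz, amplitude=1.0, rate_hz=8000, n_samples=1024):
    """Hypothetical synthetic test signal: a pure sine wave."""
    return [amplitude * math.sin(2 * math.pi * freq_hz * t / rate_hz)
            for t in range(n_samples)]


if __name__ == "__main__":
    # Two reference "sound signatures" and a quieter recording of one of them.
    templates = {"1 kHz hum": spectrogram(tone(1000)),
                 "2 kHz whine": spectrogram(tone(2000))}
    query = spectrogram(tone(1000, amplitude=0.5))
    print(len(query), "frames x", len(query[0]), "bins")
    print(classify(query, templates))
```

With 256-sample frames at 8 kHz, each frequency bin spans 31.25 Hz, so a 1 kHz tone peaks in bin 32. A production system would replace the template matcher with a trained model (for example a small convolutional network) and typically use log-scaled or mel-scaled spectra.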
Mr. Kafantaris cited a recent article in Technology Review that provides some useful details about this approach to machine hearing: “Disposable Voice Recognition Take Cheap Chips and Add Simple AI.” It’s a fascinating read that describes an approach that could make basic speech and sound recognition almost as inexpensive as the speech synthesis engines currently used in so many consumer goods.
