Beyond Voice Recognition: Giving Products the Sense of Hearing

March 16, 2018 By Tom Grainge, Audio Analytic

Teaching Alexa to hear would enable it to recognize a baby crying, an intruder breaking in, or a life-threatening accident. Image: Audio Analytic.

The smart home category is buoyant. At the forefront in terms of public profile are voice-activated smart home assistants and smart speakers, with Apple, Google and their peers ramping up activity in this growing sector. Beyond experimenting with a range of form factors, price points and integrations, these manufacturers are striving to make their products more useful to consumers through artificial intelligence. While speech, music and visual recognition have taken giant strides in recent years, the missing link to intelligent response remains sound recognition.

Without the ability to recognise sounds in their environment, the current range of devices requires human interaction, via voice, smartphone or tablet, to perform tasks. To react to an event without human interaction, these devices need improved contextual awareness. Sounds give key contextual cues that are difficult to obtain otherwise: you could hear and easily recognise the sound of a window breaking in another room without seeing it. So, just like the human brain, if the smart home is to be truly intelligent it must be able to recognise and respond to audible events. Augmenting voice recognition with audio event detection is a natural route to delivering contextual intelligence to home assistant systems, and the technology can be applied to many other devices containing microphones.

How do you teach tech to hear?  

We have developed a treatment of sound that makes it describable and hence ready to be processed through machine learning systems.

Voice recognition is a mature technology, but for sound recognition no suitable dataset existed that could train a machine learning system to the sensitivity and accuracy we were aiming for, so we had to build one. That meant recording thousands of sounds ourselves in order to obtain quality, real-world data to expose our technology to. This involved smashing thousands of windows, recording countless dog barks and listening to hours of babies crying. We continue to collect thousands of fresh audio recordings to build and continually enrich our data platform and expand the range of sounds within our rapidly growing proprietary taxonomy.

For machine learning to process the sound data systematically, we had to describe and organise it. A key difference between sound recognition and speech recognition is that speech is limited to the types of sounds the human mouth can produce, and it follows a set structure, which makes it possible to map. These pre-defined rules and characteristics of speech make it easier to process in the structured, repeatable fashion that CPUs are designed for.

Similarly, music mostly results from physical resonance and is conditioned by the conventions of musical genres, so it has boundaries within which it is readily analysable.

Sound is different. Sound is much more diverse, unbounded and unstructured than speech and music. Think about a window being smashed, and all the different ways the glass shards can hit the floor, randomly, with no particular intent or style. Understanding the full extent of this variability is a prerequisite to mapping sound’s characteristics, so that a machine can process a sound the next time it encounters it. To overcome this challenge, we had to develop a “language” of ideophones, which describe an idea in sound or sensory perception.

The framework we developed extracts hundreds of ideophonic features from each sound. Our ideophonic feature extractor is composed of a set of audio signal descriptors computed from the incoming audio signal, which can be combined to approximate one or more ideophones. Through encoding, decoding and introspection, the feature extractor isolates hundreds of characteristics from each sound and describes sound “profiles” that uniquely identify it. These individual profiles are embeddable into our software sound sensor platform, ai3™, which can be integrated into virtually any consumer device equipped with a microphone.
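
Audio Analytic has not published the detail of these descriptors, but the general idea of reducing a frame of audio to a handful of numbers can be illustrated with two classic signal descriptors: short-time energy and zero-crossing rate. The C sketch below is purely illustrative; the function name, the 16-bit PCM frame format and the choice of descriptors are our assumptions, not part of ai3™.

#include <math.h>
#include <stddef.h>

/* Illustrative frame-level audio descriptors: short-time RMS energy
 * and zero-crossing rate. A production feature extractor computes far
 * more (and far richer) descriptors than these two. */
typedef struct {
    double rms;   /* short-time energy of the frame */
    double zcr;   /* fraction of adjacent sample pairs that change sign */
} frame_features;

frame_features describe_frame(const short *samples, size_t n)
{
    frame_features f = { 0.0, 0.0 };
    double energy = 0.0;
    size_t crossings = 0;

    for (size_t i = 0; i < n; i++) {
        energy += (double)samples[i] * (double)samples[i];
        if (i > 0 && ((samples[i - 1] >= 0) != (samples[i] >= 0)))
            crossings++;
    }
    if (n > 0)
        f.rms = sqrt(energy / (double)n);
    if (n > 1)
        f.zcr = (double)crossings / (double)(n - 1);
    return f;
}

A glass break, for example, combines a sharp energy transient with a burst of high-frequency content (a high zero-crossing rate); stacking many such descriptors per frame gives a classifier enough dimensions to separate target sounds from background noise.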

Image: Audio Analytic

Cloudless, connectionless intelligence

The ubiquity of microphones, spurred by the spread of speech recognition functionality, means there is a huge opportunity to add intelligence to products through sound recognition. Sound recognition dramatically expands the range of possibilities for AI-enabled devices. Home security is an obvious example, as wake words do not exist in real-life situations: burglars breaking into your house will never announce themselves by calling out a trigger word before trespassing.

Embedding the sound sensor as software means that all sound identification, analysis and decision-making is done locally and instantly, providing definitive detection of target sounds on the device and enabling an immediate response to be triggered. Independence from the cloud makes the technology reliable in a wider range of situations than if it required internet connectivity.

In addition, this edge intelligence means that streaming sound to cloud analysis resources is not required, which offers the opportunity to simplify designs and reduce power and BOM budgets. Eliminating the need to stream audio to the cloud also helps address consumer concerns over the privacy implications of devices with always-on microphones.

Managing microphone-to-processor pathways – recommended approaches for Linux-based systems

Having access to microphones on the device is the start of the process of integrating sound detection. Our experience in assisting with customer integrations is that obtaining good-quality audio from the microphones typically embedded in consumer devices can initially appear difficult. However, minor alterations to positioning, port design and audio processing can resolve these issues.

Managing audio processing without creating conflicting demands on other device functions requires thought, as the chosen signal path can dramatically affect how well the product performs, or in our particular case, the quality of its listening.

Smart products need to be capable of executing multiple tasks simultaneously in order to be useful; a device taken over by the demands of a single function delivers a significantly degraded user experience. Increasingly, equipment designers are seeking ways to transform low-power, single-purpose devices into low-power, multi-purpose devices connected to the rest of the smart home ecosystem. Consumers are also looking for ways to make traditionally “dumb,” unconnected devices such as smoke alarms smarter, but without incurring the cost of replacing existing units with more expensive, connected versions. Using sound recognition to deliver such extended functionality requires that the embeddable sound recognition software be lightweight enough for the device to continue performing its other functions, rather than be consumed by sound recognition. The software design must also include a flexible audio path that converts audio as required while giving multiple device functions access to the audio stream.

Many of the currently available consumer devices run Linux-based operating systems. Running Linux brings a host of advantages for engineering teams; however, Linux audio interfacing is currently limited to a small set of tools. Achieving the necessary control over the signal path can become an issue under these conditions, so choosing the right approach is critical.

The main tool options for passing audio data from the audio hardware to the application processor are ALSA (Advanced Linux Sound Architecture) and PulseAudio. The two offer similar functionality but operate in different ways.

Of the two, PulseAudio is the easier to understand and grapple with at first, but configuring it to obtain control over the signal path can prove problematic.

Despite its more complex interface, ALSA tends to be more reliable than PulseAudio at providing audio to all the applications on a smart device without taking it over completely. While PulseAudio is more intuitive, it is typically difficult to adjust to achieve a precise setup, whereas ALSA allows every setting to be configured manually, making it the more flexible tool.

The code snippet below shows an example recording setup using ALSA that provides audio to multiple applications simultaneously. With this approach it becomes possible to obtain different conversions of the same stream from the same microphone input.

Solving the challenge of giving devices the sense of hearing is complex, but cracking it is necessary to reach the next frontier of artificial intelligence in mainstream devices. 


Example ALSA device configuration to provide audio simultaneously to multiple applications. 

pcm_slave.hw_input {
    pcm "hw:0,1"          # capture device 1 on card 0
    channels 2            # set the number of channels
    rate 48000            # set the native rate of the hardware
    buffer_size 4096      # optional
}

pcm.input_share {
    type dsnoop
    hint {
        description "interface for sharing hardware with multiple clients"
    }
    ipc_key 345271        # unique plugin identifier
    ipc_key_add_uid yes
    slave hw_input        # use the hardware input as the slave
    bindings.0 0
    bindings.1 1
}

pcm.input_left {
    type plug
    hint {
        description "left input plug, uses shared input"
    }
    slave.pcm input_share
    rate_converter samplerate_best   # select the sample-rate converter
    route_policy copy
    # route only audio from channel 0 to channel 0
    # ttable.<in>.<out> <gain>
    ttable.0.0 1
    ttable.1.0 0
}

pcm.input_right {
    type plug
    hint {
        description "right input plug, uses shared input"
    }
    slave.pcm input_share
    rate_converter samplerate_best   # select the sample-rate converter
    route_policy copy
    # route only audio from channel 1 to channel 0
    # ttable.<in>.<out> <gain>
    ttable.0.0 0
    ttable.1.0 1
}

pcm.input_stereo {
    type plug
    hint {
        description "flexible input plug"
    }
    slave.pcm input_share # use the shared input device as the slave
    rate_converter samplerate_best
}
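
Once a configuration like this is installed (typically in /etc/asound.conf or a user’s ~/.asoundrc), applications open the virtual devices by name just as they would a hardware device, and several clients can capture at once because dsnoop shares the underlying hardware between them. The C sketch below is a minimal alsa-lib capture loop against the input_left plug defined above; error handling is abbreviated, and the buffer size, rate and latency figures are arbitrary choices for illustration, not recommendations.

#include <alsa/asoundlib.h>
#include <stdio.h>

int main(void)
{
    snd_pcm_t *pcm;
    short buf[1024];   /* interleaved S16_LE samples */
    int err;

    /* Open the virtual device defined in the ALSA configuration. */
    if ((err = snd_pcm_open(&pcm, "input_left",
                            SND_PCM_STREAM_CAPTURE, 0)) < 0) {
        fprintf(stderr, "open failed: %s\n", snd_strerror(err));
        return 1;
    }

    /* Request mono 16 kHz; the plug layer converts from the 48 kHz
     * stereo slave using the configured rate converter. */
    if ((err = snd_pcm_set_params(pcm, SND_PCM_FORMAT_S16_LE,
                                  SND_PCM_ACCESS_RW_INTERLEAVED,
                                  1, 16000, 1, 500000)) < 0) {
        fprintf(stderr, "set_params failed: %s\n", snd_strerror(err));
        return 1;
    }

    for (int i = 0; i < 100; i++) {   /* read a few seconds of audio */
        snd_pcm_sframes_t n = snd_pcm_readi(pcm, buf, 1024);
        if (n < 0)
            n = snd_pcm_recover(pcm, (int)n, 0);   /* e.g. an overrun */
        if (n < 0)
            break;
        /* hand buf (n frames) to the sound recognition engine here */
    }

    snd_pcm_close(pcm);
    return 0;
}

A second process can open input_right or input_stereo at the same time, which is exactly the sharing behaviour the dsnoop plugin provides; the devices can also be exercised with ALSA’s standard arecord utility before any client code is written.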


Tom Grainge

Tom is a software engineer and resident audio expert at sound recognition pioneer Audio Analytic. Based in Cambridge, UK, he develops the company’s core software platform, ai3™, and works closely with customers to deploy this context-sensing AI technology on consumer electronics.
