By Dr. Moti Margalit, CEO of SonicEdge
The rise of physical AI, intelligent systems that sense, interpret, and interact with the real world, is driving a fundamental shift in how electronics are designed. This includes audio devices. ‘Always-on’ sensors, voice interfaces, and ambient intelligence are no longer aspirational features; they are now baseline requirements for the next generation of consumer electronics. From wearables and hearables to smart home devices and automotive cabins, products are expected to listen, respond, and communicate seamlessly with their users and the surrounding environment.
On the input side, MEMS (Micro-Electro-Mechanical-Systems) microphones have already transformed how devices capture sound – replacing bulky, archaic electret capsules with tiny, high-performance silicon components that are now standard in virtually every smartphone, earbud, and smart speaker on the market today. The same transformation is now underway on the output side. The demand for compact, high-fidelity speakers, capable of delivering sound without the physical presence of traditional drivers, is accelerating across product categories: wearables that need full-range audio from invisible components, smart home devices where sound must be heard but the hardware unseen, and automotive interiors where speakers consume space and add weight that designers can no longer afford.
MEMS speakers are at the forefront of this shift, poised to do for sound output what MEMS microphones did for sound input: enable a new class of audio components that are smaller, more efficient, and natively compatible with semiconductor manufacturing. However, not all MEMS speaker architectures are created equal. As the technology matures, engineers and product developers need to understand the capabilities and limitations of the options available before committing to a platform.
This article explores the key technical considerations when selecting a MEMS speaker and examines how modulated ultrasound technology is expanding what these components can achieve.
First-generation MEMS speakers: tweeters
The first MEMS speakers to reach the market were effectively ‘tweeters’ – drivers optimized for high-frequency reproduction, typically above 2 kHz. Using small silicon diaphragms, these devices offered clear treble output in a form factor compact enough to fit inside TWS earbuds, augmenting the high-frequency performance of conventional balanced-armature or dynamic drivers in hybrid configurations. As a tweeter, this approach works, but challenges arise when trying to extend these devices to full-range audio.
The physics are straightforward: acoustic output is proportional to the volume of air displaced, which is the product of membrane area and excursion. At the MEMS scale, both are severely constrained. A small diaphragm moving a few micrometers cannot push enough air to reproduce midrange and bass frequencies at useful sound pressure levels. As a result, the usable frequency range stays confined to the upper spectrum, output volume is capped by limited excursion, and delivering full-range audio still requires conventional drivers – meaning added bulk and complexity, which these MEMS ‘tweeters’ were meant to eliminate.
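To make the scaling concrete, the sketch below estimates the sound pressure level of a small membrane radiating into open air, using the textbook simple-source approximation for a baffled piston. It is a rough illustration rather than a model of any specific product: the membrane area, excursion, and listening distance are assumed values chosen only to show the trend, and the function name is hypothetical.

```python
import math

RHO_AIR = 1.2    # kg/m^3, approximate density of air at room temperature
P_REF = 20e-6    # Pa, standard reference pressure for dB SPL


def spl_from_displacement(area_m2, peak_excursion_m, freq_hz, distance_m):
    """Far-field SPL of a small baffled piston moving sinusoidally.

    Simple-source approximation, valid when the piston is much smaller
    than the acoustic wavelength.
    """
    volume_displaced = area_m2 * peak_excursion_m   # peak volume, m^3
    omega = 2 * math.pi * freq_hz
    # Peak pressure from peak volume acceleration (omega^2 * volume).
    p_peak = RHO_AIR * omega ** 2 * volume_displaced / (2 * math.pi * distance_m)
    p_rms = p_peak / math.sqrt(2)
    return 20 * math.log10(p_rms / P_REF)


# Assumed illustration values: 10 mm^2 membrane, 5 um peak excursion, 10 cm away.
AREA_M2 = 10e-6
EXCURSION_M = 5e-6
DISTANCE_M = 0.1

for freq in (100, 1_000, 10_000):
    spl = spl_from_displacement(AREA_M2, EXCURSION_M, freq, DISTANCE_M)
    print(f"{freq:>6} Hz: {spl:6.1f} dB SPL")
```

Under this model, output at a fixed displacement falls by 40 dB for every tenfold drop in frequency, which is why a membrane that sounds clear in the treble has little to offer in the bass.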
First-generation MEMS tweeters proved that silicon-based audio transducers could be manufactured at scale, but their performance is bound by membrane-displacement physics. Reaching full-range audio from a MEMS device requires a fundamentally different approach.
The shift to modulated ultrasound
First-generation MEMS tweeters generate sound the same way conventional speakers do – by moving a membrane at audible frequencies. Modulated ultrasound, however, takes an entirely different approach. Instead of moving a small membrane slowly, it moves the membrane very quickly—at ultrasonic frequencies in the hundreds of kHz range—functioning as a high-speed air pump. The audio signal is encoded as amplitude modulation of this carrier, and because the pump cycles hundreds of thousands of times per second, it displaces far more air per unit time than a membrane oscillating at audio frequencies – effectively trading membrane size for pump speed.
A critical distinction: the demodulation – the conversion from modulated ultrasound back to audible sound – occurs locally at each membrane, through the membrane’s own mechanical nonlinearity. This is not a parametric speaker effect, where ultrasonic beams interact in air to produce a narrow, directional audio beam. The output is omnidirectional, radiating sound like any conventional driver. The difference is in how the air displacement is generated, not in how the sound propagates.
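The signal chain described above can be sketched in a few lines of Python. This is a generic signal-processing illustration, not a model of the transducer itself: the half-wave rectifier is a simple stand-in for a nonlinearity, the low-pass filter stands in for everything that discards the ultrasonic content, and the carrier frequency, modulation depth, and function names are all assumptions made for the example.

```python
import math


def am_modulate(audio, carrier_hz, sample_rate_hz, depth=0.5):
    """Amplitude-modulate a sine carrier with audio samples in [-1, 1]."""
    return [
        (1.0 + depth * a) * math.sin(2 * math.pi * carrier_hz * n / sample_rate_hz)
        for n, a in enumerate(audio)
    ]


def rectify_and_smooth(signal, sample_rate_hz, cutoff_hz):
    """Half-wave rectify, then one-pole low-pass filter to keep the envelope."""
    alpha = 1.0 - math.exp(-2 * math.pi * cutoff_hz / sample_rate_hz)
    state, out = 0.0, []
    for s in signal:
        state += alpha * (max(s, 0.0) - state)   # rectifier feeds the filter
        out.append(state)
    return out


# Assumed illustration values, not device parameters.
SAMPLE_RATE_HZ = 2_000_000
CARRIER_HZ = 200_000
TONE_HZ = 1_000
NUM_SAMPLES = 8_000   # 4 ms of signal

tone = [math.sin(2 * math.pi * TONE_HZ * n / SAMPLE_RATE_HZ) for n in range(NUM_SAMPLES)]
modulated = am_modulate(tone, CARRIER_HZ, SAMPLE_RATE_HZ)
recovered = rectify_and_smooth(modulated, SAMPLE_RATE_HZ, cutoff_hz=10_000)
```

After the filter settles, `recovered` follows the original 1 kHz tone on top of a constant offset, with a small residue of carrier ripple. The point of the sketch is only that a nonlinearity followed by low-pass behavior turns a modulated carrier back into its audio envelope.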
This architecture unlocks a set of capabilities that are inaccessible to direct-radiating MEMS tweeters:
- Full-range audio: bass, midrange, and treble are produced from a single transducer, with no auxiliary drivers required. The modulated ultrasound device is a complete speaker, not a tweeter supplement.
- High SPL from a micro-scale source: the high-speed pumping mechanism achieves sound pressure levels that a comparably sized direct-radiating membrane cannot, enabling usable volume in open-ear and far-field applications.
- Vibration-free operation: minimal per-cycle excursion at ultrasonic frequencies eliminates the mechanical vibration inherent to direct-radiating drivers – with significant implications for component integration.
- Chip-scale integration: the transducer, ASIC, and algorithms can be co-packaged into a single module, enabling compact, multifunctional designs for next-generation wearables and ultra-thin devices.
- Power efficiency: low-voltage ultrasonic actuation keeps power consumption within the budget of battery-powered wearables and always-on AI assistants.
Modulated ultrasound removes the dependence on membrane area for air displacement, a significant constraint that made tweeters a partial solution, and replaces it with a mechanism that scales with speed rather than size.
Technical considerations when selecting a MEMS speaker
If a product designer’s goal is to augment an existing speaker – by adding high-frequency extension with a modest increase in size and power, for example – then a MEMS tweeter is a straightforward and well-proven solution. The selection criteria are conventional: frequency response, sensitivity, and mechanical compatibility with the existing driver.
When the goal is to replace an existing speaker entirely with a MEMS device, the evaluation becomes more nuanced. The critical considerations begin with the underlying transducer technology and extend to the specific demands of the target application.
Actuation technology: electrostatic vs. piezoelectric
MEMS speakers today are built on one of two actuation platforms: electrostatic or piezoelectric. The choice has significant implications for drive efficiency and system design. Electrostatic transducers present roughly 1,000 times less capacitance than their piezoelectric counterparts, which translates directly into lower drive currents and more efficient amplifier designs – a meaningful advantage in battery-powered devices where every milliwatt matters. Electrostatic platforms also build on the same proven silicon fabrication processes that underpin MEMS microphones, leveraging decades of manufacturing maturity, yield optimization, and supply chain infrastructure. Piezoelectric platforms can potentially provide larger displacements at lower voltages, but depending on architecture, this does not always translate into more SPL or more power-efficient operation. System engineers should carefully evaluate the impact of drive capacitance on amplifier power and thermal management, and weigh reliability, RoHS compliance, and overall efficiency, before committing to a platform.
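The effect of load capacitance on the amplifier can be estimated with two standard relations for a capacitive load: peak current under sinusoidal drive, I = 2πfCV, and the power drawn when the capacitance is charged and discharged once per cycle with no energy recovery, P = CV²f. The sketch below applies them to two loads that differ by a factor of 1,000. The capacitance, voltage, and frequency values are placeholders chosen for illustration, not figures for any real transducer, and holding voltage equal isolates the capacitance effect only; real platforms also differ in drive voltage.

```python
import math


def peak_drive_current(capacitance_f, peak_voltage_v, freq_hz):
    """Peak current into an ideal capacitor under sinusoidal voltage drive."""
    return 2 * math.pi * freq_hz * capacitance_f * peak_voltage_v


def charge_cycle_power(capacitance_f, voltage_swing_v, freq_hz):
    """Supply power when C is charged to the full swing and discharged
    once per cycle with no energy recovery (C * V^2 * f)."""
    return capacitance_f * voltage_swing_v ** 2 * freq_hz


# Placeholder values for illustration only.
C_LOW_F = 100e-12    # hypothetical low-capacitance load
C_HIGH_F = 100e-9    # hypothetical load with 1,000x the capacitance
VOLTAGE_V = 10.0
FREQ_HZ = 10_000

for label, cap in (("low-C", C_LOW_F), ("high-C", C_HIGH_F)):
    current = peak_drive_current(cap, VOLTAGE_V, FREQ_HZ)
    power = charge_cycle_power(cap, VOLTAGE_V, FREQ_HZ)
    print(f"{label}: peak current {current * 1e3:.4f} mA, power {power * 1e3:.4f} mW")
```

Both quantities scale linearly with capacitance, so at equal voltage and frequency a 1,000-fold difference in capacitance becomes a 1,000-fold difference in drive current and in unrecovered charging power.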
In-ear and near-ear applications
In-ear and near-ear devices—TWS earbuds, hearing aids, open-ear wearables—operate in tightly constrained acoustic environments. The speaker is coupled to the ear through small front cavities, narrow acoustic tubes, and fine meshes that present significant acoustic loads. In this context, the ability to shape acoustic resonances is a critical selection criterion. The speaker must maintain its output and fidelity under high acoustic impedance conditions; a transducer that performs well on the bench but loses output or distorts when loaded by a tight channel and mesh is unusable in a real product.
Some modulated ultrasound architectures are inherently more robust under high acoustic loads than direct-radiating designs. Their pumping mechanism can sustain output into small front cavities and restrictive acoustic paths without the output loss that a conventional membrane experiences when backloaded. Engineers evaluating MEMS speakers for in-ear and near-ear applications should test performance under realistic acoustic loading—not just in free-field or standard coupler conditions.
Free-field applications
In free-field applications – smart glasses, smart home devices, automotive surfaces – the design challenges shift. All small speakers struggle with low-frequency output in open acoustic environments. Here, the robustness of modulated ultrasound speakers to high acoustic loads becomes an advantage for a different reason: it enables acoustic design techniques that augment low-frequency performance, boosting output precisely where small transducers fall short.
Modulated ultrasound speakers can also deliver capabilities beyond conventional audio. Their bandwidth typically extends to 100 kHz and beyond, supporting ultrasonic sensing and communication alongside audio playback. And because they function as an air pump, they can even provide active cooling – moving air across heat-generating components as a micro-fan. However, added functionality comes at the cost of power, and a careful system-level assessment is needed to ensure that these features do not drain the battery budget that the speaker shares with the rest of the device.
Beamforming and speaker arrays
The area where free-field modulated ultrasound speakers truly differentiate is beamforming. Like their MEMS microphone counterparts, MEMS speakers offer the unit-to-unit uniformity and compact form factor needed to build speaker arrays that control the spatial delivery of sound, directing audio where it is needed and maintaining quiet zones where it is not. This opens the door to applications that are impractical with conventional speakers: personal audio zones in shared spaces, targeted notifications in automotive cabins, and directional sound in smart home environments, all while minimizing noise pollution.
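As a minimal sketch of how an array steers sound, the example below computes delay-and-sum steering delays for a uniform linear array and evaluates the resulting relative output in a given direction. Delay-and-sum is the simplest textbook beamformer and is used here as an assumption; the article does not specify a beamforming method, and the element count, spacing, frequency, and function names are illustrative only.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0   # m/s in air at about 20 degrees C


def steering_delays(num_elements, spacing_m, steer_deg):
    """Per-element delays (seconds) that steer a uniform linear array
    toward steer_deg, measured from broadside."""
    sin_theta = math.sin(math.radians(steer_deg))
    raw = [n * spacing_m * sin_theta / SPEED_OF_SOUND for n in range(num_elements)]
    offset = min(raw)
    return [d - offset for d in raw]   # shift so every delay is non-negative


def array_response(delays, spacing_m, freq_hz, look_deg):
    """Normalized far-field magnitude (0..1) in the look_deg direction."""
    sin_look = math.sin(math.radians(look_deg))
    total = sum(
        cmath.exp(1j * 2 * math.pi * freq_hz
                  * (n * spacing_m * sin_look / SPEED_OF_SOUND - delay))
        for n, delay in enumerate(delays)
    )
    return abs(total) / len(delays)


# Assumed illustration values: 8 elements, 5 mm pitch, 10 kHz tone, steer to +30 degrees.
delays = steering_delays(num_elements=8, spacing_m=0.005, steer_deg=30.0)
for look in (-30.0, 0.0, 30.0):
    level = array_response(delays, 0.005, 10_000, look)
    print(f"look {look:+5.1f} deg: relative level {level:.2f}")
```

The response is at its maximum in the steered direction and lower elsewhere. Tight unit-to-unit matching matters because this simple model assumes every element contributes an identical signal apart from its delay.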

Deploying speaker arrays efficiently may require the adoption of emerging audio interconnect protocols such as S3IS, which are designed for scalable, low-latency distribution of audio signals across multiple transducers. The engineering consideration here is systemic: speakers, amplifiers, and controllers must be architected together, and the choice of MEMS speaker platform directly affects how efficiently this system can be deployed and scaled.
Reliability and compliance
Reliability is a baseline requirement for any component going into a consumer product, and here MEMS speakers benefit from the same structural advantages that made MEMS microphones a trusted component across the industry. Solid-state silicon transducers with no moving coils, magnets, or adhesive bonds offer inherent robustness against moisture, dust, shock, and vibration, supporting IP67-rated product designs.
One issue that deserves attention is RoHS compliance. Some MEMS speaker architectures use materials that currently fall under RoHS exemptions. While these exemptions permit use today, they have a defined expiration horizon and can affect the product’s regulatory lifetime and end-of-life planning. Engineers should verify full RoHS compliance – not just exemption-based compliance – when selecting a speaker platform for products with multi-year production roadmaps.
Component integration: adding functionality without size
Conventional speakers vibrate; that is how they produce sound. In wearables, this vibration is felt directly by the user, causing discomfort during extended wear. In larger devices, it can excite enclosure resonances, rattle adjacent components, and create unwanted acoustic artifacts. Critically, for system design, vibration makes it impossible to co-locate a speaker with vibration-sensitive components like microphones and inertial sensors without introducing mechanical crosstalk that degrades their performance.
Modulated ultrasound speakers operate without the low-frequency mechanical vibration that characterizes direct-radiating drivers. The membrane moves at ultrasonic frequencies with minimal excursion per cycle, producing no perceptible vibration at the device level. This is not just a comfort feature — it is a system architecture enabler.
Without vibration isolation constraints, speakers, microphones, and other sensors can be integrated into the same package. A single chip-scale module can combine audio output, audio input, and environmental sensing in a footprint that would otherwise accommodate only one of these functions. For product developers, this means adding capability – such as always-on voice pickup, active noise cancellation, acoustic echo cancellation, and spatial awareness – without adding size. In devices where every cubic millimeter is contested between batteries, antennas, and processing silicon, this kind of functional density is a decisive advantage.
Furthermore, the integration benefit extends beyond the package itself. When the speaker does not vibrate the enclosure, the acoustic design of the entire device becomes simpler. There is no need for mechanical decoupling structures, vibration-damping gaskets, or physical separation between the speaker and sensitive components. The result is fewer parts, a simpler assembly process, and more design freedom for the product team.
Conclusion
The audio industry is at an inflection point. The convergence of physical AI, always-on voice interfaces, and shrinking device form factors is increasing demand for speakers that can deliver full-range, high-fidelity sound from components that are essentially invisible. MEMS microphones showed that semiconductor-native audio transducers could displace legacy technology at scale, and MEMS speakers are now following the same trajectory.
For engineers and product developers, the choice of MEMS speaker architecture is not a minor component decision; it shapes what the product can do. A tweeter augments an existing audio chain, while a modulated ultrasound speaker replaces it, opening the door to capabilities that conventional drivers cannot offer, including full-range audio from a chip-scale source, vibration-free operation that enables multi-sensor integration, acoustic load robustness for demanding in-ear and free-field designs, bandwidth extending well beyond human hearing, and the unit-to-unit uniformity needed for beamforming arrays.
The question is no longer whether MEMS speakers are ready for consumer products; it is which architecture matches the ambition of the product being designed.