Baro2Talk: Reconstructing Spectrograms from Ear Canal Pressure for Voice-free Communication

Luo Zhou, Shan Chang, Han Wang, Xianbo Wang and Hongzi Zhu

In Proceedings of IEEE INFOCOM 2026, Tokyo, Japan.

The increasing demand for private and noise-resilient speech interaction has motivated the development of Silent Speech Interfaces (SSIs) that infer user intent without vocalization. Existing SSI solutions face limitations such as intrusiveness, privacy leakage, environmental sensitivity, deployment complexity, and motion vulnerability. In this work, we present Baro2Talk, a wearable SSI system that reconstructs speech content from TMJ-dominated Pressure Variation Sequences (TPVSs) captured by miniature barometers embedded in standard earbuds. Baro2Talk is inspired by two key observations. First, silent articulation induces consistent ear canal deformation via temporomandibular joint (TMJ) movements, producing pressure fluctuations that reflect articulatory patterns associated with speech. Second, TPVSs exhibit repeatable temporal and articulatory structures within phrases, offering structured signals to support semantic modeling without acoustic input. We develop a lightweight in-ear pressure sensing prototype and propose a set of modules that first perform articulatory event detection and generalization enhancement, followed by a three-stage reconstruction pipeline: Semantic Encoding, Coarse Mel-spectrogram Construction, and Phonetic Enhancement. The resulting spectrograms are decoded into text using a pre-trained automatic speech recognition (ASR) model (e.g., Whisper). Baro2Talk achieves a 6.5% CER, a 9.9% WER, and a 0.081 spectral convergence score, demonstrating robust performance in silent, mobile, and noisy environments.
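The spectral convergence score reported above measures how closely a reconstructed magnitude spectrogram matches a reference; it is conventionally defined as the Frobenius-norm of the difference divided by the Frobenius-norm of the reference, so lower is better. A minimal sketch of this metric (the array shapes and perturbation here are illustrative, not from the paper):

```python
import numpy as np

def spectral_convergence(S_ref, S_est):
    """Frobenius-norm ratio between reference and estimated
    magnitude spectrograms; 0 means a perfect reconstruction."""
    return (np.linalg.norm(S_ref - S_est, ord="fro")
            / np.linalg.norm(S_ref, ord="fro"))

# Toy example: an 80-mel-bin x 100-frame "spectrogram" and a
# slightly perturbed estimate of it (hypothetical data).
rng = np.random.default_rng(0)
S_ref = rng.random((80, 100))
S_est = S_ref + 0.01 * rng.random((80, 100))
print(spectral_convergence(S_ref, S_est))  # small value, close to 0
```

A perfect reconstruction yields a score of 0; the paper's reported 0.081 would indicate the reconstructed spectrograms deviate from the ground truth by about 8% in Frobenius norm.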

