SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures

15 April 2025

Abstract

Imagine placing your smartphone on a table in a noisy restaurant and clearly capturing the voices of friends seated around you, or recording a lecturer's voice with clarity in a reverberant auditorium. We introduce SonicSieve, the first intelligent directional speech extraction system for smartphones using a bio-inspired acoustic microstructure. Our passive design embeds directional cues onto incoming speech without any additional electronics. It attaches to the in-line microphone of low-cost wired earphones, which in turn connect to a smartphone. We present an end-to-end neural network that processes the raw audio mixtures in real time on mobile devices. Our results show that SonicSieve achieves a signal quality improvement of 5.0 dB when focusing on a 30° angular region. Additionally, our system, which uses only two microphones, outperforms conventional 5-microphone arrays.


@article{yuan2025_2504.10793,
  title={SonicSieve: Bringing Directional Speech Extraction to Smartphones Using Acoustic Microstructures},
  author={Kuang Yuan and Yifeng Wang and Xiyuxing Zhang and Chengyi Shen and Swarun Kumar and Justin Chan},
  journal={arXiv preprint arXiv:2504.10793},
  year={2025}
}
