Deep Ad-hoc Beamforming

Abstract

Although deep-learning-based speech enhancement methods have demonstrated good performance in adverse acoustic environments, their performance is strongly affected by the distance between the speech source and the microphones, since speech signals fade quickly during propagation. To address this problem, we propose deep ad-hoc beamforming, a deep-learning-based multichannel speech enhancement method for ad-hoc microphone arrays. It is designed for scenarios where the microphones are placed randomly in a room and work collaboratively. Its core idea is to reweight the estimated speech signals with a sparsity constraint when conducting adaptive beamforming, where the weights produced by a neural network are estimates of a predefined propagation cost, and the sparsity constraint filters out microphones that are too far from both the speech source and the majority of the ad-hoc microphone array. We conducted extensive experiments in a scenario where the location of the speech source is far-field, random, and blind to the microphones. Results show that our method outperforms the referenced deep-learning-based speech enhancement methods by a large margin.
