
Online Localization and Tracking of Multiple Moving Speakers in Reverberant Environments

IEEE Journal of Selected Topics in Signal Processing (JSTSP), 2018
Abstract

This paper addresses the problem of online multiple-speaker localization and tracking in reverberant environments. We propose to use the direct-path relative transfer function (DP-RTF), a feature that encodes inter-channel direct-path information and is robust against reverberation, which makes it well suited for reliable localization. A complex Gaussian mixture model (CGMM) is then used in which each component weight represents the probability that an active speaker is present at the corresponding candidate source direction. These weights are updated online with exponentiated gradient descent by minimizing a combination of the negative log-likelihood and the entropy of the weights. The entropy term imposes sparsity over the number of audio sources, since in practice only a few speakers are simultaneously active. The outputs of this online localization process are then used as observations in a Bayesian filtering process whose computation is made tractable via an instance of variational expectation-maximization. Birth and sleeping processes account for the intermittent nature of speech. The method is thoroughly evaluated on several datasets.
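The online weight update described in the abstract can be illustrated with a minimal sketch. It assumes a fixed grid of D candidate directions and per-observation component likelihoods that have already been computed. In the paper these would come from the CGMM evaluated on DP-RTF features; here they are plain numbers. The exact objective (average negative log-likelihood plus gamma times the Shannon entropy of the weights), the step size `eta`, the entropy weight `gamma`, and the function name `eg_step` are illustrative assumptions, not the authors' exact formulation.

```python
import math


def eg_step(weights, likelihoods, eta=0.1, gamma=0.1, eps=1e-12):
    """One exponentiated-gradient step on mixture weights (illustrative sketch).

    weights: D nonnegative weights summing to 1, one per candidate direction.
    likelihoods: N rows, each holding D values p(x_n | direction d),
        assumed to be precomputed for the current frame.
    Assumed objective:
        J(w) = -(1/N) * sum_n log(sum_d w_d * p_nd) + gamma * H(w),
        H(w) = -sum_d w_d * log(w_d)   (minimizing H favors sparse weights).
    """
    D = len(weights)
    N = len(likelihoods)
    grad = [0.0] * D

    # Gradient of the average negative log-likelihood term.
    for row in likelihoods:
        mix = sum(w * p for w, p in zip(weights, row)) + eps
        for d in range(D):
            grad[d] -= row[d] / mix / N

    # Gradient of gamma * H(w): dH/dw_d = -(log w_d + 1).
    for d in range(D):
        grad[d] += gamma * (-(math.log(weights[d] + eps) + 1.0))

    # Multiplicative update keeps weights positive; renormalize onto the simplex.
    updated = [w * math.exp(-eta * g) for w, g in zip(weights, grad)]
    total = sum(updated)
    return [w / total for w in updated]


if __name__ == "__main__":
    # Hypothetical example: 3 candidate directions, 2 observations that
    # both favor direction 0.
    w = [1 / 3, 1 / 3, 1 / 3]
    lik = [[1.0, 0.1, 0.1], [0.9, 0.2, 0.1]]
    for _ in range(50):
        w = eg_step(w, lik)
    print([round(x, 3) for x in w])
```

Because the update is multiplicative and renormalized, the weights stay positive and sum to one after every step. The entropy gradient contains a `-gamma * log(w_d)` term, so directions that already carry more weight receive a larger multiplier, which pushes the distribution toward a few dominant directions.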
