Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering

19 June 2025

Main: 4 pages, bibliography: 1 page, 2 figures, 3 tables

Abstract

We propose a spatio-spectral diarization pipeline that combines model-based and data-driven components: segmentation based on time differences of arrival (TDOA) is followed by speaker embedding-based clustering. The proposed system requires neither multi-channel training data nor prior knowledge of the number or placement of the microphones. With minor adjustments, it works with both a compact microphone array and distributed microphones. Owing to its better handling of overlapping speech during segmentation, the proposed pipeline significantly outperforms the single-channel pyannote approach, both with a compact microphone array and with distributed microphones. Additionally, we show that, unlike fully spatial diarization pipelines, the proposed system can correctly track speakers when they change positions.
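The abstract does not state how the TDOAs are estimated. A widely used estimator for the delay between two microphone signals is the generalized cross-correlation with phase transform (GCC-PHAT). The following is a minimal NumPy sketch of it, assuming two single-channel signals recorded at a common sampling rate `fs`. The function name `gcc_phat` and its `max_tau` parameter are hypothetical, illustrative choices (not part of the paper or of any library), and the sketch is not necessarily the estimator the authors use.

```python
import numpy as np


def gcc_phat(x, y, fs, max_tau=None):
    """Hypothetical helper (not from the paper): estimate the delay of y
    relative to x, in seconds, via GCC-PHAT.

    A positive result means y lags x, i.e. the sound reached the x microphone first.
    """
    # Zero-pad to len(x) + len(y) so the circular correlation does not wrap around.
    n = len(x) + len(y)
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)

    # Cross-power spectrum with PHAT weighting: discard magnitude, keep only phase.
    R = Y * np.conj(X)
    R /= np.abs(R) + 1e-12
    cc = np.fft.irfft(R, n=n)

    # Optionally restrict the search to physically plausible lags (|tau| <= max_tau).
    max_shift = n // 2
    if max_tau is not None:
        max_shift = max(1, min(int(fs * max_tau), max_shift))

    # Reorder so that index 0 corresponds to lag -max_shift, then pick the peak.
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    shift = int(np.argmax(np.abs(cc))) - max_shift
    return shift / fs


# Synthetic check: y is x delayed by 12 samples (0.75 ms at 16 kHz).
rng = np.random.default_rng(0)
fs = 16000
x = rng.standard_normal(4096)
y = np.concatenate((np.zeros(12), x[:-12]))
tau = gcc_phat(x, y, fs, max_tau=0.001)
print(f"estimated TDOA: {tau * 1e3:.3f} ms")
```

In a TDOA-based segmentation, an estimator of this kind would typically be applied frame by frame and to several microphone pairs, so that a change in the dominant delays hints at a change of the active speaker position. How the proposed system turns such delays into segments is not described in the abstract.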


@article{cord-landwehr2025_2506.16228,
  title={Spatio-spectral diarization of meetings by combining TDOA-based segmentation and speaker embedding-based clustering},
  author={Tobias Cord-Landwehr and Tobias Gburrek and Marc Deegen and Reinhold Haeb-Umbach},
  journal={arXiv preprint arXiv:2506.16228},
  year={2025}
}
