pyannote.audio: neural building blocks for speaker diarization

4 November 2019

Papers citing "pyannote.audio: neural building blocks for speaker diarization"

28 / 28 papers shown

Title
Automatic Proficiency Assessment in L2 English Learners Armita Mohammadi Alessandro Lameiras Koerich Laureano Moro-Velazquez P. Cardinal 25 0 0 05 May 2025
Co$^{3}$Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion Xingqun Qi Yatian Wang Hengyuan Zhang J. Pan Wei Xue Shanghang Zhang Wenhan Luo Qifeng Liu Yike Guo SLR 53 0 0 03 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness Erfan Loweimi Mengjie Qian Kate Knill Mark J. F. Gales 43 0 0 26 Apr 2025
Guided Speaker Embedding Shota Horiguchi Takafumi Moriya Atsushi Ando Takanori Ashihara Hiroshi Sato Naohiro Tawara Marc Delcroix 40 0 0 03 Jan 2025
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization Samuele Cornell Taejin Park Steve Huang Christoph Boeddeker Xuankai Chang Matthew Maciejewski Matthew Wiesner Paola García Shinji Watanabe 22 9 0 23 Jul 2024
LLM-based speaker diarization correction: A generalizable approach Georgios Efstathiadis Vijay Yadav Anzar Abbas 34 3 0 07 Jun 2024
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings Théo Mariotte Anthony Larcher Silvio Montrésor Jean-Hugh Thomas 18 0 0 05 Jun 2024
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition Peng Shen Xugang Lu Hisashi Kawai 14 1 0 18 Dec 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description Tengda Han Max Bain Arsha Nagrani Gül Varol Weidi Xie Andrew Zisserman VGen DiffM 19 36 0 10 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration Piyush Singh Pasi Karthikeya Battepati P. Jyothi Ganesh Ramakrishnan T. Mahapatra Manoj Singh 22 0 0 10 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words Robin Algayres Pablo Diego-Simon Benoît Sagot Emmanuel Dupoux 10 1 0 08 Oct 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition Krishna C. Puvvada Nithin Rao Koluguri Kunal Dhawan Jagadeesh Balam Boris Ginsburg 6 12 0 19 Sep 2023
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains Martin Lebourdais Théo Mariotte Marie Tahon Anthony Larcher Antoine Laurent Silvio Montrésor S. Meignier Jean-Hugh Thomas VLM 17 5 0 24 Jul 2023
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings L. Serafini Samuele Cornell Giovanni Morrone Enrico Zovato A. Brutti S. Squartini 21 9 0 29 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person L. Gris R. Marcacini Arnaldo Cândido Júnior Edresson Casanova A. S. Soares S. Aluísio 13 7 0 23 May 2023
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge Jaesung Huh A. Brown Jee-weon Jung Joon Son Chung Arsha Nagrani D. Garcia-Romero Andrew Zisserman 11 26 0 20 Feb 2023
Anchorage: Visual Analysis of Satisfaction in Customer Service Videos via Anchor Events Kamkwai Wong Xingbo Wang Yong Wang Jianben He Rongzheng Zhang Huamin Qu 13 14 0 14 Feb 2023
Residual Information in Deep Speaker Embedding Architectures Adriana Stan 17 5 0 06 Feb 2023
Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing William Brannon Yogesh Virkar Brian Thompson 29 21 0 23 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection Rahul Sharma Shrikanth Narayanan 27 8 0 01 Dec 2022
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0 Marie Kunesova Zbynek Zajíc SSL VLM 11 15 0 26 Oct 2022
In search of strong embedding extractors for speaker diarisation Jee-weon Jung Hee-Soo Heo Bong-Jin Lee Jaesung Huh A. Brown Youngki Kwon Shinji Watanabe Joon Son Chung 27 16 0 26 Oct 2022
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach Dawei Liang Zifan Xu Yinuo Chen Rebecca Adaimi David F. Harwath Edison Thomaz 32 1 0 21 Mar 2022
Magnitude-aware Probabilistic Speaker Embeddings Nikita Kuzmin Igor Fedorov A. Sholokhov 11 7 0 28 Feb 2022
XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021 Jie Wang Fuchuan Tong Zhi-Cong Chen Lin Li Q. Hong Haodong Zhou 13 1 0 06 Sep 2021
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings Soumi Maiti Hakan Erdogan K. Wilson Scott Wisdom Shinji Watanabe J. Hershey 22 21 0 05 May 2021
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain Eugene Kharitonov M. Rivière Gabriel Synnaeve Lior Wolf Pierre-Emmanuel Mazaré Matthijs Douze Emmanuel Dupoux 10 117 0 02 Jul 2020
VoxCeleb2: Deep Speaker Recognition Joon Son Chung Arsha Nagrani Andrew Zisserman 214 1,954 0 14 Jun 2018