1911.01255
pyannote.audio: neural building blocks for speaker diarization
4 November 2019
H. Bredin
Ruiqing Yin
Juan Manuel Coria
G. Gelly
Pavel Korshunov
Marvin Lavechin
D. Fustes
Hadrien Titeux
Wassim Bouaziz
Marie-Philippe Gill
Papers citing
"pyannote.audio: neural building blocks for speaker diarization"
28 / 28 papers shown
Title
Automatic Proficiency Assessment in L2 English Learners
Armita Mohammadi
Alessandro Lameiras Koerich
Laureano Moro-Velazquez
P. Cardinal
25
0
0
05 May 2025
Co³Gesture: Towards Coherent Concurrent Co-speech 3D Gesture Generation with Interactive Diffusion
Xingqun Qi
Yatian Wang
Hengyuan Zhang
J. Pan
Wei Xue
Shanghang Zhang
Wenhan Luo
Qifeng Liu
Yike Guo
SLR
53
0
0
03 May 2025
Speaker Retrieval in the Wild: Challenges, Effectiveness and Robustness
Erfan Loweimi
Mengjie Qian
Kate Knill
Mark J. F. Gales
43
0
0
26 Apr 2025
Guided Speaker Embedding
Shota Horiguchi
Takafumi Moriya
Atsushi Ando
Takanori Ashihara
Hiroshi Sato
Naohiro Tawara
Marc Delcroix
40
0
0
03 Jan 2025
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Matthew Wiesner
Paola García
Shinji Watanabe
22
9
0
23 Jul 2024
LLM-based speaker diarization correction: A generalizable approach
Georgios Efstathiadis
Vijay Yadav
Anzar Abbas
34
3
0
07 Jun 2024
ASoBO: Attentive Beamformer Selection for Distant Speaker Diarization in Meetings
Théo Mariotte
Anthony Larcher
Silvio Montrésor
Jean-Hugh Thomas
18
0
0
05 Jun 2024
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
14
1
0
18 Dec 2023
AutoAD II: The Sequel -- Who, When, and What in Movie Audio Description
Tengda Han
Max Bain
Arsha Nagrani
Gül Varol
Weidi Xie
Andrew Zisserman
VGen
DiffM
19
36
0
10 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration
Piyush Singh Pasi
Karthikeya Battepati
P. Jyothi
Ganesh Ramakrishnan
T. Mahapatra
Manoj Singh
22
0
0
10 Oct 2023
XLS-R fine-tuning on noisy word boundaries for unsupervised speech segmentation into words
Robin Algayres
Pablo Diego-Simon
Benoît Sagot
Emmanuel Dupoux
10
1
0
08 Oct 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
6
12
0
19 Sep 2023
Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains
Martin Lebourdais
Théo Mariotte
Marie Tahon
Anthony Larcher
Antoine Laurent
Silvio Montrésor
S. Meignier
Jean-Hugh Thomas
VLM
17
5
0
24 Jul 2023
An Experimental Review of Speaker Diarization methods with application to Two-Speaker Conversational Telephone Speech recordings
L. Serafini
Samuele Cornell
Giovanni Morrone
Enrico Zovato
A. Brutti
S. Squartini
21
9
0
29 May 2023
Evaluating OpenAI's Whisper ASR for Punctuation Prediction and Topic Modeling of life histories of the Museum of the Person
L. Gris
R. Marcacini
Arnaldo Cândido Júnior
Edresson Casanova
A. S. Soares
S. Aluísio
13
7
0
23 May 2023
VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge
Jaesung Huh
A. Brown
Jee-weon Jung
Joon Son Chung
Arsha Nagrani
D. Garcia-Romero
Andrew Zisserman
11
26
0
20 Feb 2023
Anchorage: Visual Analysis of Satisfaction in Customer Service Videos via Anchor Events
Kamkwai Wong
Xingbo Wang
Yong Wang
Jianben He
Rongzheng Zhang
Huamin Qu
13
14
0
14 Feb 2023
Residual Information in Deep Speaker Embedding Architectures
Adriana Stan
17
5
0
06 Feb 2023
Dubbing in Practice: A Large Scale Study of Human Localization With Insights for Automatic Dubbing
William Brannon
Yogesh Virkar
Brian Thompson
29
21
0
23 Dec 2022
Audio-Visual Activity Guided Cross-Modal Identity Association for Active Speaker Detection
Rahul Sharma
Shrikanth Narayanan
27
8
0
01 Dec 2022
Multitask Detection of Speaker Changes, Overlapping Speech and Voice Activity Using wav2vec 2.0
Marie Kunesova
Zbynek Zajíc
SSL
VLM
11
15
0
26 Oct 2022
In search of strong embedding extractors for speaker diarisation
Jee-weon Jung
Hee-Soo Heo
Bong-Jin Lee
Jaesung Huh
A. Brown
Youngki Kwon
Shinji Watanabe
Joon Son Chung
27
16
0
26 Oct 2022
Automated detection of foreground speech with wearable sensing in everyday home environments: A transfer learning approach
Dawei Liang
Zifan Xu
Yinuo Chen
Rebecca Adaimi
David F. Harwath
Edison Thomaz
32
1
0
21 Mar 2022
Magnitude-aware Probabilistic Speaker Embeddings
Nikita Kuzmin
Igor Fedorov
A. Sholokhov
11
7
0
28 Feb 2022
XMUSPEECH System for VoxCeleb Speaker Recognition Challenge 2021
Jie Wang
Fuchuan Tong
Zhi-Cong Chen
Lin Li
Q. Hong
Haodong Zhou
13
1
0
06 Sep 2021
End-to-End Diarization for Variable Number of Speakers with Local-Global Networks and Discriminative Speaker Embeddings
Soumi Maiti
Hakan Erdogan
K. Wilson
Scott Wisdom
Shinji Watanabe
J. Hershey
22
21
0
05 May 2021
Data Augmenting Contrastive Learning of Speech Representations in the Time Domain
Eugene Kharitonov
M. Rivière
Gabriel Synnaeve
Lior Wolf
Pierre-Emmanuel Mazaré
Matthijs Douze
Emmanuel Dupoux
10
117
0
02 Jul 2020
VoxCeleb2: Deep Speaker Recognition
Joon Son Chung
Arsha Nagrani
Andrew Zisserman
214
1,954
0
14 Jun 2018