2004.09249
Cited By
CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings
20 April 2020
Shinji Watanabe
Michael I. Mandel
Jon Barker
Emmanuel Vincent
Ashish Arora
Xuankai Chang
Sanjeev Khudanpur
Vimal Manohar
Daniel Povey
Desh Raj
David Snyder
Aswin Shanmugam Subramanian
Jan "Yenda" Trmal
Bar Ben Yair
Christoph Boeddeker
Zhaoheng Ni
Yusuke Fujita
Shota Horiguchi
Naoyuki Kanda
Takuya Yoshioka
Neville Ryant
Papers citing
"CHiME-6 Challenge: Tackling Multispeaker Speech Recognition for Unsegmented Recordings"
50 / 195 papers shown
LibriConvo: Simulating Conversations from Read Literature for ASR and Diarization
Máté Gedeon
Péter Mihajlik
77
1
0
27 Oct 2025
A Cocktail-Party Benchmark: Multi-Modal dataset and Comparative Evaluation Results
Thai-Binh Nguyen
Katerina Zmolikova
Pingchuan Ma
Ngoc-Quan Pham
Christian Fuegen
A. Waibel
115
1
0
27 Oct 2025
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
Yejin Kwon
Taewoo Kang
Hyunsoo Yoon
Changouk Kim
AuLLM
ELM
LRM
243
0
0
22 Oct 2025
Hallucination Benchmark for Speech Foundation Models
Alkis Koudounas
Moreno La Quatra
Manuel Giollo
Sabato Marco Siniscalchi
Elena Baralis
HILM
319
1
0
18 Oct 2025
Automatic Speech Recognition in the Modern Era: Architectures, Training, and Evaluation
Md. Nayeem
Md Shamse Tabrej
Kabbojit Jit Deb
Shaonti Goswami
Md. Azizul Hakim
AI4TS
VLM
161
3
0
11 Oct 2025
Target speaker anonymization in multi-speaker recordings
N. Tomashenko
Junichi Yamagishi
Xin Eric Wang
Yun Liu
Emmanuel Vincent
117
1
0
10 Oct 2025
LOTUSDIS: A Thai far-field meeting corpus for robust conversational ASR
Pattara Tipaksorn
Sumonmas Thatphithakkul
Vataya Chunwijitra
Kwanchiva Thangthai
91
0
0
23 Sep 2025
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Sidharth Surapaneni
Hoang Nguyen
Jash Mehta
Aman Tiwari
Oluwanifemi Bamgbose
Akshay Kalkunte
Sai Rajeswar
Sathwik Tejaswi Madhusudhan
AuLLM
ELM
218
1
0
09 Sep 2025
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
Muhammad Shakeel
Yui Sudo
Yifan Peng
Chyi-Jiunn Lin
Shinji Watanabe
149
4
0
28 Aug 2025
A dataset and model for auditory scene recognition for hearing devices: AHEAD-DS and OpenYAMNet
Henry Zhong
Jörg M. Buchholz
Julian Maclaren
Simon Carlile
Richard F. Lyon
245
0
0
14 Aug 2025
MSU-Bench: Towards Understanding the Conversational Multi-talker Scenarios
Shuai Wang
Zhaokai Sun
Zhennan Lin
C. Wang
Zhou Pan
Lei Xie
AuLLM
227
6
0
11 Aug 2025
SPGISpeech 2.0: Transcribed multi-speaker financial audio for speaker-tagged transcription
Raymond Grossman
Taejin Park
Kunal Dhawan
Andrew Titus
Sophia Zhi
Yulia Shchadilova
Weiqing Wang
Jagadeesh Balam
Boris Ginsburg
86
1
0
07 Aug 2025
Can We Really Repurpose Multi-Speaker ASR Corpus for Speaker Diarization?
Shota Horiguchi
Naohiro Tawara
Takanori Ashihara
Atsushi Ando
Marc Delcroix
199
1
0
12 Jul 2025
The Impact of Automatic Speech Transcription on Speaker Attribution
Cristina Aggazzotti
Matthew Wiesner
Elizabeth Allyn Smith
Nicholas Andrews
290
1
0
11 Jul 2025
Omni-Router: Sharing Routing Decisions in Sparse Mixture-of-Experts for Speech Recognition
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
MoE
299
3
0
08 Jul 2025
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
Weiqing Wang
T. Park
Ivan Medennikov
Jinhan Wang
Kunal Dhawan
He Huang
Nithin Rao Koluguri
Jagadeesh Balam
Boris Ginsburg
243
4
0
27 Jun 2025
Exploring Speaker Diarization with Mixture of Experts
Gaobin Yang
Maokui He
Shutong Niu
Ruoyu Wang
Hang Chen
Jun Du
MoE
206
0
0
17 Jun 2025
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
153
4
0
09 Jun 2025
Pseudo Labels-based Neural Speech Enhancement for the AVSR Task in the MISP-Meeting Challenge
Longjie Luo
Shenghui Lu
Lin Li
Q. Hong
VLM
183
0
0
30 May 2025
SuPseudo: A Pseudo-supervised Learning Method for Neural Speech Enhancement in Far-field Speech Recognition
Longjie Luo
Lin Li
Q. Hong
225
0
0
30 May 2025
The Multimodal Information Based Speech Processing (MISP) 2025 Challenge: Audio-Visual Diarization and Recognition
Ming Gao
Shilong Wu
Hang Chen
Jun Du
Chin-Hui Lee
Shinji Watanabe
Jingdong Chen
Sabato Marco Siniscalchi
O. Scharenborg
355
6
0
20 May 2025
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
Xinlu He
Jacob Whitehill
320
4
0
16 May 2025
BERSting at the Screams: A Benchmark for Distanced, Emotional and Shouted Speech Recognition
Computer Speech and Language (CSL), 2025
Paige Tuttosi
Mantaj Dhillon
Luna Sang
Shane Eastwood
Poorvi Bhatia
Quang Minh Dinh
Avni Kapoor
Yewon Jin
Angelica Lim
395
3
0
30 Apr 2025
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
Yufeng Yang
H. Taherian
Vahid Ahmadi Kalkhorani
DeLiang Wang
228
0
0
23 Mar 2025
Adopting Whisper for Confidence Estimation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Vaibhav Aggarwal
Shabari S Nair
Yash Verma
Yash Jogi
304
2
0
20 Feb 2025
Playing with Voices: Tabletop Role-Playing Game Recordings as a Diarization Challenge
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Lian Remme
Kevin Tang
317
0
0
18 Feb 2025
On the Robust Approximation of ASR Metrics
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Abdul Waheed
Hanin Atwany
Rita Singh
Bhiksha Raj
345
3
0
18 Feb 2025
SCDiar: a streaming diarization system based on speaker change detection and speech recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Naijun Zheng
Xucheng Wan
Kai Liu
Zhou Huan
189
0
0
28 Jan 2025
Summary of the NOTSOFAR-1 Challenge: Highlights and Learnings
Computer Speech and Language (CSL), 2025
Igor Abramovski
Alon Vinnikov
Shalev Shaer
Naoyuki Kanda
Xiaofei Wang
Amir Ivry
Eyal Krupka
360
5
0
28 Jan 2025
SEAL: Speaker Error Correction using Acoustic-conditioned Large Language Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2025
Anurag Kumar
Rohit Paturi
Amber Afshan
S. Srinivasan
295
2
0
14 Jan 2025
Microphone Array Signal Processing and Deep Learning for Speech Enhancement
IEEE Signal Processing Magazine (IEEE Signal Process. Mag.), 2024
Reinhold Haeb-Umbach
Tomohiro Nakatani
Marc Delcroix
Christoph Boeddeker
Tsubasa Ochiai
277
6
0
13 Jan 2025
Guided Speaker Embedding
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Shota Horiguchi
Takafumi Moriya
Atsushi Ando
Takanori Ashihara
Hiroshi Sato
Naohiro Tawara
Marc Delcroix
349
4
0
03 Jan 2025
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition
Alexander Polok
Dominik Klement
M. Kocour
Jiangyu Han
Federico Landini
Bolaji Yusuf
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
305
0
0
03 Jan 2025
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Thai-Binh Nguyen
Alexander Waibel
291
3
0
27 Nov 2024
Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription
Can Cui
Imran A. Sheikh
Mostafa Sadeghi
Emmanuel Vincent
377
1
0
29 Oct 2024
STCON System for the CHiME-8 Challenge
Anton Mitrofanov
Tatiana Prisyach
Tatiana Timofeeva
Sergei Novoselov
M. Korenevsky
...
Dmitriy Miroshnichenko
Nikita Mamaev
Ilya Odegov
Olga Rudnitskaya
A. Romanenko
253
6
0
17 Oct 2024
SonicSim: A customizable simulation platform for speech processing in moving sound source scenarios
International Conference on Learning Representations (ICLR), 2024
Kai Li
Wendi Sang
Chang Zeng
Runxuan Yang
Guo Chen
Xiaolin Hu
333
8
0
02 Oct 2024
Alignment-Free Training for Transducer-based Multi-Talker ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Takafumi Moriya
Shota Horiguchi
Marc Delcroix
Ryo Masumura
Takanori Ashihara
Hiroshi Sato
Kohei Matsuura
Masato Mimura
259
9
0
30 Sep 2024
Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ruoyu Wang
Shutong Niu
Gaobin Yang
Jun Du
Shuangqing Qian
Tian Gao
Jia Pan
330
5
0
25 Sep 2024
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
Jinhan Wang
Weiqing Wang
Kunal Dhawan
Taejin Park
Myungjong Kim
Ivan Medennikov
He Huang
Nithin Koluguri
Jagadeesh Balam
Boris Ginsburg
345
5
0
18 Sep 2024
Large Language Model Based Generative Error Correction: A Challenge and Baselines for Speech Recognition, Speaker Tagging, and Emotion Recognition
Spoken Language Technology Workshop (SLT), 2024
Chao-Han Huck Yang
Taejin Park
Yuan Gong
Yuanchao Li
Zhehuai Chen
...
Eng Siong Chng
Peter Bell
Catherine Lai
Shinji Watanabe
A. Stolcke
AuLLM
ELM
356
14
0
15 Sep 2024
Sortformer: A Novel Approach for Permutation-Resolved Speaker Supervision in Speech-to-Text Systems
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna Puvvada
Jagadeesh Balam
Boris Ginsburg
349
6
0
10 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Spoken Language Technology Workshop (SLT), 2024
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
261
5
0
02 Sep 2024
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Interspeech, 2024
Zengrui Jin
Yifan Yang
Mohan Shi
Wei Kang
Xiaoyu Yang
...
Lingwei Meng
Long Lin
Yong Xu
Shi-Xiong Zhang
Daniel Povey
223
7
0
01 Sep 2024
Advancing Multi-talker ASR Performance with Large Language Models
Spoken Language Technology Workshop (SLT), 2024
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
240
12
0
30 Aug 2024
Integrating Audio, Visual, and Semantic Information for Enhanced Multimodal Speaker Diarization
Luyao Cheng
Hui Wang
Siqi Zheng
Yafeng Chen
Rongjie Huang
Qinglin Zhang
Qian Chen
Xihao Li
258
5
0
22 Aug 2024
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Samuele Cornell
Jordan Darefsky
Zhiyao Duan
Shinji Watanabe
SyDa
288
8
0
17 Aug 2024
ctPuLSE: Close-Talk, and Pseudo-Label Based Far-Field, Speech Enhancement
Journal of the Acoustical Society of America (JASA), 2024
Zhong-Qiu Wang
270
2
0
28 Jul 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Sanjeev Khudanpur
Paola García
Shinji Watanabe
234
28
0
23 Jul 2024
Self-Train Before You Transcribe
Robert Flynn
Anton Ragni
297
0
0
17 Jun 2024