Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis

3 November 2020
Desh Raj
Pavel Denisov
Zhuo Chen
Hakan Erdogan
Zili Huang
Maokui He
Shinji Watanabe
Jun Du
Takuya Yoshioka
Yi Luo
Naoyuki Kanda
Jinyu Li
Scott Wisdom
J. Hershey

Papers citing "Integration of speech separation, diarization, and recognition for multi-speaker meetings: System description, comparison, and analysis"

50 / 60 papers shown
FlexIO: Flexible Single- and Multi-Channel Speech Separation and Enhancement
Yoshiki Masuyama
Kohei Saijo
Francesco Paissan
Jiangyu Han
Marc Delcroix
Ryo Aihara
François Germain
Gordon Wichern
Jonathan Le Roux
191
0
0
24 Oct 2025
Scaling Multi-Talker ASR with Speaker-Agnostic Activity Streams
Xiluo He
Alexander Polok
Jesus Villalba
Thomas Thebaud
Matthew Maciejewski
136
2
0
04 Oct 2025
Data-independent Beamforming for End-to-end Multichannel Multi-speaker ASR
Can Cui
P. Magron
M. Sadeghi
Emmanuel Vincent
190
0
0
12 Sep 2025
Error Analysis in a Modular Meeting Transcription System
Peter Vieting
Simon Berger
Thilo von Neumann
Christoph Boeddeker
Ralf Schluter
Reinhold Haeb-Umbach
222
0
0
12 Sep 2025
Unifying Diarization, Separation, and ASR with Multi-Speaker Encoder
Muhammad Shakeel
Yui Sudo
Yifan Peng
Chyi-Jiunn Lin
Shinji Watanabe
156
6
0
28 Aug 2025
Advances in Speech Separation: Techniques, Challenges, and Future Trends
Kai Li
Guo Chen
Wendi Sang
Yi Luo
Zhuo Chen
...
Shulin He
Zhong-Qiu Wang
Andong Li
Z. Wu
Xiaolin Hu
AI4TS
230
7
0
14 Aug 2025
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
Yufeng Yang
H. Taherian
Vahid Ahmadi Kalkhorani
DeLiang Wang
232
0
0
23 Mar 2025
Target Speaker ASR with Whisper
Alexander Polok
Dominik Klement
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
640
18
0
17 Jan 2025
DiCoW: Diarization-Conditioned Whisper for Target Speaker Automatic Speech Recognition
Alexander Polok
Dominik Klement
M. Kocour
Jiangyu Han
Federico Landini
Bolaji Yusuf
Matthew Wiesner
Sanjeev Khudanpur
J. Černocký
L. Burget
311
0
0
03 Jan 2025
MSA-ASR: Efficient Multilingual Speaker Attribution with frozen ASR Models
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Thai-Binh Nguyen
Alexander Waibel
299
4
0
27 Nov 2024
USEF-TSE: Universal Speaker Embedding Free Target Speaker Extraction
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2024
Bang Zeng
Ming Li
489
20
0
04 Sep 2024
Enhanced Reverberation as Supervision for Unsupervised Speech Separation
Interspeech, 2024
Kohei Saijo
Gordon Wichern
François G. Germain
Zexu Pan
Jonathan Le Roux
254
2
0
06 Aug 2024
RAVSS: Robust Audio-Visual Speech Separation in Multi-Speaker Scenarios with Missing Visual Cues
ACM Multimedia (MM), 2024
Tianrui Pan
Jie Liu
Bohan Wang
Jie Tang
Gangshan Wu
327
6
0
27 Jul 2024
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan
Linhao Dong
Jun Zhang
Lu Lu
Zejun Ma
242
13
0
04 Mar 2024
On Speaker Attribution with SURT
The Speaker and Language Recognition Workshop (Odyssey), 2024
Desh Raj
Matthew Wiesner
Matthew Maciejewski
Leibny Paola García-Perera
Daniel Povey
Sanjeev Khudanpur
286
7
0
28 Jan 2024
NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription
Interspeech, 2024
Alon Vinnikov
Amir Ivry
Aviv Hurvitz
Igor Abramovski
S. Koubi
...
S. Sivasankaran
Yifan Gong
Min Tang
Huaming Wang
Eyal Krupka
273
51
0
16 Jan 2024
EEND-DEMUX: End-to-End Neural Speaker Diarization via Demultiplexed Speaker Embeddings
Sung Hwan Mun
Mingrui Han
Canyeong Moon
Nam Soo Kim
285
1
0
11 Dec 2023
Multi-channel Conversational Speaker Separation via Neural Diarization
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
H. Taherian
DeLiang Wang
BDL
262
25
0
15 Nov 2023
Typing to Listen at the Cocktail Party: Text-Guided Target Speaker Extraction
IEEE Transactions on Cognitive and Developmental Systems (IEEE TCDS), 2023
Xiang Hao
Jibin Wu
Jianwei Yu
Chenglin Xu
Kay Chen Tan
422
19
0
11 Oct 2023
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
291
34
0
02 Oct 2023
Meeting Recognition with Continuous Speech Separation and Transcription-Supported Diarization
Thilo von Neumann
Christoph Boeddeker
Tobias Cord-Landwehr
Marc Delcroix
Reinhold Haeb-Umbach
331
14
0
28 Sep 2023
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Jian Wu
Naoyuki Kanda
Takuya Yoshioka
Rui Zhao
Zhuo Chen
Jinyu Li
209
6
0
15 Sep 2023
Combining TF-GridNet and Mixture Encoder for Continuous Speech Separation for Meeting Transcription
Spoken Language Technology Workshop (SLT), 2023
Peter Vieting
Simon Berger
Thilo von Neumann
Christoph Boeddeker
Ralf Schluter
Reinhold Haeb-Umbach
312
0
0
15 Sep 2023
Attention-based Encoder-Decoder End-to-End Neural Diarization with Embedding Enhancer
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Zhengyang Chen
Bing Han
Shuai Wang
Yan-min Qian
278
30
0
13 Sep 2023
LibriWASN: A Data Set for Meeting Separation, Diarization, and Recognition with Asynchronous Recording Devices
Joerg Schmalenstroeer
Tobias Gburrek
Reinhold Haeb-Umbach
183
4
0
21 Aug 2023
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
IEEE Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2023
Yoshiki Masuyama
Xuankai Chang
Wangyou Zhang
Samuele Cornell
Zhongqiu Wang
Nobutaka Ono
Y. Qian
Shinji Watanabe
249
8
0
23 Jul 2023
The CHiME-7 DASR Challenge: Distant Meeting Transcription with Multiple Devices in Diverse Scenarios
Samuele Cornell
Matthew Wiesner
Shinji Watanabe
Desh Raj
Xuankai Chang
...
Matthew Maciejewski
Yoshiki Masuyama
Zhong-Qiu Wang
S. Squartini
Sanjeev Khudanpur
282
83
0
23 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
398
18
0
18 Jun 2023
A Teacher-Student approach for extracting informative speaker embeddings from speech mixtures
Interspeech, 2023
Tobias Cord-Landwehr
Christoph Boeddeker
Catalin Zorila
R. Doddipatla
Reinhold Haeb-Umbach
372
5
0
01 Jun 2023
On Data Sampling Strategies for Training Neural Network Speech Separation Models
European Signal Processing Conference (EUSIPCO), 2023
William Ravenscroft
Stefan Goetze
Thomas Hain
VLM
222
6
0
14 Apr 2023
End-to-End Integration of Speech Separation and Voice Activity Detection for Low-Latency Diarization of Telephone Conversations
Speech Communication, 2023
Giovanni Morrone
Samuele Cornell
L. Serafini
Enrico Zovato
Alessio Brutti
S. Squartini
364
5
0
21 Mar 2023
TS-SEP: Joint Diarization and Separation Conditioned on Estimated Speaker Embeddings
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Christoph Boeddeker
Aswin Shanmugam Subramanian
Gordon Wichern
Reinhold Haeb-Umbach
Jonathan Le Roux
357
34
0
07 Mar 2023
Multi-resolution location-based training for multi-channel continuous speech separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
H. Taherian
DeLiang Wang
234
7
0
16 Jan 2023
GPU-accelerated Guided Source Separation for Meeting Transcription
Interspeech, 2022
Desh Raj
Daniel Povey
Sanjeev Khudanpur
399
47
0
10 Dec 2022
On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Thilo von Neumann
Christoph Boeddeker
K. Kinoshita
Marc Delcroix
Reinhold Haeb-Umbach
239
23
0
29 Nov 2022
Reverberation as Supervision for Speech Separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
R. Aralikatti
Christoph Boeddeker
Gordon Wichern
Aswin Shanmugam Subramanian
Jonathan Le Roux
224
8
0
15 Nov 2022
Handling Trade-Offs in Speech Separation with Sparsely-Gated Mixture of Experts
Xiaofei Wang
Zhuo Chen
Yu Shi
Jian Wu
Naoyuki Kanda
Takuya Yoshioka
MoE
244
2
0
11 Nov 2022
Simulating realistic speech overlaps improves multi-talker ASR
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Jian Wu
S. Sivasankaran
Zhuo Chen
Jinyu Li
Takuya Yoshioka
352
18
0
27 Oct 2022
CasNet: Investigating Channel Robustness for Speech Separation
Fan Wang
Yao-Fei Cheng
Hung-Shin Lee
Yu Tsao
Hsin-Min Wang
150
3
0
27 Oct 2022
Spatial-aware Speaker Diarization for Multi-channel Multi-party Meeting
Interspeech, 2022
Jie Wang
Yuji Liu
Binling Wang
Yiming Zhi
Song Li
Shipeng Xia
Jiayang Zhang
Feng Tong
Lin Li
Q. Hong
195
11
0
24 Sep 2022
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Naoyuki Kanda
Jian Wu
Xiaofei Wang
Zhuo Chen
Jinyu Li
Takuya Yoshioka
366
19
0
12 Sep 2022
Analysis of impact of emotions on target speech extraction and speech separation
International Workshop on Acoustic Signal Enhancement (IWAENC), 2022
Jan Švec
Kateřina Žmolíková
M. Kocour
Marc Delcroix
Tsubasa Ochiai
Ladislav Mošner
Jan "Honza" Černocký
200
6
0
15 Aug 2022
Tandem Multitask Training of Speaker Diarisation and Speech Recognition for Meeting Transcription
Interspeech, 2022
Xianrui Zheng
Chuxu Zhang
P. Woodland
191
19
0
08 Jul 2022
A Meeting Transcription System for an Ad-Hoc Acoustic Sensor Network
Tobias Gburrek
Christoph Boeddeker
Thilo von Neumann
Tobias Cord-Landwehr
Joerg Schmalenstroeer
Reinhold Haeb-Umbach
153
5
0
02 May 2022
Leveraging Real Conversational Data for Multi-Channel Continuous Speech Separation
Interspeech, 2022
Xiaofei Wang
Dongmei Wang
Naoyuki Kanda
Sefik Emre Eskimez
Takuya Yoshioka
223
9
0
07 Apr 2022
An Initialization Scheme for Meeting Separation with Spatial Mixture Models
Interspeech, 2022
Christoph Boeddeker
Tobias Cord-Landwehr
Thilo von Neumann
Reinhold Haeb-Umbach
247
11
0
04 Apr 2022
EEND-SS: Joint End-to-End Neural Speaker Diarization and Speech Separation for Flexible Number of Speakers
Spoken Language Technology Workshop (SLT), 2022
Soumi Maiti
Yushi Ueda
Shinji Watanabe
Chunlei Zhang
Meng Yu
Shi-Xiong Zhang
Yong-mei Xu
324
45
0
31 Mar 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Interspeech, 2022
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
282
38
0
30 Mar 2022
Disentangling the Impacts of Language and Channel Variability on Speech Separation Networks
Interspeech, 2022
Fan Wang
Hung-Shin Lee
Yu Tsao
Hsin-Min Wang
286
9
0
30 Mar 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
235
28
0
08 Feb 2022