A Purely End-to-end System for Multi-speaker Speech Recognition

15 May 2018
Hiroshi Seki
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
J. Hershey
arXiv: 1805.05826

Papers citing "A Purely End-to-end System for Multi-speaker Speech Recognition"

50 / 56 papers shown
Title
SC-SOT: Conditioning the Decoder on Diarized Speaker Information for End-to-End Overlapped Speech Recognition
Yuta Hirano
Sakriani Sakti
7
0
0
15 Jun 2025
Speaker-Distinguishable CTC: Learning Speaker Distinction Using CTC for Multi-Talker Speech Recognition
Asahi Sakuma
Hiroaki Sato
Ryuga Sugano
Tadashi Kumano
Yoshihiko Kawai
Tetsuji Ogawa
18
0
0
09 Jun 2025
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
Xinlu He
Jacob Whitehill
69
1
0
16 May 2025
Selective Masking Adversarial Attack on Automatic Speech Recognition Systems
Zheng Fang
Shenyi Zhang
Tao Wang
Bowen Li
Lingchen Zhao
Zhangyi Wang
AAML
59
0
0
06 Apr 2025
FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization
Manh Duong Nguyen
Trung Thanh Nguyen
Huy Hieu Pham
Trong Nghia Hoang
Phi Le Nguyen
T. T. Huynh
66
1
0
04 Oct 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Jiawen Kang
Lingwei Meng
Mingyu Cui
Yuejiao Wang
Xixin Wu
Xunying Liu
Helen Meng
115
3
0
19 Sep 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna Puvvada
Jagadeesh Balam
Boris Ginsburg
93
5
0
10 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
72
2
0
02 Sep 2024
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
Can Cui
Imran Ahmad Sheikh
Mostafa Sadeghi
Emmanuel Vincent
91
3
0
11 Mar 2024
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan
Linhao Dong
Jun Zhang
Lu Lu
Zejun Ma
94
6
0
04 Mar 2024
Mixture Encoder for Joint Speech Separation and Recognition
Simon Berger
Peter Vieting
Christoph Boeddeker
Ralf Schlüter
Reinhold Haeb-Umbach
79
6
0
21 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
96
13
0
18 Jun 2023
End-to-End Joint Target and Non-Target Speakers ASR
Ryo Masumura
Naoki Makishima
Taiga Yamane
Yoshihiko Yamazaki
Saki Mizuno
...
Akihiko Takashima
Satoshi Suzuki
Takafumi Moriya
Nobukatsu Hojo
Atsushi Ando
60
5
0
04 Jun 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Chenda Li
Yao Qian
Zhuo Chen
Naoyuki Kanda
Dongmei Wang
Takuya Yoshioka
Y. Qian
Michael Zeng
66
12
0
30 May 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
Yuhao Liang
Fan Yu
Yangze Li
Pengcheng Guo
Shiliang Zhang
Qian Chen
Linfu Xie
83
9
0
23 May 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
Lingwei Meng
Jiawen Kang
Mingyu Cui
Yuejiao Wang
Xixin Wu
Helen M. Meng
76
17
0
20 Feb 2023
Simulating realistic speech overlaps improves multi-talker ASR
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Jian Wu
S. Sivasankaran
Zhuo Chen
Jinyu Li
Takuya Yoshioka
68
13
0
27 Oct 2022
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
Naoyuki Kanda
Jian Wu
Xiaofei Wang
Zhuo Chen
Jinyu Li
Takuya Yoshioka
88
18
0
12 Sep 2022
Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
Ilya Sklyar
A. Piunova
Christian Osendorfer
60
6
0
10 May 2022
Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita
Shota Horiguchi
Takeshi Homma
74
18
0
24 Apr 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
Fan Yu
Zhihao Du
Shiliang Zhang
Yuxiao Lin
Linfu Xie
42
15
0
31 Mar 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
59
27
0
30 Mar 2022
Coarse-to-Fine Recursive Speech Separation for Unknown Number of Speakers
Zhenhao Jin
Xiang Hao
Xiangdong Su
55
4
0
30 Mar 2022
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR
Xuankai Chang
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
86
6
0
01 Mar 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
139
60
0
02 Feb 2022
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
114
24
0
19 Dec 2021
Speaker conditioning of acoustic models using affine transformation for multi-speaker speech recognition
Midia Yousefi
John H.L. Hansen
28
5
0
30 Oct 2021
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
Naoyuki Kanda
Xiong Xiao
Jian Wu
Tianyan Zhou
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
73
14
0
06 Jul 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation
Ryo Masumura
Daiki Okamura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
55
7
0
04 Jul 2021
Multi-Speaker ASR Combining Non-Autoregressive Conformer CTC and Conditional Speaker Chain
Pengcheng Guo
Xuankai Chang
Shinji Watanabe
Lei Xie
48
19
0
16 Jun 2021
End-to-End Speaker-Attributed ASR with Transformer
Naoyuki Kanda
Guoli Ye
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
75
49
0
05 Apr 2021
A Review of Speaker Diarization: Recent Advances with Deep Learning
Tae Jin Park
Naoyuki Kanda
Dimitrios Dimitriadis
Kyu Jeong Han
Shinji Watanabe
Shrikanth Narayanan
VLM
382
337
0
24 Jan 2021
The 2020 ESPnet update: new features, broadened applications, performance improvements, and future plans
Shinji Watanabe
Florian Boyer
Xuankai Chang
Pengcheng Guo
Tomoki Hayashi
...
Shigeki Karita
Chenda Li
Jing Shi
Aswin Shanmugam Subramanian
Wangyou Zhang
VLM
108
38
0
23 Dec 2020
Streaming Multi-speaker ASR with RNN-T
Ilya Sklyar
A. Piunova
Yulan Liu
80
37
0
23 Nov 2020
On End-to-end Multi-channel Time Domain Speech Separation in Reverberant Environments
Jisi Zhang
Catalin Zorila
R. Doddipatla
Jon Barker
76
46
0
11 Nov 2020
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Naoyuki Kanda
Zhong Meng
Liang Lu
Yashesh Gaur
Xiaofei Wang
Zhuo Chen
Takuya Yoshioka
71
17
0
03 Nov 2020
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings
Naoyuki Kanda
Xuankai Chang
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
74
49
0
11 Aug 2020
OtoWorld: Towards Learning to Separate by Learning to Move
Omkar Ranadive
Grant Gasser
David Terpay
Prem Seetharaman
39
1
0
12 Jul 2020
Sequence to Multi-Sequence Learning via Conditional Chain Mapping for Mixture Signals
Jing Shi
Xuankai Chang
Pengcheng Guo
Shinji Watanabe
Yusuke Fujita
Jiaming Xu
Bo Xu
Lei Xie
96
22
0
25 Jun 2020
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Tianyan Zhou
Takuya Yoshioka
76
78
0
19 Jun 2020
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR
Thilo von Neumann
Christoph Boeddeker
Lukas Drude
K. Kinoshita
Marc Delcroix
Tomohiro Nakatani
Reinhold Haeb-Umbach
81
41
0
04 Jun 2020
End-to-End Far-Field Speech Recognition with Unified Dereverberation and Beamforming
Wangyou Zhang
Aswin Shanmugam Subramanian
Xuankai Chang
Shinji Watanabe
Y. Qian
66
27
0
21 May 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
83
122
0
28 Mar 2020
End-to-End Multi-speaker Speech Recognition with Transformer
Xuankai Chang
Wangyou Zhang
Y. Qian
Jonathan Le Roux
Shinji Watanabe
ViT
96
107
0
10 Feb 2020
Utterance-level Permutation Invariant Training with Latency-controlled BLSTM for Single-channel Multi-talker Speech Separation
Lu Huang
Gaofeng Cheng
Pengyuan Zhang
Yi Yang
Shumin Xu
Jiasong Sun
15
8
0
25 Dec 2019
End-to-end training of time domain audio separation and recognition
Thilo von Neumann
K. Kinoshita
Lukas Drude
Christoph Boeddeker
Marc Delcroix
Tomohiro Nakatani
Reinhold Haeb-Umbach
76
34
0
18 Dec 2019
SMS-WSJ: Database, performance measures, and baseline recipe for multi-channel source separation and recognition
Lukas Drude
Jens Heitkaemper
Christoph Boeddeker
Reinhold Haeb-Umbach
66
72
0
30 Oct 2019
WHAMR!: Noisy and Reverberant Single-Channel Speech Separation
Matthew Maciejewski
Gordon Wichern
E. McQuinn
Jonathan Le Roux
89
184
0
22 Oct 2019
MIMO-SPEECH: End-to-End Multi-Channel Multi-Speaker Speech Recognition
Xuankai Chang
Wangyou Zhang
Y. Qian
Jonathan Le Roux
Shinji Watanabe
95
121
0
15 Oct 2019
Simultaneous Speech Recognition and Speaker Diarization for Monaural Dialogue Recordings with Target-Speaker Acoustic Models
Naoyuki Kanda
Shota Horiguchi
Yusuke Fujita
Yawen Xue
Kenji Nagamatsu
Shinji Watanabe
58
36
0
17 Sep 2019