Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2003.12687
Cited By
Serialized Output Training for End-to-End Overlapped Speech Recognition
28 March 2020
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Serialized Output Training for End-to-End Overlapped Speech Recognition"
38 / 88 papers shown
Title
Self-supervised learning with bi-label masked speech prediction for streaming multi-talker speech recognition
Zili Huang
Zhuo Chen
Naoyuki Kanda
Jian Wu
Yiming Wang
Jinyu Li
Takuya Yoshioka
Xiaofei Wang
Peidong Wang
28
3
0
10 Nov 2022
A Comparative Study on Multichannel Speaker-Attributed Automatic Speech Recognition in Multi-party Meetings
Mohan Shi
Jie Zhang
Zhihao Du
Fan Yu
Qian Chen
Shiliang Zhang
Lirong Dai
51
4
0
01 Nov 2022
Simulating realistic speech overlaps improves multi-talker ASR
Muqiao Yang
Naoyuki Kanda
Xiaofei Wang
Jian Wu
S. Sivasankaran
Zhuo Chen
Jinyu Li
Takuya Yoshioka
23
13
0
27 Oct 2022
MFCCA:Multi-Frame Cross-Channel attention for multi-speaker ASR in Multi-party meeting scenario
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yuhao Liang
Zhihao Du
Yuxiao Lin
Linfu Xie
33
11
0
11 Oct 2022
VarArray Meets t-SOT: Advancing the State of the Art of Streaming Distant Conversational Speech Recognition
Naoyuki Kanda
Jian Wu
Xiaofei Wang
Zhuo Chen
Jinyu Li
Takuya Yoshioka
29
16
0
12 Sep 2022
Separator-Transducer-Segmenter: Streaming Recognition and Segmentation of Multi-party Speech
Ilya Sklyar
A. Piunova
Christian Osendorfer
11
6
0
10 May 2022
Improving the Naturalness of Simulated Conversations for End-to-End Neural Diarization
Natsuo Yamashita
Shota Horiguchi
Takeshi Homma
26
16
0
24 Apr 2022
A Comparative Study on Speaker-attributed Automatic Speech Recognition in Multi-party Meetings
Fan Yu
Zhihao Du
Shiliang Zhang
Yuxiao Lin
Linfu Xie
22
13
0
31 Mar 2022
Streaming Speaker-Attributed ASR with Token-Level Speaker Embeddings
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
24
26
0
30 Mar 2022
Extended Graph Temporal Classification for Multi-Speaker End-to-End ASR
Xuankai Chang
Niko Moritz
Takaaki Hori
Shinji Watanabe
Jonathan Le Roux
24
6
0
01 Mar 2022
The Volcspeech system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge
Chen Shen
Yi Y. Liu
Wenzhi Fan
Bin Wang
Shi-Xue Wen
Yao Tian
Jun Zhang
Jingsheng Yang
Zejun Ma
14
4
0
09 Feb 2022
Summary On The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Grand Challenge
Fan Yu
Shiliang Zhang
Pengcheng Guo
Yihui Fu
Zhihao Du
...
Kong Aik Lee
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
18
28
0
08 Feb 2022
The RoyalFlush System of Speech Recognition for M2MeT Challenge
Shuaishuai Ye
Peiyao Wang
Shunfei Chen
Xinhui Hu
Xinkang Xu
24
5
0
03 Feb 2022
Joint Speech Recognition and Audio Captioning
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
27
10
0
03 Feb 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
36
54
0
02 Feb 2022
Endpoint Detection for Streaming End-to-End Multi-talker ASR
Liang Lu
Jinyu Li
Yifan Gong
17
17
0
24 Jan 2022
Multi-turn RNN-T for streaming recognition of multi-party speech
Ilya Sklyar
A. Piunova
Xianrui Zheng
Yulan Liu
24
22
0
19 Dec 2021
Speaker Embedding-aware Neural Diarization for Flexible Number of Speakers with Textual Information
Zhihao Du
Shiliang Zhang
Siqi Zheng
Weilong Huang
Ming Lei
BDL
24
1
0
28 Nov 2021
Recent Advances in End-to-End Automatic Speech Recognition
Jinyu Li
VLM
37
363
0
02 Nov 2021
M2MeT: The ICASSP 2022 Multi-Channel Multi-Party Meeting Transcription Challenge
Fan Yu
Shiliang Zhang
Yihui Fu
Lei Xie
Siqi Zheng
...
Pengcheng Guo
Zhijie Yan
B. Ma
Xin Xu
Hui Bu
11
106
0
14 Oct 2021
Transcribe-to-Diarize: Neural Speaker Diarization for Unlimited Number of Speakers using End-to-End Speaker-Attributed ASR
Naoyuki Kanda
Xiong Xiao
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
29
36
0
07 Oct 2021
Continuous Streaming Multi-Talker ASR with Dual-path Transducers
Desh Raj
Liang Lu
Zhuo Chen
Yashesh Gaur
Jinyu Li
21
17
0
17 Sep 2021
A Comparative Study of Modular and Joint Approaches for Speaker-Attributed ASR on Monaural Long-Form Audio
Naoyuki Kanda
Xiong Xiao
Jian Wu
Tianyan Zhou
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
19
14
0
06 Jul 2021
Unified Autoregressive Modeling for Joint End-to-End Multi-Talker Overlapped Speech Recognition and Speaker Attribute Estimation
Ryo Masumura
Daiki Okamura
Naoki Makishima
Mana Ihori
Akihiko Takashima
Tomohiro Tanaka
Shota Orihashi
38
7
0
04 Jul 2021
Many-Speakers Single Channel Speech Separation with Optimal Permutation Training
Shaked Dovrat
Eliya Nachmani
Lior Wolf
VLM
14
21
0
18 Apr 2021
End-to-End Speaker-Attributed ASR with Transformer
Naoyuki Kanda
Guoli Ye
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
27
47
0
05 Apr 2021
Streaming Multi-talker Speech Recognition with Joint Speaker Identification
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
27
19
0
05 Apr 2021
Large-Scale Pre-Training of End-to-End Multi-Talker ASR for Meeting Transcription with Single Distant Microphone
Naoyuki Kanda
Guoli Ye
Yu-Huan Wu
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
39
41
0
31 Mar 2021
A Review of Speaker Diarization: Recent Advances with Deep Learning
Tae Jin Park
Naoyuki Kanda
Dimitrios Dimitriadis
Kyu Jeong Han
Shinji Watanabe
Shrikanth Narayanan
VLM
274
328
0
24 Jan 2021
Hypothesis Stitcher for End-to-End Speaker-attributed ASR on Long-form Multi-talker Recordings
Xuankai Chang
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
RALM
15
15
0
06 Jan 2021
Streaming end-to-end multi-talker speech recognition
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
13
41
0
26 Nov 2020
Streaming Multi-speaker ASR with RNN-T
Ilya Sklyar
A. Piunova
Yulan Liu
25
36
0
23 Nov 2020
Rethinking the Separation Layers in Speech Separation Networks
Yi Luo
Zhuo Chen
Cong Han
Chenda Li
Tianyan Zhou
N. Mesgarani
19
10
0
17 Nov 2020
Minimum Bayes Risk Training for End-to-End Speaker-Attributed ASR
Naoyuki Kanda
Zhong Meng
Liang Lu
Yashesh Gaur
Xiaofei Wang
Zhuo Chen
Takuya Yoshioka
27
17
0
03 Nov 2020
An End-to-end Architecture of Online Multi-channel Speech Separation
Jian Wu
Zhuo Chen
Jinyu Li
Takuya Yoshioka
Zhili Tan
Ed Lin
Yi Luo
Lei Xie
3DV
19
21
0
07 Sep 2020
Investigation of End-To-End Speaker-Attributed ASR for Continuous Multi-Talker Recordings
Naoyuki Kanda
Xuankai Chang
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
22
48
0
11 Aug 2020
Joint Speaker Counting, Speech Recognition, and Speaker Identification for Overlapped Speech of Any Number of Speakers
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Tianyan Zhou
Takuya Yoshioka
19
74
0
19 Jun 2020
Multi-talker ASR for an unknown number of sources: Joint training of source counting, separation and ASR
Thilo von Neumann
Christoph Boeddeker
Lukas Drude
K. Kinoshita
Marc Delcroix
Tomohiro Nakatani
Reinhold Haeb-Umbach
13
41
0
04 Jun 2020
Previous
1
2