Serialized Output Training for End-to-End Overlapped Speech Recognition


28 March 2020
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka

Papers citing "Serialized Output Training for End-to-End Overlapped Speech Recognition"

50 / 88 papers shown
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
Xinlu He
Jacob Whitehill
19
0
0
16 May 2025
Listen to Extract: Onset-Prompted Target Speaker Extraction
Pengjie Shen
Kangrui Chen
Shulin He
Pengru Chen
Shuqi Yuan
He Kong
Xueliang Zhang
Zehao Wang
53
0
0
08 May 2025
Elevating Robust Multi-Talker ASR by Decoupling Speaker Separation and Speech Recognition
Yufeng Yang
H. Taherian
Vahid Ahmadi Kalkhorani
DeLiang Wang
44
0
0
23 Mar 2025
Target Speaker ASR with Whisper
Alexander Polok
Dominik Klement
Sanjeev Khudanpur
Kevin Duh
J. Černocký
L. Burget
107
2
0
17 Jan 2025
SQ-Whisper: Speaker-Querying based Whisper Model for Target-Speaker ASR
Pengcheng Guo
Xuankai Chang
Hang Lv
Shinji Watanabe
Lei Xie
66
1
0
07 Dec 2024
Joint Beamforming and Speaker-Attributed ASR for Real Distant-Microphone Meeting Transcription
Can Cui
Imran A. Sheikh
Mostafa Sadeghi
Emmanuel Vincent
39
0
0
29 Oct 2024
FedMAC: Tackling Partial-Modality Missing in Federated Learning with Cross-Modal Aggregation and Contrastive Regularization
Manh Duong Nguyen
Trung Thanh Nguyen
Huy Hieu Pham
Trong Nghia Hoang
Phi Le Nguyen
T. T. Huynh
31
1
0
04 Oct 2024
Alignment-Free Training for Transducer-based Multi-Talker ASR
Takafumi Moriya
Shota Horiguchi
Marc Delcroix
Ryo Masumura
Takanori Ashihara
Hiroshi Sato
Kohei Matsuura
Masato Mimura
39
2
0
30 Sep 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Jiawen Kang
Lingwei Meng
Mingyu Cui
Yuejiao Wang
Xixin Wu
Xunying Liu
Helen Meng
44
2
0
19 Sep 2024
META-CAT: Speaker-Informed Speech Embeddings via Meta Information Concatenation for Multi-talker ASR
Jinhan Wang
Weiqing Wang
Kunal Dhawan
Taejin Park
Myungjong Kim
Ivan Medennikov
He Huang
Nithin Koluguri
Jagadeesh Balam
Boris Ginsburg
54
1
0
18 Sep 2024
M-BEST-RQ: A Multi-Channel Speech Foundation Model for Smart Glasses
Yufeng Yang
Desh Raj
Ju Lin
Niko Moritz
Junteng Jia
...
Egor Lakomkin
Yiteng Huang
Jacob Donley
Jay Mahadeokar
Ozlem Kalinli
39
2
0
17 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
75
2
0
13 Sep 2024
Sortformer: Seamless Integration of Speaker Diarization and ASR by Bridging Timestamps and Tokens
Taejin Park
Ivan Medennikov
Kunal Dhawan
Weiqing Wang
He Huang
Nithin Rao Koluguri
Krishna Puvvada
Jagadeesh Balam
Boris Ginsburg
42
3
0
10 Sep 2024
Resource-Efficient Adaptation of Speech Foundation Models for Multi-Speaker ASR
Weiqing Wang
Kunal Dhawan
Taejin Park
Krishna Puvvada
Ivan Medennikov
Somshubra Majumdar
He Huang
Jagadeesh Balam
Boris Ginsburg
44
2
0
02 Sep 2024
LibriheavyMix: A 20,000-Hour Dataset for Single-Channel Reverberant Multi-Talker Speech Separation, ASR and Speaker Diarization
Zengrui Jin
Yifan Yang
Mohan Shi
Wei Kang
Xiaoyu Yang
...
Lingwei Meng
Long Lin
Yong Xu
Shi-Xiong Zhang
Daniel Povey
28
2
0
01 Sep 2024
Serialized Speech Information Guidance with Overlapped Encoding Separation for Multi-Speaker Automatic Speech Recognition
Hao Shi
Yuan Gao
Zhaoheng Ni
Tatsuya Kawahara
34
2
0
01 Sep 2024
Advancing Multi-talker ASR Performance with Large Language Models
Mohan Shi
Zengrui Jin
Yaoxun Xu
Yong Xu
Shi-Xiong Zhang
Kun Wei
Yiwen Shao
Chunlei Zhang
Dong Yu
31
1
0
30 Aug 2024
Generating Data with Text-to-Speech and Large-Language Models for Conversational Speech Recognition
Samuele Cornell
Jordan Darefsky
Zhiyao Duan
Shinji Watanabe
SyDa
68
4
0
17 Aug 2024
Serialized Output Training by Learned Dominance
Ying Shi
Lantian Li
Shi Yin
D. Wang
Jiqing Han
23
4
0
04 Jul 2024
Transcription-Free Fine-Tuning of Speech Separation Models for Noisy and Reverberant Multi-Speaker Automatic Speech Recognition
William Ravenscroft
George Close
Stefan Goetze
Thomas Hain
Mohammad Soleymanpour
Anurag Chowdhury
Mark C. Fuhs
34
0
0
13 Jun 2024
Jointly Recognizing Speech and Singing Voices Based on Multi-Task Audio Source Separation
Ye Bai
Chenxing Li
Hao Li
Yuanyuan Zhao
Xiaorui Wang
26
0
0
17 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
34
1
0
10 Apr 2024
Improving Speaker Assignment in Speaker-Attributed ASR for Real Meeting Applications
Can Cui
Imran Ahmad Sheikh
Mostafa Sadeghi
Emmanuel Vincent
47
2
0
11 Mar 2024
SA-SOT: Speaker-Aware Serialized Output Training for Multi-Talker ASR
Zhiyun Fan
Linhao Dong
Jun Zhang
Lu Lu
Zejun Ma
43
5
0
04 Mar 2024
On Speaker Attribution with SURT
Desh Raj
Matthew Wiesner
Matthew Maciejewski
Leibny Paola García-Perera
Daniel Povey
Sanjeev Khudanpur
34
3
0
28 Jan 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
Jiawen Kang
Lingwei Meng
Mingyu Cui
Haohan Guo
Xixin Wu
Xunying Liu
Helen M. Meng
59
6
0
08 Jan 2024
Improved Long-Form Speech Recognition by Jointly Modeling the Primary and Non-primary Speakers
Guru Prakash Arumugam
Shuo-yiin Chang
Tara N. Sainath
Rohit Prabhavalkar
Quan Wang
Shaan Bijwadia
29
3
0
18 Dec 2023
Speaker Mask Transformer for Multi-talker Overlapped Speech Recognition
Peng Shen
Xugang Lu
Hisashi Kawai
35
1
0
18 Dec 2023
Extending Whisper with prompt tuning to target-speaker ASR
Hao Ma
Zhiyuan Peng
Mingjie Shao
Jing Li
Xuelong Li
VLM
38
13
0
13 Dec 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation
Juan Pablo Zuluaga
Zhaocheng Huang
Xing Niu
Rohit Paturi
S. Srinivasan
Prashant Mathur
Brian Thompson
Marcello Federico
BDL
35
2
0
01 Nov 2023
End-to-end Multichannel Speaker-Attributed ASR: Speaker Guided Decoder and Input Feature Analysis
Can Cui
Imran A. Sheikh
Mostafa Sadeghi
Emmanuel Vincent
29
4
0
16 Oct 2023
A Glance is Enough: Extract Target Sentence By Looking at A keyword
Ying Shi
Dong Wang
Lantian Li
Jiqing Han
38
1
0
09 Oct 2023
SA-Paraformer: Non-autoregressive End-to-End Speaker-Attributed ASR
Yangze Li
Fan Yu
Yuhao Liang
Pengcheng Guo
Mohan Shi
Zhihao Du
Shiliang Zhang
Lei Xie
24
3
0
07 Oct 2023
One model to rule them all ? Towards End-to-End Joint Speaker Diarization and Speech Recognition
Samuele Cornell
Jee-weon Jung
Shinji Watanabe
S. Squartini
VLM
32
16
0
02 Oct 2023
Enhancing End-to-End Conversational Speech Translation Through Target Language Context Utilization
A. Hussein
Brian Yan
Antonios Anastasopoulos
Shinji Watanabe
Sanjeev Khudanpur
37
3
0
27 Sep 2023
t-SOT FNT: Streaming Multi-talker ASR with Text-only Domain Adaptation Capability
Jian Wu
Naoyuki Kanda
Takuya Yoshioka
Rui Zhao
Zhuo Chen
Jinyu Li
21
5
0
15 Sep 2023
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio
Yang Zhang
Krishna C. Puvvada
Vitaly Lavrukhin
Boris Ginsburg
38
14
0
09 Aug 2023
Exploring the Integration of Speech Separation and Recognition with Self-Supervised Learning Representation
Yoshiki Masuyama
Xuankai Chang
Wangyou Zhang
Samuele Cornell
Zhongqiu Wang
Nobutaka Ono
Y. Qian
Shinji Watanabe
41
6
0
23 Jul 2023
MeetEval: A Toolkit for Computation of Word Error Rates for Meeting Transcription Systems
Thilo von Neumann
Christoph Boeddeker
Marc Delcroix
Reinhold Haeb-Umbach
29
16
0
21 Jul 2023
Token-Level Serialized Output Training for Joint Streaming ASR and ST Leveraging Textual Alignments
Sara Papi
Peidong Wang
Junkun Chen
Jian Xue
Jinyu Li
Yashesh Gaur
28
8
0
07 Jul 2023
Mixture Encoder for Joint Speech Separation and Recognition
Simon Berger
Peter Vieting
Christoph Boeddeker
Ralf Schlüter
Reinhold Haeb-Umbach
26
6
0
21 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
34
9
0
18 Jun 2023
End-to-End Joint Target and Non-Target Speakers ASR
Ryo Masumura
Naoki Makishima
Taiga Yamane
Yoshihiko Yamazaki
Saki Mizuno
...
Akihiko Takashima
Satoshi Suzuki
Takafumi Moriya
Nobukatsu Hojo
Atsushi Ando
32
5
0
04 Jun 2023
Adapting Multi-Lingual ASR Models for Handling Multiple Talkers
Chenda Li
Yao Qian
Zhuo Chen
Naoyuki Kanda
Dongmei Wang
Takuya Yoshioka
Y. Qian
Michael Zeng
37
11
0
30 May 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Lingwei Meng
Jiawen Kang
Mingyu Cui
Haibin Wu
Xixin Wu
Helen M. Meng
39
10
0
25 May 2023
BA-SOT: Boundary-Aware Serialized Output Training for Multi-Talker ASR
Yuhao Liang
Fan Yu
Yangze Li
Pengcheng Guo
Shiliang Zhang
Qian Chen
Linfu Xie
33
8
0
23 May 2023
CASA-ASR: Context-Aware Speaker-Attributed ASR
Mohan Shi
Zhihao Du
Qian Chen
Fan Yu
Yangze Li
Shiliang Zhang
Jie Zhang
Lirong Dai
36
8
0
21 May 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
Lingwei Meng
Jiawen Kang
Mingyu Cui
Yuejiao Wang
Xixin Wu
Helen M. Meng
25
17
0
20 Feb 2023
GPU-accelerated Guided Source Separation for Meeting Transcription
Desh Raj
Daniel Povey
Sanjeev Khudanpur
26
35
0
10 Dec 2022
On Word Error Rate Definitions and their Efficient Computation for Multi-Speaker Speech Recognition Systems
Thilo von Neumann
Christoph Boeddeker
K. Kinoshita
Marc Delcroix
Reinhold Haeb-Umbach
37
19
0
29 Nov 2022