2409.08596
Cited By
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
13 September 2024
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
Papers citing
"Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions"
45 / 45 papers shown
ELEGANCE: Efficient LLM Guidance for Audio-Visual Target Speech Extraction
Wenxuan Wu
Shuai Wang
Xixin Wu
Helen Meng
Haizhou Li
111
0
0
09 Nov 2025
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
Yejin Kwon
Taewoo Kang
Hyunsoo Yoon
Changouk Kim
AuLLM
ELM
LRM
219
0
0
22 Oct 2025
Thinking in cocktail party: Chain-of-Thought and reinforcement learning for target speaker automatic speech recognition
Y. Zhang
Hang Su
Lichun Fan
Zhenbo Luo
Jian Luan
LRM
90
0
0
19 Sep 2025
GLAD: Global-Local Aware Dynamic Mixture-of-Experts for Multi-Talker ASR
Yujie Guo
Jiaming Zhou
Yuhang Jia
Shiwan Zhao
Yong Qin
MoE
224
0
0
16 Sep 2025
UTI-LLM: A Personalized Articulatory-Speech Therapy Assistance System Based on Multimodal Large Language Model
Yudong Yang
Xiaokang Liu
Shaofeng Zhao
Rongfeng Su
Nan Yan
Lan Wang
124
0
0
16 Sep 2025
PAC: Pronunciation-Aware Contextualized Large Language Model-based Automatic Speech Recognition
Li Fu
Yu Xin
Sunlu Zeng
Lu Fan
Youzheng Wu
Xiaodong He
133
0
0
16 Sep 2025
Serialized Output Prompting for Large Language Model-based Multi-Talker Speech Recognition
Hao Shi
Yusuke Fujita
Tomoya Mizumoto
Lianbo Liu
Atsushi Kojima
Yui Sudo
102
1
0
01 Sep 2025
Speaker Targeting via Self-Speaker Adaptation for Multi-talker ASR
Weiqing Wang
T. Park
Ivan Medennikov
Jinhan Wang
Kunal Dhawan
He Huang
Nithin Rao Koluguri
Jagadeesh Balam
Boris Ginsburg
202
1
0
27 Jun 2025
Incorporating Linguistic Constraints from External Knowledge Source for Audio-Visual Target Speech Extraction
Wenxuan Wu
Shuai Wang
Xixin Wu
Chao Yang
Haizhou Li
278
2
0
11 Jun 2025
Towards Reliable Large Audio Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziyang Ma
Xiquan Li
Yakun Song
Wenxi Chen
Chenpeng Du
...
Yihao Chen
Zhuo Chen
Yuping Wang
Yuping Wang
Xie Chen
AuLLM
237
2
0
25 May 2025
Survey of End-to-End Multi-Speaker Automatic Speech Recognition for Monaural Audio
Xinlu He
Jacob Whitehill
215
4
0
16 May 2025
WorldSimBench: Towards Video Generation Models as World Simulators
Yiran Qin
Zhelun Shi
Jiwen Yu
Xijun Wang
Enshen Zhou
...
Lu Sheng
Jing Shao
Junlin Wu
Wanli Ouyang
Ruimao Zhang
EGVM
VGen
550
796
0
23 Oct 2024
Disentangling Speakers in Multi-Talker Speech Recognition with Speaker-Aware CTC
Jiawen Kang
Lingwei Meng
Mingyu Cui
Yuejiao Wang
Xixin Wu
Xunying Liu
Helen Meng
274
6
0
19 Sep 2024
The CHiME-8 DASR Challenge for Generalizable and Array Agnostic Distant Automatic Speech Recognition and Diarization
Samuele Cornell
Taejin Park
Steve Huang
Christoph Boeddeker
Xuankai Chang
Matthew Maciejewski
Sanjeev Khudanpur
Paola García
Shinji Watanabe
223
24
0
23 Jul 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
285
380
0
15 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen M. Meng
Furu Wei
430
86
0
11 Jul 2024
Serialized Output Training by Learned Dominance
Ying Shi
Lantian Li
Shi Yin
D. Wang
Jiqing Han
142
7
0
04 Jul 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
218
106
0
31 Mar 2024
Cross-Speaker Encoding Network for Multi-Talker Speech Recognition
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Jiawen Kang
Lingwei Meng
Mingyu Cui
Haohan Guo
Xixin Wu
Xunying Liu
Helen M. Meng
163
9
0
08 Jan 2024
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
320
595
0
14 Nov 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
360
438
0
20 Oct 2023
Kosmos-G: Generating Images in Context with Multimodal Large Language Models
International Conference on Learning Representations (ICLR), 2023
Xichen Pan
Li Dong
Shaohan Huang
Zhiliang Peng
Wenhu Chen
Furu Wei
VLM
552
97
0
04 Oct 2023
Conformer-based Target-Speaker Automatic Speech Recognition for Single-Channel Audio
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Yang Zhang
Krishna C. Puvvada
Vitaly Lavrukhin
Boris Ginsburg
164
19
0
09 Aug 2023
Llama 2: Open Foundation and Fine-Tuned Chat Models
Hugo Touvron
Louis Martin
Kevin R. Stone
Peter Albert
Amjad Almahairi
...
Sharan Narang
Aurelien Rodriguez
Robert Stojnic
Sergey Edunov
Thomas Scialom
AI4MH
ALM
8.2K
15,302
0
18 Jul 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Interspeech (Interspeech), 2023
Yuan Gong
Sameer Khurana
Leonid Karlinsky
James R. Glass
187
109
0
06 Jul 2023
Kosmos-2: Grounding Multimodal Large Language Models to the World
International Conference on Learning Representations (ICLR), 2023
Zhiliang Peng
Wenhui Wang
Li Dong
Y. Hao
Shaohan Huang
Shuming Ma
Furu Wei
MLLM
ObjD
VLM
400
1,030
0
26 Jun 2023
SURT 2.0: Advances in Transducer-based Multi-talker Speech Recognition
IEEE/ACM Transactions on Audio Speech and Language Processing (TASLP), 2023
Desh Raj
Daniel Povey
Sanjeev Khudanpur
VLM
334
16
0
18 Jun 2023
End-to-End Joint Target and Non-Target Speakers ASR
Interspeech (Interspeech), 2023
Ryo Masumura
Naoki Makishima
Taiga Yamane
Yoshihiko Yamazaki
Saki Mizuno
...
Akihiko Takashima
Satoshi Suzuki
Takafumi Moriya
Nobukatsu Hojo
Atsushi Ando
110
7
0
04 Jun 2023
Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator
Interspeech (Interspeech), 2023
Lingwei Meng
Jiawen Kang
Mingyu Cui
Haibin Wu
Xixin Wu
Helen M. Meng
157
13
0
25 May 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
4.6K
20,902
0
15 Mar 2023
Language Is Not All You Need: Aligning Perception with Language Models
Neural Information Processing Systems (NeurIPS), 2023
Shaohan Huang
Li Dong
Wenhui Wang
Y. Hao
Saksham Singhal
...
Johan Bjorck
Vishrav Chaudhary
Subhojit Som
Xia Song
Furu Wei
VLM
LRM
MLLM
343
676
0
27 Feb 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
6.4K
17,759
0
27 Feb 2023
A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Lingwei Meng
Jiawen Kang
Mingyu Cui
Yuejiao Wang
Xixin Wu
Helen M. Meng
192
21
0
20 Feb 2023
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
1.0K
5,793
0
06 Dec 2022
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
327
37
0
01 Nov 2022
Streaming Multi-Talker ASR with Token-Level Serialized Output Training
Interspeech (Interspeech), 2022
Naoyuki Kanda
Jian Wu
Yu Wu
Xiong Xiao
Zhong Meng
Xiaofei Wang
Yashesh Gaur
Zhuo Chen
Jinyu Li
Takuya Yoshioka
407
72
0
02 Feb 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Michael Zeng
Xiangzhan Yu
Furu Wei
SSL
1.2K
2,674
0
26 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
1.6K
15,460
0
17 Jun 2021
End-to-End Speaker-Attributed ASR with Transformer
Interspeech (Interspeech), 2021
Naoyuki Kanda
Guoli Ye
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Zhuo Chen
Takuya Yoshioka
183
56
0
05 Apr 2021
Streaming Multi-talker Speech Recognition with Joint Speaker Identification
Interspeech (Interspeech), 2021
Liang Lu
Naoyuki Kanda
Jinyu Li
Jiawei Liu
213
21
0
05 Apr 2021
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Interspeech (Interspeech), 2020
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
368
919
0
24 Jun 2020
Serialized Output Training for End-to-End Overlapped Speech Recognition
Interspeech (Interspeech), 2020
Naoyuki Kanda
Yashesh Gaur
Xiaofei Wang
Zhong Meng
Takuya Yoshioka
228
144
0
28 Mar 2020
End-to-End Multi-speaker Speech Recognition with Transformer
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2020
Xuankai Chang
Wangyou Zhang
Y. Qian
Jonathan Le Roux
Shinji Watanabe
ViT
280
112
0
10 Feb 2020
Parameter-Efficient Transfer Learning for NLP
International Conference on Machine Learning (ICML), 2019
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
631
5,677
0
02 Feb 2019
Permutation Invariant Training of Deep Models for Speaker-Independent Multi-talker Speech Separation
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2016
Dong Yu
Morten Kolbæk
Zheng-Hua Tan
Jesper Jensen
348
917
0
01 Jul 2016