arXiv:2409.20007
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
28 January 2025
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
Papers citing "DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"
50 / 53 papers shown
MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
Hardik B. Sailor
Aw Ai Ti
Chen Fang Yih Nancy
Chiu Ying Lay
Ding Yang
...
Wong Heng Meng Jeremy
Wu Jinyang
Zhang Huayun
Zhang Longyin
Zou Xunlong
AuLLM
493
0
0
07 Nov 2025
Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
Pedro Corrêa
João Lima
Victor Moreno
Lucas Ueda
Paula D. P. Costa
AuLLM
602
3
0
29 Oct 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Chih-Kai Yang
Yen-Ting Piao
Tzu-wen Hsu
Szu-Wei Fu
Zhehuai Chen
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
KELM
AuLLM
220
3
0
19 Oct 2025
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
Bo-Han Feng
Chien-Feng Liu
Yu-Hsuan Li Liang
Chih-Kai Yang
Szu-Wei Fu
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
182
2
0
19 Oct 2025
TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
Yi-Cheng Lin
Yu-Hua Chen
Jia-Kai Dong
Yueh-Hsuan Huang
Szu-Chi Chen
...
I-Ning Tsai
H. Wang
Ho-Lam Chung
Ke-Han Lu
Hung-yi Lee
AuLLM
VLM
215
1
0
30 Sep 2025
Dual Information Speech Language Models for Emotional Conversations
Chun Wang
Chenyang Liu
Wenze Xu
Weihong Deng
AuLLM
123
0
0
11 Aug 2025
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
Qiongqiong Wang
Hardik B. Sailor
Jeremy H.M Wong
Tianchi Liu
Shuo Sun
Wenyu Zhang
Muhammad Huzaifah
Nancy F. Chen
Ai Ti Aw
AuLLM
193
2
0
10 Aug 2025
SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhen Wan
Chao-Han Huck Yang
Yahan Yu
Jinchuan Tian
Sheng Li
...
Zhehuai Chen
Shinji Watanabe
Fei Cheng
Chenhui Chu
Sadao Kurohashi
AuLLM
ELM
222
0
0
25 Jul 2025
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi
Maike Züfle
Marco Gaido
Beatrice Savoldi
Danni Liu
Ioannis Douros
L. Bentivogli
Jan Niehues
358
7
0
25 Jul 2025
DIFFA: Large Language Diffusion Models Can Listen and Understand
Jiaming Zhou
Hongjie Chen
Shiwan Zhao
Jian Kang
Jie Li
...
Haoqin Sun
Hui Wang
Aobo Kong
Yong Qin
X. Li
294
9
0
24 Jul 2025
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
Hongjie Chen
Zehan Li
Yaodong Song
Wenming Deng
Yitong Yao
...
Chao Wang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AuLLM
VLM
338
4
0
24 Jul 2025
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
Tzu-wen Hsu
Ke-Han Lu
Cheng-Han Chiang
Hung-yi Lee
AuLLM
477
9
0
08 Jun 2025
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
Wenyu Zhang
Yingxu He
Geyu Lin
Zhuohan Liu
Shuo Sun
...
Jeremy H.M Wong
Qiongqiong Wang
Hardik B. Sailor
Nancy F. Chen
Ai Ti Aw
AuLLM
268
3
0
07 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
436
12
0
05 Jun 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
369
2
0
26 May 2025
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
Ke-Han Lu
Chun-Yi Kuan
Hung-yi Lee
AuLLM
ELM
351
25
0
25 May 2025
Towards Reliable Large Audio Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziyang Ma
Xiquan Li
Yakun Song
Wenxi Chen
Chenpeng Du
...
Yihao Chen
Zhuo Chen
Yuping Wang
Yuping Wang
Xie Chen
AuLLM
281
3
0
25 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLL
MoMe
488
10
0
23 May 2025
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
Chun-Yi Kuan
Hung-yi Lee
391
7
0
20 May 2025
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
Chih-Kai Yang
Neo Ho
Yen-Ting Piao
Hung-yi Lee
AuLLM
LRM
630
27
0
19 May 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
475
106
0
11 Apr 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
572
17
0
09 Apr 2025
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
Umberto Cappellazzo
Minsu Kim
Stavros Petridis
564
10
0
09 Mar 2025
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
457
542
0
15 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
765
2,002
0
15 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
328
24
0
13 Jul 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
He Huang
Boris Ginsburg
Yu-Chiang Frank Wang
Hung-yi Lee
VLM
AuLLM
296
40
0
27 Jun 2024
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Chen Wang
Minpeng Liao
Zhongqiang Huang
Junhong Wu
Chengqing Zong
Jiajun Zhang
VLM
AuLLM
320
42
0
06 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
580
77
0
14 May 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
281
130
0
31 Mar 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
301
213
0
12 Feb 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDE
SLR
SSL
428
303
0
23 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
449
700
0
14 Nov 2023
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yassir Fathullah
Chunyang Wu
Egor Lakomkin
Ke Li
Junteng Jia
Shangguan Yuan
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
LM&MA
MLLM
AuLLM
326
69
0
12 Nov 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MA
AuLLM
490
529
0
20 Oct 2023
Joint Audio and Speech Understanding
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
614
126
0
25 Sep 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
418
99
0
18 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
460
78
0
02 Sep 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
1.1K
8,135
0
29 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM
MLLM
LRM
834
241
0
18 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
963
1,475
0
17 May 2023
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI Conference on Artificial Intelligence (AAAI), 2023
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
...
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
LM&MA
AuLLM
285
376
0
25 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
5.3K
23,506
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
1.6K
7,784
0
30 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
1.4K
6,745
0
06 Dec 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhifang Guo
Yichong Leng
Yihan Wu
Sheng Zhao
Xuejiao Tan
DiffM
233
175
0
22 Nov 2022
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation
Automatic Speech Recognition & Understanding (ASRU), 2022
Marvin Lavechin
Marianne Métais
Hadrien Titeux
Alodie Boissonnet
Jade Copet
M. Rivière
Elika Bergelson
Alejandrina Cristià
Emmanuel Dupoux
H. Bredin
418
40
0
24 Oct 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Keon Lee
Kyumin Park
Daeyoung Kim
LM&MA
566
82
0
03 Jul 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
2.8K
17,183
0
28 Jan 2022
LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
1.9K
17,979
0
17 Jun 2021