DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data


IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
28 January 2025
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
    AuLLMSyDa

Papers citing "DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data"

50 / 53 papers shown
MERaLiON-SER: Robust Speech Emotion Recognition Model for English and SEA Languages
Hardik B. Sailor
Aw Ai Ti
Chen Fang Yih Nancy
Chiu Ying Lay
Ding Yang
...
Wong Heng Meng Jeremy
Wu Jinyang
Zhang Huayun
Zhang Longyin
Zou Xunlong
AuLLM
493
0
0
07 Nov 2025
Evaluating Emotion Recognition in Spoken Language Models on Emotionally Incongruent Speech
Pedro Corrêa
João Lima
Victor Moreno
Lucas Ueda
Paula D. P. Costa
AuLLM
602
3
0
29 Oct 2025
SAKE: Towards Editing Auditory Attribute Knowledge of Large Audio-Language Models
Chih-Kai Yang
Yen-Ting Piao
Tzu-wen Hsu
Szu-Wei Fu
Zhehuai Chen
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
KELMAuLLM
220
3
0
19 Oct 2025
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
Bo-Han Feng
Chien-Feng Liu
Yu-Hsuan Li Liang
Chih-Kai Yang
Szu-Wei Fu
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
182
2
0
19 Oct 2025
TAU: A Benchmark for Cultural Sound Understanding Beyond Semantics
Yi-Cheng Lin
Yu-Hua Chen
Jia-Kai Dong
Yueh-Hsuan Huang
Szu-Chi Chen
...
I-Ning Tsai
H. Wang
Ho-Lam Chung
Ke-Han Lu
Hung-yi Lee
AuLLMVLM
215
1
0
30 Sep 2025
Dual Information Speech Language Models for Emotional Conversations
Chun Wang
Chenyang Liu
Wenze Xu
Weihong Deng
AuLLM
123
0
0
11 Aug 2025
Incorporating Contextual Paralinguistic Understanding in Large Speech-Language Models
Qiongqiong Wang
Hardik B. Sailor
Jeremy H.M Wong
Tianchi Liu
Shuo Sun
Wenyu Zhang
Muhammad Huzaifah
Nancy F. Chen
Ai Ti Aw
AuLLM
193
2
0
10 Aug 2025
SpeechIQ: Speech-Agentic Intelligence Quotient Across Cognitive Levels in Voice Understanding by Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Zhen Wan
Chao-Han Huck Yang
Yahan Yu
Jinchuan Tian
Sheng Li
...
Zhehuai Chen
Shinji Watanabe
Fei Cheng
Chenhui Chu
Sadao Kurohashi
AuLLMELM
222
0
0
25 Jul 2025
MCIF: Multimodal Crosslingual Instruction-Following Benchmark from Scientific Talks
Sara Papi
Maike Züfle
Marco Gaido
Beatrice Savoldi
Danni Liu
Ioannis Douros
L. Bentivogli
Jan Niehues
358
7
0
25 Jul 2025
DIFFA: Large Language Diffusion Models Can Listen and Understand
Jiaming Zhou
Hongjie Chen
Shiwan Zhao
Jian Kang
Jie Li
...
Haoqin Sun
Hui Wang
Aobo Kong
Yong Qin
X. Li
294
9
0
24 Jul 2025
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
Hongjie Chen
Zehan Li
Yaodong Song
Wenming Deng
Yitong Yao
...
Chao Wang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AuLLMVLM
338
4
0
24 Jul 2025
Reducing Object Hallucination in Large Audio-Language Models via Audio-Aware Decoding
Tzu-wen Hsu
Ke-Han Lu
Cheng-Han Chiang
Hung-yi Lee
AuLLM
477
9
0
08 Jun 2025
Beyond Classification: Towards Speech Emotion Reasoning with Multitask AudioLLMs
Wenyu Zhang
Yingxu He
Geyu Lin
Zhuohan Liu
Shuo Sun
...
Jeremy H.M Wong
Qiongqiong Wang
Hardik B. Sailor
Nancy F. Chen
Ai Ti Aw
AuLLM
268
3
0
07 Jun 2025
AudioLens: A Closer Look at Auditory Attribute Perception of Large Audio-Language Models
Chih-Kai Yang
Neo Ho
Yi-Jyun Lee
Hung-yi Lee
AuLLM
436
12
0
05 Jun 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
369
2
0
26 May 2025
Speech-IFEval: Evaluating Instruction-Following and Quantifying Catastrophic Forgetting in Speech-Aware Language Models
Ke-Han Lu
Chun-Yi Kuan
Hung-yi Lee
AuLLMELM
351
25
0
25 May 2025
Towards Reliable Large Audio Language Model
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Ziyang Ma
Xiquan Li
Yakun Song
Wenxi Chen
Chenpeng Du
...
Yihao Chen
Zhuo Chen
Yuping Wang
Yuping Wang
Xie Chen
AuLLM
281
3
0
25 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLLMoMe
488
10
0
23 May 2025
Teaching Audio-Aware Large Language Models What Does Not Hear: Mitigating Hallucinations through Synthesized Negative Samples
Chun-Yi Kuan
Hung-yi Lee
391
7
0
20 May 2025
SAKURA: On the Multi-hop Reasoning of Large Audio-Language Models Based on Speech and Audio Information
Chih-Kai Yang
Neo Ho
Yen-Ting Piao
Hung-yi Lee
AuLLMLRM
630
27
0
19 May 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
475
106
0
11 Apr 2025
TASTE: Text-Aligned Speech Tokenization and Embedding for Spoken Language Modeling
Liang-Hsuan Tseng
Yi-Chang Chen
Kuan-Yi Lee
Da-shan Shiu
Hung-yi Lee
AuLLM
572
17
0
09 Apr 2025
Adaptive Audio-Visual Speech Recognition via Matryoshka-Based Multimodal LLMs
Umberto Cappellazzo
Minsu Kim
Stavros Petridis
564
10
0
09 Mar 2025
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLMVLM
457
542
0
15 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLMVLMMU
765
2,002
0
15 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
328
24
0
13 Jul 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
He Huang
Boris Ginsburg
Yu-Chiang Frank Wang
Hung-yi Lee
VLMAuLLM
296
40
0
27 Jun 2024
BLSP-Emo: Towards Empathetic Large Speech-Language Models
Chen Wang
Minpeng Liao
Zhongqiang Huang
Junhong Wu
Chengqing Zong
Jiajun Zhang
VLMAuLLM
320
42
0
06 Jun 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
580
77
0
14 May 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
281
130
0
31 Mar 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MAAuLLMALM
301
213
0
12 Feb 2024
emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation
Ziyang Ma
Zhisheng Zheng
Jiaxin Ye
Jinchao Li
Zhifu Gao
Shiliang Zhang
Xie Chen
MDESLRSSL
428
303
0
23 Dec 2023
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
449
700
0
14 Nov 2023
AudioChatLlama: Towards General-Purpose Speech Abilities for LLMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2023
Yassir Fathullah
Chunyang Wu
Egor Lakomkin
Ke Li
Junteng Jia
Shangguan Yuan
Jay Mahadeokar
Ozlem Kalinli
Christian Fuegen
Michael Seltzer
LM&MAMLLMAuLLM
326
69
0
12 Nov 2023
SALMONN: Towards Generic Hearing Abilities for Large Language Models
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
LM&MAAuLLM
490
529
0
20 Oct 2023
Joint Audio and Speech Understanding
Automatic Speech Recognition & Understanding (ASRU), 2023
Yuan Gong
Alexander H. Liu
Hongyin Luo
Leonid Karlinsky
James R. Glass
AuLLM
614
126
0
25 Sep 2023
Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Chien-yu Huang
Ke-Han Lu
Shi Wang
Chi-Yuan Hsiao
Chun-Yi Kuan
...
Roshan S. Sharma
Shinji Watanabe
Bhiksha Ramakrishnan
Shady Shehata
Hung-yi Lee
AuLLM
418
99
0
18 Sep 2023
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Chen Wang
Minpeng Liao
Zhongqiang Huang
Jinliang Lu
Junhong Wu
Yuchen Liu
Chengqing Zong
Jiajun Zhang
AuLLM
460
78
0
02 Sep 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Neural Information Processing Systems (NeurIPS), 2023
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
1.1K
8,135
0
29 May 2023
Listen, Think, and Understand
International Conference on Learning Representations (ICLR), 2023
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELMMLLMLRM
834
241
0
18 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLMLRM
963
1,475
0
17 May 2023
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
AAAI Conference on Artificial Intelligence (AAAI), 2023
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
...
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
LM&MAAuLLM
285
376
0
25 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAGMLLM
5.3K
23,506
0
15 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
International Conference on Machine Learning (ICML), 2023
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLMMLLM
1.6K
7,784
0
30 Jan 2023
Robust Speech Recognition via Large-Scale Weak Supervision
International Conference on Machine Learning (ICML), 2022
Alec Radford
Jong Wook Kim
Tao Xu
Greg Brockman
C. McLeavey
Ilya Sutskever
OffRL
1.4K
6,745
0
06 Dec 2022
PromptTTS: Controllable Text-to-Speech with Text Descriptions
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Zhifang Guo
Yichong Leng
Yihan Wu
Sheng Zhao
Xuejiao Tan
DiffM
233
175
0
22 Nov 2022
Brouhaha: multi-task training for voice activity detection, speech-to-noise ratio, and C50 room acoustics estimation
Automatic Speech Recognition & Understanding (ASRU), 2022
Marvin Lavechin
Marianne Métais
Hadrien Titeux
Alodie Boissonnet
Jade Copet
M. Rivière
Elika Bergelson
Alejandrina Cristià
Emmanuel Dupoux
H. Bredin
418
40
0
24 Oct 2022
DailyTalk: Spoken Dialogue Dataset for Conversational Text-to-Speech
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2022
Keon Lee
Kyumin Park
Daeyoung Kim
LM&MA
566
82
0
03 Jul 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Neural Information Processing Systems (NeurIPS), 2022
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&RoLRMAI4CEReLM
2.8K
17,183
0
28 Jan 2022
LoRA: Low-Rank Adaptation of Large Language Models
International Conference on Learning Representations (ICLR), 2021
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRLAI4TSAI4CEALMAIMat
1.9K
17,979
0
17 Jun 2021