Recent Advances in Speech Language Models: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2024
1 October 2024
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
    AuLLM

Papers citing "Recent Advances in Speech Language Models: A Survey"

50 / 165 papers shown
PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning
J. Shi
H. Wang
William Chen
Chenda Li
Wangyou Zhang
Jinchuan Tian
Shinji Watanabe
192
0
0
27 Nov 2025
StereoDETR: Stereo-based Transformer for 3D Object Detection
Shiyi Mu
Zichong Gu
Zhiqi Ai
Anqi Liu
Yilin Gao
Shugong Xu
ViT 3DPC
222
0
0
24 Nov 2025
EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models
Li Zhou
Lutong Yu
You Lyu
Yihang Lin
Zefeng Zhao
Junyi Ao
Yuhao Zhang
Benyou Wang
Haizhou Li
AuLLM
221
1
0
26 Oct 2025
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Wenming Tu
Guanrou Yang
Ruiqi Yan
Wenxi Chen
Ziyang Ma
Yipeng Kang
Kai Yu
Xie Chen
Zilong Zheng
181
1
0
26 Oct 2025
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
Donghang Wu
H. Zhang
Jun Chen
Xiangyu Zhang
...
Fei Tian
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
ReLM LRM
164
3
0
10 Oct 2025
Can Speech LLMs Think while Listening?
Yi-Jen Shih
Desh Raj
Chunyang Wu
Wei Zhou
SK Bong
Yashesh Gaur
Jay Mahadeokar
Ozlem Kalinli
M. Seltzer
LRM
214
6
0
08 Oct 2025
TokenChain: A Discrete Speech Chain via Semantic Token Modeling
Mingxuan Wang
Satoshi Nakamura
AI4CE LRM
132
0
0
07 Oct 2025
SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets
Manolis Mylonas
Charalampia Zerva
Evlampios Apostolidis
Vasileios Mezaris
242
6
0
07 Oct 2025
When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs
Shree Harsha Bokkahalli Satish
G. Henter
Éva Székely
355
2
0
01 Oct 2025
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
Kai-Wei Chang
En-Pei Hu
Chun-Yi Kuan
Wenze Ren
Wei-Chih Chen
Guan-Ting Lin
Yu Tsao
Shao-Hua Sun
Hung-yi Lee
James R. Glass
AuLLM
325
9
0
30 Sep 2025
Acoustic-based Gender Differentiation in Speech-aware Language Models
Junhyuk Choi
Jihwan Seol
Nayeon Kim
Chanhee Cho
EunBin Cho
Bugeun Kim
AuLLM
199
1
0
25 Sep 2025
From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models
Yuxuan Chen
Haoyuan Yu
AuLLM
206
2
0
18 Sep 2025
AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs
Sidharth Surapaneni
Hoang Nguyen
Jash Mehta
Aman Tiwari
Oluwanifemi Bamgbose
Akshay Kalkunte
Sai Rajeswar
Sathwik Tejaswi Madhusudhan
AuLLM ELM
234
1
0
09 Sep 2025
VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions
Jun Zhan
Mingyang Han
Yuxuan Xie
Chen Wang
Dong Zhang
...
Qinyuan Cheng
Shimin Li
Jun Song
Xipeng Qiu
Bo Zheng
255
6
0
09 Sep 2025
Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding
Rui Zheng
Wenrui Liu
Hui-Peng Du
Qinglin Zhang
Chong Deng
Qian Chen
Wen Wang
Yang Ai
Zhen-Hua Ling
339
4
0
04 Sep 2025
Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models
Sandipana Dowerah
Atharva Kulkarni
Ajinkya Kulkarni
Hoan My Tran
Joonas Kalda
Artem Fedorchenko
Benoit Fauve
Damien Lolive
Tanel Alumae
Matthew Magimai Doss
ELM
124
10
0
02 Sep 2025
Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts
Shreyas Tirumala
Nishant Jain
Danny D. Leybzon
Trent D. Buskirk
147
1
0
01 Sep 2025
Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs
Dingdong Wang
Junan Li
Mingyu Cui
Dongchao Yang
Xueyuan Chen
Chao Yang
AuLLM
196
6
0
25 Aug 2025
TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving
Wenqian Cui
Lei Zhu
Xiaohui Li
Zhihan Guo
Haoli Bai
Lu Hou
Irwin King
229
1
0
10 Aug 2025
C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations
Chengqian Ma
Wei Tao
Yiwen Guo
AuLLM
359
6
0
30 Jul 2025
OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model
Chen Wang
Tianyu Peng
Wen Yang
Yinan Bai
Guangfu Wang
...
Lanpeng Jia
Lingxiang Wu
Jinqiao Wang
Chengqing Zong
Jiajun Zhang
AuLLM VLM
259
5
0
07 Jul 2025
Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research
Kan Hatakeyama-Sato
Toshihiko Nishida
Kenta Kitamura
Yoshitaka Ushiku
Koichi Takahashi
Y. Nabae
T. Hayakawa
AI4CE
177
1
0
14 Jun 2025
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang
B. Li
Bruce Wang
Boyong Wu
Chao Yan
...
X. Zhang
Yibo Zhu
Daxin Jiang
Shuchang Zhou
Chen-Hao Hu
AuLLM
435
8
0
10 Jun 2025
Intelligibility of Text-to-Speech Systems for Mathematical Expressions
Sujoy Roychowdhury
H. G. Ranjani
Sumit Soman
Nishtha Paul
Subhadip Bandyopadhyay
Siddhanth Iyengar
303
3
0
05 Jun 2025
From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data
Chun-Yi Kuan
Hung-yi Lee
AuLLM
368
2
0
26 May 2025
Voice of a Continent: Mapping Africa's Speech Technology Frontier
AbdelRahim Elmadany
S. Kwon
Hawau Olamide Toyin
Alcides Alcoba Inciarte
Hanan Aldarmaki
Muhammad Abdul-Mageed
329
0
0
24 May 2025
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
431
4
0
23 May 2025
Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems
Chengwei Wei
Bin Wang
Jung-jae Kim
Nancy F. Chen
AuLLM ReLM LRM
366
9
0
21 May 2025
AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models
Guangke Chen
Fu Song
Zhe Zhao
Xiaojun Jia
Yang Liu
Yanchen Qiao
Weizhe Zhang
Weiping Tu
Yuhong Yang
Bo Du
AuLLM AAML
587
12
0
20 May 2025
Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning
Debarpan Bhattacharya
Apoorva Kulkarni
Sriram Ganapathy
451
4
0
19 May 2025
LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Qingkai Fang
Yan Zhou
Shoutao Guo
Shaolei Zhang
Yang Feng
AuLLM
279
62
0
05 May 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
378
4
0
22 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
474
106
0
11 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
497
8
0
03 Apr 2025
From TOWER to SPIRE: Adding the Speech Modality to a Translation-Specialist LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
454
3
0
13 Mar 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Gallil Maimon
Avishai Elmakies
Yossi Adi
401
13
0
19 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A Systematic Survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
LM&MA AuLLM
445
15
0
25 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Neural Information Processing Systems (NeurIPS), 2024
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Longji Xu
Haizhou Li
Zhikai Wu
AuLLM
528
60
0
17 Jan 2025
Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners
Ze Yuan
Yanqing Liu
Shujie Liu
Sheng Zhao
AuLLM
360
8
0
06 Dec 2024
SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation
Wenyi Yu
Siyin Wang
Xiaoyu Yang
Xianzhao Chen
Xiaohai Tian
Jing Zhang
Guangzhi Sun
Lu Lu
Longji Xu
Chao Zhang
AuLLM
395
29
0
27 Nov 2024
Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks
Chien-yu Huang
Wei-Chih Chen
Shu-Wen Yang
Andy T. Liu
Chen-An Li
...
David Harwath
Shinji Watanabe
Hung-yi Lee
ELM AuLLM
298
24
0
08 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
438
31
0
04 Nov 2024
Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM
Xiong Wang
Yangze Li
Chaoyou Fu
Chunjiang Ge
Lei Xie
Ke Li
Xing Sun
Long Ma
AuLLM MLLM
523
128
0
01 Nov 2024
GPT-4o System Card
OpenAI
Aaron Hurst
Adam Lerer
Adam P. Goucher
...
Yuchen He
Yuchen Zhang
Yujia Jin
Yunxing Dai
Yury Malkov
MLLM
729
3,723
0
25 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
International Conference on Learning Representations (ICLR), 2024
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM ELM
434
220
0
24 Oct 2024
OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation
Qinglin Zhang
Luyao Cheng
Chong Deng
Qian Chen
Wen Wang
...
Jiaqing Liu
Hai Yu
Chaohong Tan
Zhihao Du
Shiliang Zhang
SyDa BDL AuLLM VLM
514
50
0
23 Oct 2024
VoiceBench: Benchmarking LLM-Based Voice Assistants
Yiming Chen
Xianghu Yue
Chen Zhang
Xiaoxue Gao
R. Tan
Haoyang Li
ELM AuLLM
509
151
0
22 Oct 2024
What Do Speech Foundation Models Not Learn About Speech?
Abdul Waheed
Hanin Atwany
Bhiksha Raj
Rita Singh
SSL
259
8
0
16 Oct 2024
IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities
Xin Zhang
Xiang Lyu
Zhihao Du
Qian Chen
Dong Zhang
...
Yuxuan Wang
Bin Zhang
Heng Lu
Yaqian Zhou
Jiaqi Leng
AuLLM
384
16
0
09 Oct 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Computer Vision and Pattern Recognition (CVPR), 2024
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM MLLM VLM
550
56
0
26 Sep 2024