
Recent Advances in Speech Language Models: A Survey

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

1 October 2024

ArXiv (abs) | PDF | HTML | GitHub (184★)

Papers citing "Recent Advances in Speech Language Models: A Survey"

50 / 165 papers shown

PURE Codec: Progressive Unfolding of Residual Entropy for Speech Codec Learning

192

27 Nov 2025

StereoDETR: Stereo-based Transformer for 3D Object Detection

222

24 Nov 2025

EchoMind: An Interrelated Multi-level Benchmark for Evaluating Empathetic Speech Language Models

221

26 Oct 2025

UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models

181

26 Oct 2025

Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models

...

164

10 Oct 2025

Can Speech LLMs Think while Listening?

214

08 Oct 2025

TokenChain: A Discrete Speech Chain via Semantic Token Modeling

Mingxuan Wang

Satoshi Nakamura

AI4CE LRM

132

07 Oct 2025

SD-MVSum: Script-Driven Multimodal Video Summarization Method and Datasets

Manolis Mylonas

Charalampia Zerva

Evlampios Apostolidis

Vasileios Mezaris

242

07 Oct 2025

When Voice Matters: Evidence of Gender Disparity in Positional Bias of SpeechLLMs

Shree Harsha Bokkahalli Satish

G. Henter

Éva Székely

355

01 Oct 2025

Game-Time: Evaluating Temporal Dynamics in Spoken Language Models

325

30 Sep 2025

Acoustic-based Gender Differentiation in Speech-aware Language Models

199

25 Sep 2025

From Turn-Taking to Synchronous Dialogue: A Survey of Full-Duplex Spoken Language Models

Yuxuan Chen

Haoyuan Yu

AuLLM

206

18 Sep 2025

AU-Harness: An Open-Source Toolkit for Holistic Evaluation of Audio LLMs

Sathwik Tejaswi Madhusudhan

AuLLM ELM

234

09 Sep 2025

VStyle: A Benchmark for Voice Style Adaptation with Spoken Instructions

...

255

09 Sep 2025

Say More with Less: Variable-Frame-Rate Speech Tokenization via Adaptive Clustering and Implicit Duration Coding

339

04 Sep 2025

Speech DF Arena: A Leaderboard for Speech DeepFake Detection Models

124

02 Sep 2025

Mic Drop or Data Flop? Evaluating the Fitness for Purpose of AI Voice Interviewers for Data Collection within Quantitative & Qualitative Research Contexts

147

01 Sep 2025

Speech Discrete Tokens or Continuous Features? A Comparative Analysis for Spoken Language Understanding in SpeechLLMs

196

25 Aug 2025

TurnGuide: Enhancing Meaningful Full Duplex Spoken Interactions via Dynamic Turn-Level Text-Speech Interleaving

229

10 Aug 2025

C3: A Bilingual Benchmark for Spoken Dialogue Models Exploring Challenges in Complex Conversations

359

30 Jul 2025

OpenS2S: Advancing Fully Open-Source End-to-End Empathetic Large Speech Language Model

...

259

07 Jul 2025

Perspective on Utilizing Foundation Models for Laboratory Automation in Materials Research

177

14 Jun 2025

Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model

...

435

10 Jun 2025

Intelligibility of Text-to-Speech Systems for Mathematical Expressions

Subhadip Bandyopadhyay

Siddhanth Iyengar

303

05 Jun 2025

From Alignment to Advancement: Bootstrapping Audio-Language Alignment with Synthetic Data

Chun-Yi Kuan

Hung-yi Lee

AuLLM

368

26 May 2025

Voice of a Continent: Mapping Africa's Speech Technology Frontier

AbdelRahim Elmadany

S. Kwon

Hawau Olamide Toyin

Alcides Alcoba Inciarte

Hanan Aldarmaki

Muhammad Abdul-Mageed

329

24 May 2025

Speechless: Speech Instruction Training Without Speech for Low Resource Languages

Warren Keng Hoong Low

Eng Siong Chng

J. Yip

SyDa

431

23 May 2025

Towards Spoken Mathematical Reasoning: Benchmarking Speech-based Models over Multi-faceted Math Problems

366

21 May 2025

AudioJailbreak: Jailbreak Attacks against End-to-End Large Audio-Language Models

Weiping Tu

Yuhong Yang

Bo Du

AuLLM AAML

587

20 May 2025

Benchmarking and Confidence Evaluation of LALMs For Temporal Reasoning

Debarpan Bhattacharya

Apoorva Kulkarni

Sriram Ganapathy

451

19 May 2025

LLaMA-Omni2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

279

05 May 2025

SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

378

22 Apr 2025

On The Landscape of Spoken Language Models: A Comprehensive Survey

474

106

11 Apr 2025

Scaling Analysis of Interleaved Speech-Text Language Models

497

03 Apr 2025

From TOWER to SPIRE: Adding the Speech Modality to a Translation-Specialist LLM

454

13 Mar 2025

Slamming: Training a Speech Language Model on One GPU in a Day

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Gallil Maimon

Avishai Elmakies

Yossi Adi

401

19 Feb 2025

Audio-Language Models for Audio-Centric Tasks: A Systematic Survey

445

25 Jan 2025

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Neural Information Processing Systems (NeurIPS), 2024

528

17 Jan 2025

Continuous Speech Tokens Makes LLMs Robust Multi-Modality Learners

360

06 Dec 2024

SALMONN-omni: A Codec-free LLM for Full-duplex Speech Understanding and Generation

395

27 Nov 2024

Dynamic-SUPERB Phase-2: A Collaboratively Expanding Benchmark for Measuring the Capabilities of Spoken Language Models with 180 Tasks

...

298

08 Nov 2024

Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

Guan-Ting Lin

Prashanth Gurunath Shivakumar

438

04 Nov 2024

Freeze-Omni: A Smart and Low Latency Speech-to-speech Dialogue Model with Frozen LLM

Chaoyou Fu

Ke Li

Long Ma

523

128

01 Nov 2024

...

729

3,723

25 Oct 2024

MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark

International Conference on Learning Representations (ICLR), 2024

Ramaneswaran Selvakumar

434

220

24 Oct 2024

OmniFlatten: An End-to-end GPT Model for Seamless Voice Conversation

...

Zhihao Du

Shiliang Zhang

SyDa BDL AuLLM VLM

514

23 Oct 2024

VoiceBench: Benchmarking LLM-Based Voice Assistants

509

151

22 Oct 2024

What Do Speech Foundation Models Not Learn About Speech?

Abdul Waheed

259

16 Oct 2024

IntrinsicVoice: Empowering LLMs with Intrinsic Real-time Voice Interaction Abilities

Xin Zhang

Xiang Lyu

Zhihao Du

Qian Chen

Dong Zhang

...

Yuxuan Wang

384

09 Oct 2024

EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions

Computer Vision and Pattern Recognition (CVPR), 2024

Kai Chen

Zhili Liu

...

Jun Yao

550

26 Sep 2024