Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023

Yunfei Chu

Jin Xu

Xiaohuan Zhou

Qian Yang

Shiliang Zhang

Zhijie Yan

Chang Zhou

Jingren Zhou

AuLLM

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 277 papers shown

DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations

...

275

24 Dec 2025

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

03 Dec 2025

Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning

175

03 Dec 2025

OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning

02 Dec 2025

Spoken Conversational Agents with Large Language Models

472

02 Dec 2025

MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages

01 Dec 2025

OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion

Sai Koneru

Matthias Huck

Jan Niehues

28 Nov 2025

HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding

151

28 Nov 2025

StereoDETR: Stereo-based Transformer for 3D Object Detection

151

24 Nov 2025

TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition

181

19 Nov 2025

PresentCoach: Dual-Agent Presentation Coaching through Exemplars and Interactive Feedback

207

19 Nov 2025

Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs

389

17 Nov 2025

Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models

...

104

16 Nov 2025

Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound

140

15 Nov 2025

When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning

137

04 Nov 2025

SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia

326

03 Nov 2025

STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence

...

477

28 Oct 2025

Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation

...

393

26 Oct 2025

Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards

Prashanth Gurunath Shivakumar

141

23 Oct 2025

Data-Centric Lessons To Improve Speech-Language Pretraining

136

22 Oct 2025

AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch

Weichuang Shao

I. Liao

Tomas Henrique Bode Maul

T. Chandesa

108

22 Oct 2025

SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering

233

20 Oct 2025

OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM

...

175

17 Oct 2025

Extending Audio Context for Long-Form Understanding in Large Audio-Language Models

Yuatyong Chaichana

Pittawat Taveekitworachai

Warit Sirichotedumrong

Potsawee Manakul

Kunat Pipatanakul

AuLLM

144

17 Oct 2025

MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval

17 Oct 2025

NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching

240

15 Oct 2025

Not in Sync: Unveiling Temporal Bias in Audio Chat Models

116

14 Oct 2025

SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model

...

236

14 Oct 2025

Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models

14 Oct 2025

VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents

141

13 Oct 2025

MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations

...

223

12 Oct 2025

Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs

116

10 Oct 2025

AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs

...

187

08 Oct 2025

AURA Score: A Metric For Holistic Audio Question Answering Evaluation

Satvik Dixit

Soham Deshmukh

Bhiksha Raj

112

06 Oct 2025

Zephyrus: An Agentic Framework for Weather Science

...

Taylor Berg-Kirkpatrick

120

05 Oct 2025

MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition

155

05 Oct 2025

The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures

30 Sep 2025

OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models

Subrata Biswas

Mohammad Nur Hossain Khan

Bashima Islam

VLM LRM

119

30 Sep 2025

Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems

...

101

28 Sep 2025

Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations

158

27 Sep 2025

Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling

119

26 Sep 2025

Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models

310

26 Sep 2025

CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges

...

120

26 Sep 2025

Guiding Audio Editing with Audio Language Model

166

25 Sep 2025

Investigating Modality Contribution in Audio LLMs for Music

G. Morais

Magdalena Fuentes

AuLLM

139

25 Sep 2025

Can Audio Large Language Models Verify Speaker Identity?

24 Sep 2025

WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction

...

444

24 Sep 2025

HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling

23 Sep 2025

STAR: Speech-to-Audio Generation via Representation Learning

104

21 Sep 2025

Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs

124

21 Sep 2025