Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023

Yunfei Chu

Jin Xu

Xiaohuan Zhou

Qian Yang

Shiliang Zhang

Zhijie Yan

Chang Zhou

Jingren Zhou

AuLLM

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 277 papers shown

Retrieval-Augmented Speech Recognition Approach for Domain Challenges

262

24 Feb 2025

Audio-FLAN: A Preliminary Release

...

284

23 Feb 2025

Chain-of-Description: What I can understand, I can put into words

236

22 Feb 2025

EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration

The Web Conference (WWW), 2025

...

265

21 Feb 2025

Slamming: Training a Speech Language Model on One GPU in a Day

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

Gallil Maimon

Avishai Elmakies

Yossi Adi

317

19 Feb 2025

SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

271

18 Feb 2025

Soundwave: Less is More for Speech-Text Alignment in LLMs

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

278

18 Feb 2025

Survey on AI-Generated Media Detection: From Non-MLLM to MLLM

695

07 Feb 2025

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models

556

02 Feb 2025

Audio-Language Models for Audio-Centric Tasks: A survey

341

28 Jan 2025

Baichuan-Omni-1.5 Technical Report

Tao Zhang

...

328

28 Jan 2025

DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

387

28 Jan 2025

FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration

411

24 Jan 2025

OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia

...

328

23 Jan 2025

SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words

Neural Information Processing Systems (NeurIPS), 2024

396

17 Jan 2025

Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model

276

13 Jan 2025

Audio-Language Datasets of Scenes and Events: A Survey

IEEE Access (IEEE Access), 2024

465

10 Jan 2025

"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?

Benjamin Z. Reichman

Kartik Talamadupula

326

07 Jan 2025

Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

433

04 Jan 2025

Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Chun-Yi Kuan

Hung-yi Lee

AuLLM LRM

377

03 Jan 2025

OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios

...

365

03 Jan 2025

TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization

Bryan Catanzaro

Soujanya Poria

376

30 Dec 2024

LLMs are Also Effective Embedding Models: An In-depth Overview

396

17 Dec 2024

InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions

...

371

12 Dec 2024

Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages

Mohammed Safi Ur Rahman Khan

Anoop Kunchukuttan

Mitesh M. Khapra

Mary Dabre

468

07 Nov 2024

Foundation Models for Rapid Autonomy Validation

IEEE International Conference on Robotics and Automation (ICRA), 2024

373

22 Oct 2024

Roadmap towards Superhuman Speech Understanding using Large Language Models

Yuhao Zhang

737

17 Oct 2024

OMCAT: Omni Context Aware Transformer

232

15 Oct 2024

MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models

Kun Wang

Xuming Hu

244

07 Oct 2024

Self-Powered LLM Modality Expansion for Large Speech-Text Models

Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024

Tengfei Yu

Xuebo Liu

Zhiyi Hou

Liang Ding

Dacheng Tao

Min Zhang

227

04 Oct 2024

Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech

...

302

02 Oct 2024

Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning

Annual Meeting of the Association for Computational Linguistics (ACL), 2024

516

29 Sep 2024

KALE-LM-Chem: Vision and Practice Toward an AI Brain for Chemistry

...

Xinhe Li

Yi Zhou

269

27 Sep 2024

Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM

180

25 Sep 2024

How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not

Francesco Verdini

Pierfrancesco Melucci

...

313

25 Sep 2024

Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Siyin Wang

Wenyi Yu

Yudong Yang

Changli Tang

Yixuan Li

...

Jun Zhang

Guangzhi Sun

Lu Lu

Yuxuan Wang

Chao Zhang

AuLLM LM&MA

377

25 Sep 2024

Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLM

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Fengrun Zhang

Wang Geng

183

24 Sep 2024

OmniBench: Towards The Future of Universal Omni-Language Models

...

604

23 Sep 2024

What Are They Doing? Joint Audio-Speech Co-Reasoning

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Mirco Ravanelli

253

22 Sep 2024

Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text

Hongfei Xue

Kun Wei

Qijie Shao

Lei Xie

184

17 Sep 2024

Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora

214

17 Sep 2024

The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives

491

17 Sep 2024

Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models

Potsawee Manakul

Guangzhi Sun

Warit Sirichotedumrong

Kasima Tharnpipitchai

Kunat Pipatanakul

AuLLM

386

17 Sep 2024

Towards Diverse and Efficient Audio Captioning via Diffusion Models

264

14 Sep 2024

Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Lingwei Meng

Shujie Hu

Jiawen Kang

Zhaoqing Li

Yuejiao Wang

Wenxuan Wu

Xixin Wu

Xunying Liu

Helen Meng

AuLLM

330

13 Sep 2024

Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?

Yiwen Guan

V. Trinh

Vivek Voleti

Jacob Whitehill

282

13 Sep 2024

Salmon: A Suite for Acoustic Language Model Evaluation

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

414

11 Sep 2024

MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

511

10 Sep 2024

Benchmarking Sub-Genre Classification For Mainstage Dance Music

150

10 Sep 2024

LLaMA-Omni: Seamless Speech Interaction with Large Language Models

International Conference on Learning Representations (ICLR), 2024

Qingkai Fang

Shoutao Guo

Yan Zhou

Zhengrui Ma

Shaolei Zhang

Yang Feng

AuLLM

352

121

10 Sep 2024