ResearchTrend.AI
  • Communities
  • Connect sessions
  • AI calendar
  • Organizations
  • Join Slack
  • Contact Sales
Papers
Communities
Social Events
Terms and Conditions
Pricing
Contact Sales
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2026 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2311.07919
  4. Cited By
Qwen-Audio: Advancing Universal Audio Understanding via Unified
  Large-Scale Audio-Language Models
v1v2 (latest)

Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
    AuLLM
ArXiv (abs)PDFHTMLHuggingFace (10 upvotes)

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 277 papers shown
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Retrieval-Augmented Speech Recognition Approach for Domain Challenges
Peng Shen
Xugang Lu
Hisashi Kawai
RALM
262
3
0
24 Feb 2025
Audio-FLAN: A Preliminary Release
Audio-FLAN: A Preliminary Release
Liumeng Xue
Ziya Zhou
J. Pan
Zhiyu Li
Shuai Fan
...
Haohe Liu
Emmanouil Benetos
Ge Zhang
Wenhan Luo
Wei Xue
MLLMAuLLMCLIPVLM
284
2
0
23 Feb 2025
Chain-of-Description: What I can understand, I can put into words
Chain-of-Description: What I can understand, I can put into words
Jiaxin Guo
Daimeng Wei
Tianying Wang
Hengchao Shang
Yuanchang Luo
Hao Yang
236
0
0
22 Feb 2025
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic Integration
EAGER-LLM: Enhancing Large Language Models as Recommenders through Exogenous Behavior-Semantic IntegrationThe Web Conference (WWW), 2025
Minjie Hong
Yan Xia
Liang Luo
Jieming Zhu
Ye Wang
...
Xiaoda Yang
Quanyu Dai
Zhenhua Dong
Zhimeng Zhang
Zhou Zhao
265
17
0
21 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Slamming: Training a Speech Language Model on One GPU in a DayAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Gallil Maimon
Avishai Elmakies
Yossi Adi
317
9
0
19 Feb 2025
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic Embeddings
SEA: Low-Resource Safety Alignment for Multimodal Large Language Models via Synthetic EmbeddingsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Weikai Lu
Hao Peng
Huiping Zhuang
Cen Chen
Huiping Zhuang
271
5
0
18 Feb 2025
Soundwave: Less is More for Speech-Text Alignment in LLMs
Soundwave: Less is More for Speech-Text Alignment in LLMsAnnual Meeting of the Association for Computational Linguistics (ACL), 2025
Yunke Zhang
Zhiheng Liu
Fan Bu
Ruiyu Zhang
Benyou Wang
Haoyang Li
AuLLMSyDaVLM
278
7
0
18 Feb 2025
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Survey on AI-Generated Media Detection: From Non-MLLM to MLLM
Yueying Zou
Peipei Li
Zekun Li
Huaibo Huang
Xing Cui
Xuannan Liu
Chenghanyu Zhang
Ran He
DeLMO
695
11
0
07 Feb 2025
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta
David Khachaturov
Robert D. Mullins
AAMLAuLLM
556
5
0
02 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
341
14
0
28 Jan 2025
Baichuan-Omni-1.5 Technical Report
Yadong Li
Qingbin Liu
Tao Zhang
Tao Zhang
Tian Jin
...
Jianhua Xu
Haoze Sun
Mingan Lin
Guosheng Dong
Xin Wu
AuLLM
328
64
0
28 Jan 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning DataIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLMSyDa
387
39
0
28 Jan 2025
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
FireRedASR: Open-Source Industrial-Grade Mandarin Speech Recognition Models from Encoder-Decoder to LLM Integration
Kai-Tuo Xu
Feng-Long Xie
Xu Tang
Feng-Long Xie
411
40
0
24 Jan 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Xuelong Geng
Kun Wei
Qijie Shao
Shuiyun Liu
Zhennan Lin
...
Yuhang Dai
Xinfa Zhu
Yue Li
Li Zhang
Lei Xie
328
21
0
23 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond WordsNeural Information Processing Systems (NeurIPS), 2024
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Longji Xu
Haizhou Li
Zhikai Wu
AuLLM
396
49
0
17 Jan 2025
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
Audio-CoT: Exploring Chain-of-Thought Reasoning in Large Audio Language Model
Tianhao Shen
Zhuo Chen
Longji Xu
Xiaofeng Wang
Xie Chen
AuLLMLRM
276
43
0
13 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A SurveyIEEE Access (IEEE Access), 2024
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
465
6
0
10 Jan 2025
"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?
"Yeah Right!" -- Do LLMs Exhibit Multimodal Feature Transfer?
Benjamin Z. Reichman
Kartik Talamadupula
326
0
0
07 Jan 2025
Prepending or Cross-Attention for Speech-to-Text? An Empirical Comparison
Prepending or Cross-Attention for Speech-to-Text? An Empirical ComparisonNorth American Chapter of the Association for Computational Linguistics (NAACL), 2025
Tsz Kin Lam
Marco Gaido
Sara Papi
L. Bentivogli
Barry Haddow
433
3
0
04 Jan 2025
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio ReasoningIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Chun-Yi Kuan
Hung-yi Lee
AuLLMLRM
377
16
0
03 Jan 2025
OmniChat: Enhancing Spoken Dialogue Systems with Scalable Synthetic Data for Diverse Scenarios
Xize Cheng
Dongjie Fu
Xiaoda Yang
Minghui Fang
Ruofan Hu
...
Rongjie Huang
Linjun Li
Yu Chen
Tao Jin
Zhou Zhao
365
8
0
03 Jan 2025
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Chia-Yu Hung
Navonil Majumder
Zhifeng Kong
Ambuj Mehrish
Rafael Valle
Bryan Catanzaro
Soujanya Poria
Bryan Catanzaro
Soujanya Poria
376
36
0
30 Dec 2024
LLMs are Also Effective Embedding Models: An In-depth Overview
LLMs are Also Effective Embedding Models: An In-depth Overview
Chongyang Tao
Tao Shen
Shen Gao
Junshuo Zhang
Zhen Li
Kai Hua
Wenpeng Hu
Zhengwei Tao
Shuai Ma
396
27
0
17 Dec 2024
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for
  Long-term Streaming Video and Audio Interactions
InternLM-XComposer2.5-OmniLive: A Comprehensive Multimodal System for Long-term Streaming Video and Audio Interactions
Pan Zhang
Xiaoyi Dong
Yuhang Cao
Yuhang Zang
Rui Qian
...
Xinsong Zhang
Kai Chen
Yu Qiao
Dahua Lin
Jiaqi Wang
KELM
371
34
0
12 Dec 2024
Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
Towards Building Large Scale Datasets and State-of-the-Art Automatic Speech Translation Systems for 14 Indian Languages
Sparsh Jain
Ashwin Sankar
Devilal Choudhary
Dhairya Suman
Nikhil Narasimhan
Mohammed Safi Ur Rahman Khan
Anoop Kunchukuttan
Mitesh M. Khapra
Mary Dabre
468
2
0
07 Nov 2024
Foundation Models for Rapid Autonomy Validation
Foundation Models for Rapid Autonomy ValidationIEEE International Conference on Robotics and Automation (ICRA), 2024
Alec Farid
Peter Schleede
Aaron Huang
Christoffer Heckman
373
0
0
22 Oct 2024
Roadmap towards Superhuman Speech Understanding using Large Language
  Models
Roadmap towards Superhuman Speech Understanding using Large Language Models
Fan Bu
Yuhao Zhang
Xiang Wang
Benyou Wang
Qiang Liu
Haoyang Li
LM&MAELMAuLLM
737
2
0
17 Oct 2024
OMCAT: Omni Context Aware Transformer
OMCAT: Omni Context Aware Transformer
Arushi Goel
Karan Sapra
Matthieu Le
Rafael Valle
Andrew Tao
Bryan Catanzaro
MLLMVLM
232
2
0
15 Oct 2024
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in
  Multimodal Large Language Models
MINER: Mining the Underlying Pattern of Modality-Specific Neurons in Multimodal Large Language Models
Kaichen Huang
Jiahao Huo
Yibo Yan
Kun Wang
Yutao Yue
Xuming Hu
244
2
0
07 Oct 2024
Self-Powered LLM Modality Expansion for Large Speech-Text Models
Self-Powered LLM Modality Expansion for Large Speech-Text ModelsConference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Tengfei Yu
Xuebo Liu
Zhiyi Hou
Liang Ding
Dacheng Tao
Min Zhang
227
5
0
04 Oct 2024
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Wonjune Kang
Junteng Jia
Chunyang Wu
Wei Zhou
Egor Lakomkin
...
Leda Sari
Suyoun Kim
Ke Li
Jay Mahadeokar
Ozlem Kalinli
AuLLM
302
14
0
02 Oct 2024
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum Learning
Making LLMs Better Many-to-Many Speech-to-Text Translators with Curriculum LearningAnnual Meeting of the Association for Computational Linguistics (ACL), 2024
Yexing Du
Youcheng Pan
Ziyang Ma
Keqi Deng
Yifan Yang
Keqi Deng
Xie Chen
Yang Xiang
Ming Liu
Bing Qin
LRM
516
9
0
29 Sep 2024
KALE-LM-Chem: Vision and Practice Toward an AI Brain for Chemistry
KALE-LM-Chem: Vision and Practice Toward an AI Brain for Chemistry
Weichen Dai
Yezeng Chen
Zijie Dai
Zhijie Huang
Wenshu Fan
...
Chengli Zhong
Xinhe Li
Zeyu Wang
Zhuoying Feng
Yi Zhou
269
0
0
27 Sep 2024
Internalizing ASR with Implicit Chain of Thought for Efficient
  Speech-to-Speech Conversational LLM
Internalizing ASR with Implicit Chain of Thought for Efficient Speech-to-Speech Conversational LLM
Robin Shing-Hei Yuen
Timothy Tin-Long Tse
Jian Zhu
AuLLM
180
4
0
25 Sep 2024
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
How to Connect Speech Foundation Models and Large Language Models? What Matters and What Does Not
Francesco Verdini
Pierfrancesco Melucci
Stefano Perna
Francesco Cariaggi
Marco Gaido
...
Marek Kasztelnik
L. Bentivogli
Sébastien Bratières
P. Merialdo
Simone Scardapane
AuLLM
313
2
0
25 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Enabling Auditory Large Language Models for Automatic Speech Quality EvaluationIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLMLM&MA
377
22
0
25 Sep 2024
Boosting Code-Switching ASR with Mixture of Experts Enhanced
  Speech-Conditioned LLM
Boosting Code-Switching ASR with Mixture of Experts Enhanced Speech-Conditioned LLMIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Fengrun Zhang
Wang Geng
Hukai Huang
Cheng Yi
He Qu
He Qu
AuLLMMoE
183
9
0
24 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
OmniBench: Towards The Future of Universal Omni-Language Models
Y. Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
604
51
0
23 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
What Are They Doing? Joint Audio-Speech Co-ReasoningIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
253
5
0
22 Sep 2024
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for
  Multilingual Speech-to-Text
Ideal-LLM: Integrating Dual Encoders and Language-Adapted LLM for Multilingual Speech-to-Text
Hongfei Xue
Wei Ren
Xuelong Geng
Kun Wei
Longhao Li
Qijie Shao
Linju Yang
Kai Diao
Lei Xie
AuLLM
184
11
0
17 Sep 2024
Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora
Enhancing Code-switched Text-to-Speech Synthesis Capability in Large Language Models with only Monolingual Corpora
Jing Xu
Daxin Tan
J. Wang
Xiao Chen
214
0
0
17 Sep 2024
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
The Art of Storytelling: Multi-Agent Generative AI for Dynamic Multimodal Narratives
Samee Arif
Taimoor Arif
Muhammad Saad Haroon
Aamina Jamal Khan
Agha Ali Raza
Awais Athar
491
3
0
17 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
386
12
0
17 Sep 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
264
5
0
14 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile InstructionsIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
330
19
0
13 Sep 2024
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Multi-modal Speech Transformer Decoders: When Do Multiple Modalities Improve Accuracy?
Yiwen Guan
V. Trinh
Vivek Voleti
Jacob Whitehill
282
2
0
13 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Salmon: A Suite for Acoustic Language Model EvaluationIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Gallil Maimon
Amit Roth
Yossi Adi
ELMAuLLM
414
20
0
11 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak EncodersIEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Feiyu Xiong
Shuo Sun
Sijin Yu
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
Ai Ti Aw
AuLLM
511
3
0
10 Sep 2024
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Benchmarking Sub-Genre Classification For Mainstage Dance Music
Hongzhi Shu
Xinglin Li
Hongyu Jiang
Minghao Fu
Xinyu Li
150
1
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
LLaMA-Omni: Seamless Speech Interaction with Large Language ModelsInternational Conference on Learning Representations (ICLR), 2024
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
352
121
0
10 Sep 2024
Previous
123456
Next