ResearchTrend.AI
arXiv: 2311.07919
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models

14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
    AuLLM
ArXiv (abs) | PDF | HTML | HuggingFace (10 upvotes)

Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"

50 / 277 papers shown
DrVoice: Parallel Speech-Text Voice Conversation Model via Dual-Resolution Speech Representations
Chao-Hong Tan
Qian Chen
Wen Wang
Chong Deng
Qinglin Zhang
...
Yafeng Chen
Hui Wang
Jiaqing Liu
Jieping Ye
AuLLM
275
0
0
24 Dec 2025
Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching
Wei Chee Yew
Hailun Xu
Sanjay Saha
Xiaotian Fan
Hiok Hian Ong
David Yuchen Wang
Kanchan Sarkar
Zhenheng Yang
Danhui Guan
65
0
0
03 Dec 2025
Omni-AutoThink: Adaptive Multimodal Reasoning via Reinforcement Learning
Dongchao Yang
Songxiang Liu
Disong Wang
Yuanyuan Wang
Guanglu Wan
Helen Meng
OffRL LRM
175
0
0
03 Dec 2025
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
Boyu Zhu
Xiaofei Wen
Wenjie Mo
Tinghui Zhu
Yanan Xie
Peng Qi
Muhao Chen
79
0
0
02 Dec 2025
Spoken Conversational Agents with Large Language Models
Chao-Han Huck Yang
A. Stolcke
Larry Heck
AuLLM
472
1
0
02 Dec 2025
MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages
Yexing Du
Kaiyuan Liu
Youcheng Pan
B. Yang
Keqi Deng
Xie Chen
Yang Xiang
Ming Liu
Bin Qin
Y. Wang
LRM
94
0
0
01 Dec 2025
OmniFusion: Simultaneous Multilingual Multimodal Translations via Modular Fusion
Sai Koneru
Matthias Huck
Jan Niehues
73
0
0
28 Nov 2025
HPSU: A Benchmark for Human-Level Perception in Real-World Spoken Speech Understanding
Chen Li
Peiji Yang
Yicheng Zhong
Jianxing Yu
Zhisheng Wang
Zihao Gou
Wenqing Chen
Jian Yin
VLM
151
0
0
28 Nov 2025
StereoDETR: Stereo-based Transformer for 3D Object Detection
Shiyi Mu
Zichong Gu
Zhiqi Ai
Anqi Liu
Yilin Gao
Shugong Xu
ViT 3DPC
151
0
0
24 Nov 2025
TiCAL: Typicality-Based Consistency-Aware Learning for Multimodal Emotion Recognition
Wen Yin
Siyu Zhan
Cencen Liu
Xin Hu
Guiduo Duan
Xiurui Xie
Yuan-Fang Li
Tao He
181
0
0
19 Nov 2025
PresentCoach: Dual-Agent Presentation Coaching through Exemplars and Interactive Feedback
Sirui Chen
Jinsong Zhou
Xinli Xu
Xiaoyu Yang
Litao Guo
Ying-Cong Chen
207
0
0
19 Nov 2025
Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs
Zhe Sun
Yujun Cai
Jiayu Yao
Yiwei Wang
AuLLM LRM
389
1
0
17 Nov 2025
Probing Preference Representations: A Multi-Dimensional Evaluation and Analysis Method for Reward Models
Chenglong Wang
Yifu Huo
Yang Gan
Yongyu Mu
Qiaozhi He
...
Tongran Liu
Anxiang Ma
Zhengtao Yu
Jingbo Zhu
Tong Xiao
104
0
0
16 Nov 2025
Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound
Dengming Zhang
W. You
Jingxiong Li
Weishen Lin
Wenda Shi
Xue Zhao
H. Zuo
Junxian Wu
Lingyun Sun
VLM
140
0
0
15 Nov 2025
When One Modality Sabotages the Others: A Diagnostic Lens on Multimodal Reasoning
Chenyu Zhang
Minsol Kim
Shohreh Ghorbani
Jingyao Wu
Rosalind Picard
Patricia Maes
Paul Pu Liang
137
2
0
04 Nov 2025
SeaLLMs-Audio: Large Audio-Language Models for Southeast Asia
Chaoqun Liu
Mahani Aljunied
Guizhen Chen
Hou Pong Chan
Weiwen Xu
Yu Rong
Wenxuan Zhang
AuLLM
326
2
0
03 Nov 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM LRM
477
3
0
28 Oct 2025
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
393
4
0
26 Oct 2025
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
Jiajun Fan
Roger Ren
Jingyuan Li
R. Pandey
Prashanth Gurunath Shivakumar
I. Bulyko
Ankur Gandhe
Ge Liu
Yile Gu
LRM
141
1
0
23 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
136
0
0
22 Oct 2025
AMAuT: A Flexible and Efficient Multiview Audio Transformer Framework Trained from Scratch
Weichuang Shao
I. Liao
Tomas Henrique Bode Maul
T. Chandesa
108
1
0
22 Oct 2025
SARSteer: Safeguarding Large Audio Language Models via Safe-Ablated Refusal Steering
Weilin Lin
Jianze Li
Hui Xiong
Li Liu
LLMSV
233
1
0
20 Oct 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye
Chao-Han Huck Yang
Arushi Goel
Wei Huang
Ligeng Zhu
...
Andrew Tao
Song Han
Jan Kautz
Hongxu Yin
Pavlo Molchanov
175
3
0
17 Oct 2025
Extending Audio Context for Long-Form Understanding in Large Audio-Language Models
Yuatyong Chaichana
Pittawat Taveekitworachai
Warit Sirichotedumrong
Potsawee Manakul
Kunat Pipatanakul
AuLLM
144
0
0
17 Oct 2025
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
Qiyu Wu
Shuyang Cui
Satoshi Hayakawa
Wei-Yao Wang
Hiromi Wakaki
Yuki Mitsufuji
96
0
0
17 Oct 2025
NExT-OMNI: Towards Any-to-Any Omnimodal Foundation Models with Discrete Flow Matching
Run Luo
Xiaobo Xia
Lu Wang
Longze Chen
Renke Shan
Jing Luo
Min Yang
Tat-Seng Chua
VGen
240
4
0
15 Oct 2025
Not in Sync: Unveiling Temporal Bias in Audio Chat Models
Jiayu Yao
Shenghua Liu
Yiwei Wang
Rundong Cheng
Lingrui Mei
Baolong Bi
Zhen Xiong
Xueqi Cheng
116
0
0
14 Oct 2025
SAIL-Embedding Technical Report: Omni-modal Embedding Foundation Model
Lin Lin
Jiefeng Long
Zhihe Wan
Y. Wang
Dingkang Yang
...
Yan Qiu
Haiyang Yu
Xiao Liang
Hongsheng Li
Chao Feng
236
3
0
14 Oct 2025
Understanding the Modality Gap: An Empirical Study on the Speech-Text Alignment Mechanism of Large Speech Language Models
Bajian Xiang
Shuaijiang Zhao
Tingwei Guo
Wei Zou
97
1
0
14 Oct 2025
VCB Bench: An Evaluation Benchmark for Audio-Grounded Large Language Model Conversational Agents
Jiliang Hu
Wenfu Wang
Zuchao Li
Chenxing Li
Yiyang Zhao
Hanzhao Li
Liqiang Zhang
Meng Yu
Dong Yu
ELM
141
2
0
13 Oct 2025
MRSAudio: A Large-Scale Multimodal Recorded Spatial Audio Dataset with Refined Annotations
Wenxiang Guo
Changhao Pan
Zhiyuan Zhu
Xintong Hu
Yu Zhang
...
Z. Chen
Yanhao Yu
Qiange Huang
Fei Wu
Zhou Zhao
223
0
0
12 Oct 2025
Multimodal Prompt Optimization: Why Not Leverage Multiple Modalities for MLLMs
Yumin Choi
Dongki Kim
Jinheon Baek
Sung Ju Hwang
116
1
0
10 Oct 2025
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Peize He
Zichen Wen
Yubo Wang
Y. Wang
Xiaoqian Liu
...
Zhifei Liu
Weijia Li
C. Wang
Conghui He
Linfeng Zhang
AuLLM
187
3
0
08 Oct 2025
AURA Score: A Metric For Holistic Audio Question Answering Evaluation
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
112
0
0
06 Oct 2025
Zephyrus: An Agentic Framework for Weather Science
Sumanth Varambally
Marshall Fisher
Jas Thakker
Yiwei Chen
Zhirui Xia
...
Salva Rühling Cachay
Taylor Berg-Kirkpatrick
Duncan Watson-Parris
Yi-An Ma
Rose Yu
LLMAG
120
2
0
05 Oct 2025
MoME: Mixture of Matryoshka Experts for Audio-Visual Speech Recognition
Umberto Cappellazzo
Minsu Kim
Pingchuan Ma
Honglie Chen
Xubo Liu
Stavros Petridis
Maja Pantic
MoE
155
0
0
05 Oct 2025
The silence of the weights: an investigation of structural pruning strategies for attention-based audio signal architectures
Andrea Diecidue
C. Barbano
Piero Fraternali
Mathieu Fontaine
Enzo Tartaglione
68
0
0
30 Sep 2025
OWL: Geometry-Aware Spatial Reasoning for Audio Large Language Models
Subrata Biswas
Mohammad Nur Hossain Khan
Bashima Islam
VLM LRM
119
1
0
30 Sep 2025
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
Guojian Li
C. Wang
Hongfei Xue
Shuiyuan Wang
Dehui Gao
...
Yuke Lin
W. Li
Longshuai Xiao
Zhonghua Fu
Lei Xie
101
0
0
28 Sep 2025
Evaluating Bias in Spoken Dialogue LLMs for Real-World Decisions and Recommendations
Y. Wu
Tianrui Wang
Yizhou Peng
Yi-Wen Chao
Xuyi Zhuang
Xinsheng Wang
Shunshun Yin
Ziyang Ma
158
0
0
27 Sep 2025
Comprehend and Talk: Text to Speech Synthesis via Dual Language Modeling
Junjie Cao
Yichen Han
Ruonan Zhang
Xiaoyang Hao
Hongxiang Li
Shuaijiang Zhao
Yue Liu
Xiao-Ping Zhang
119
0
0
26 Sep 2025
Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
Zhichao Sheng
Shilin Zhou
Chen Gong
Zhenghua Li
AuLLM LRM
310
0
0
26 Sep 2025
CMDAR: A Chinese Multi-scene Dynamic Audio Reasoning Benchmark with Diverse Challenges
Hui Li
Changhao Jiang
Hongyu Wang
Ming Zhang
Jiajun Sun
...
Baoyu Fan
Changzhi Sun
Tao Gui
Qi Zhang
Xuanjing Huang
AuLLM ELM
120
0
0
26 Sep 2025
Guiding Audio Editing with Audio Language Model
Zitong Lan
Yiduo Hao
Mingmin Zhao
DiffM KELM
166
4
0
25 Sep 2025
Investigating Modality Contribution in Audio LLMs for Music
G. Morais
Magdalena Fuentes
AuLLM
139
0
0
25 Sep 2025
Can Audio Large Language Models Verify Speaker Identity?
Yiming Ren
Xuenan Xu
Baoxiang Li
Shuai Wang
Chao Zhang
53
0
0
24 Sep 2025
WEST: LLM based Speech Toolkit for Speech Understanding, Generation, and Interaction
Binbin Zhang
Chengdong Liang
Shuai Wang
Xuelong Geng
Zhao Guo
...
Hao Yin
XiPeng Yang
Pengshen Zhang
Changwei Ma
Lei Xie
AuLLM VLM
444
0
0
24 Sep 2025
HarmoniFuse: A Component-Selective and Prompt-Adaptive Framework for Multi-Task Speech Language Modeling
Yuke Si
Runyan Yang
Yingying Gao
Junlan Feng
Chao Deng
Shilei Zhang
80
0
0
23 Sep 2025
STAR: Speech-to-Audio Generation via Representation Learning
Zeyu Xie
Xuenan Xu
Yixuan Li
Mengyue Wu
Yuexian Zou
104
0
0
21 Sep 2025
Interpretable Audio Editing Evaluation via Chain-of-Thought Difference-Commonality Reasoning with Multimodal LLMs
Yuhang Jia
Xu Zhang
Yang Chen
Hui Wang
Enzhi Wang
Yong Qin
LRM
124
0
0
21 Sep 2025