2503.20215
Qwen2.5-Omni Technical Report
26 March 2025
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
S. Bai
Keqin Chen
Jialin Wang
Yang Fan
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
HuggingFace (164 upvotes)
Papers citing "Qwen2.5-Omni Technical Report"
50 / 246 papers shown
Evaluating Multimodal Large Language Models on Core Music Perception Tasks
Brandon James Carone
Iran R. Roman
Pablo Ripollés
LRM
165
1
0
25 Oct 2025
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
Jiajun Fan
Roger Ren
Jingyuan Li
R. Pandey
Prashanth Gurunath Shivakumar
I. Bulyko
Ankur Gandhe
Ge Liu
Yile Gu
LRM
216
1
0
23 Oct 2025
Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment
Zhiyu Lin
Jingwen Yang
Jiale Zhao
Meng Liu
Sunzhu Li
Benyou Wang
133
0
0
23 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
146
0
0
22 Oct 2025
Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models
Jiajun Fan
Tong Wei
Chaoran Cheng
Yuxin Chen
Ge Liu
121
1
0
20 Oct 2025
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
Bo-Han Feng
Chien-Feng Liu
Yu-Hsuan Li Liang
Chih-Kai Yang
Szu-Wei Fu
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
144
0
0
19 Oct 2025
Hallucination Benchmark for Speech Foundation Models
Alkis Koudounas
Moreno La Quatra
Manuel Giollo
Sabato Marco Siniscalchi
Elena Baralis
HILM
299
1
0
18 Oct 2025
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
Qiyu Wu
Shuyang Cui
Satoshi Hayakawa
Wei-Yao Wang
Hiromi Wakaki
Yuki Mitsufuji
108
0
0
17 Oct 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye
Chao-Han Huck Yang
Arushi Goel
Wei Huang
Ligeng Zhu
...
Andrew Tao
Song Han
Jan Kautz
Hongxu Yin
Pavlo Molchanov
210
9
0
17 Oct 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Xingrui Wang
Jiang Liu
Chao Huang
X. Yu
Ze Wang
Ximeng Sun
Jialian Wu
Alan Yuille
Emad Barsoum
Zicheng Liu
VLM
106
0
0
16 Oct 2025
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
Hui Wang
J. Zhao
Cheng Liu
Yuhang Jia
Haoqin Sun
Jiaming Zhou
Yong Qin
182
2
0
16 Oct 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Zhenyu Liu
Yunxin Li
Xuanyu Zhang
Qixun Teng
Shenyuan Jiang
...
Mingjun Zhao
Yu-Syuan Xu
Yancheng He
Baotian Hu
Min Zhang
AuLLM
MoE
282
1
0
15 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM
AuLLM
VGen
VLM
446
7
0
15 Oct 2025
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Ziyang Ma
Ruiyang Xu
Zhenghao Xing
Yunfei Chu
Yuping Wang
...
Pheng-Ann Heng
Kai Yu
Junyang Lin
Eng Siong Chng
Xie Chen
VLM
101
8
0
14 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro
AIFin
AI4TS
LRM
AI4CE
268
9
0
13 Oct 2025
Scaling Language-Centric Omnimodal Representation Learning
Chenghao Xiao
Hou Pong Chan
Hao Zhang
Weiwen Xu
Mahani Aljunied
Yu Rong
170
3
0
13 Oct 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian
Sang-gil Lee
Zhifeng Kong
Sreyan Ghosh
Arushi Goel
...
Shinji Watanabe
Mohammad Shoeybi
Bryan Catanzaro
Rafael Valle
Wei Ping
AuLLM
LRM
314
4
0
13 Oct 2025
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Caorui Li
Yu Chen
Yiyan Ji
Jin Xu
Zhenyu Cui
...
Minghao Liu
Junran Peng
Zhaoxiang Zhang
Jiaheng Liu
AuLLM
LRM
186
11
0
12 Oct 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen
Yue Ding
Weihong Lin
Jingyun Hua
Linli Yao
...
Yuanxing Zhang
Qiang Liu
Pengfei Wan
Liang Wang
Tieniu Tan
272
8
0
12 Oct 2025
MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Hongwei Chen
Yishu Lei
Dan Zhang
Bo Ke
Danxiang Zhu
...
Shikun Feng
Jingzhou He
Yu Sun
Hua Wu
Haifeng Wang
ReLM
LRM
155
1
0
11 Oct 2025
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
Donghang Wu
H. Zhang
Jun Chen
Xiangyu Zhang
...
Fei Tian
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
ReLM
LRM
128
3
0
10 Oct 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Andong Deng
Taojiannan Yang
S. Yu
Lincoln Spencer
Mohit Bansal
Chen Chen
Serena Yeung-Levy
Xiaohan Wang
LRM
140
3
0
09 Oct 2025
Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection
Xiaodan Li
Mengjie Wu
Yao Zhu
Yunna Lv
YueFeng Chen
Cen Chen
Jianmei Guo
H. Xue
KELM
193
0
0
09 Oct 2025
CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Y. Wang
Yu Wang
119
2
0
09 Oct 2025
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Peize He
Zichen Wen
Yubo Wang
Y. Wang
Xiaoqian Liu
...
Zhifei Liu
Weijia Li
C. Wang
Conghui He
Linfeng Zhang
AuLLM
203
5
0
08 Oct 2025
Local MAP Sampling for Diffusion Models
Shaorong Zhang
Rob Brekelmans
Greg Ver Steeg
163
2
0
07 Oct 2025
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
Chengzhi Liu
Yuzhe Yang
Kaiwen Zhou
Zhen Zhang
Yue Fan
Yannan Xie
Peng Qi
Xin Eric Wang
157
2
0
07 Oct 2025
Robustness assessment of large audio language models in multiple-choice evaluation
F. López
Santosh Kesiraju
Jordi Luque
AuLLM
ELM
189
0
0
06 Oct 2025
Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing
Xuanhua Yin
Runkai Zhao
Weidong Cai
AI4CE
201
1
0
06 Oct 2025
Human Behavior Atlas: Benchmarking Unified Psychological and Social Behavior Understanding
Keane Ong
Wei Dai
Carol Li
Dewei Feng
Hengzhi Li
...
Jiaee Cheong
Rui Mao
G. Mengaldo
Erik Cambria
Paul Pu Liang
184
3
0
06 Oct 2025
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan
Xihua Wang
Zhengfeng Lai
Xin Cheng
Peng Zhang
Xiaojiang Liu
Ruihua Song
Meng Cao
DiffM
296
4
0
03 Oct 2025
AudioToolAgent: An Agentic Framework for Audio-Language Models
Gijs Wijngaard
Elia Formisano
M. Dumontier
Jenia Jitsev
LLMAG
AuLLM
187
2
0
03 Oct 2025
Self-Improvement in Multimodal Large Language Models: A Survey
Shijian Deng
Kai Wang
Tianyu Yang
Harsh Singh
Yapeng Tian
LRM
158
3
0
03 Oct 2025
Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage
Siddhant Arora
Haidar Khan
Kai Sun
Xin Luna Dong
Sajal Choudhary
...
Anuj Kumar
Ahmed Aly
Yue Liu
Florian Metze
Zhaojiang Lin
193
4
0
02 Oct 2025
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
Siddhant Arora
Jinchuan Tian
Hayato Futami
Jiatong Shi
Yosuke Kashiwagi
E. Tsunoo
Shinji Watanabe
LRM
112
2
0
02 Oct 2025
From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling
Yifei Cao
Changhao Jiang
Jiabao Zhuang
Jiajun Sun
Ming Zhang
...
Yunke Zhang
Man Lan
Tao Gui
Qi Zhang
Xuanjing Huang
116
2
0
01 Oct 2025
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
Chen-An Li
Tzu-Han Lin
Hung-yi Lee
AuLLM
167
2
0
01 Oct 2025
Hearing the Order: Investigating Position Bias in Large Audio-Language Models
Yu-Xiang Lin
Chen-An Li
Sheng-Lun Wei
Po-Chun Chen
Hsin-Hsi Chen
Hung-yi Lee
155
0
0
01 Oct 2025
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
Kai-Wei Chang
En-Pei Hu
Chun-Yi Kuan
Wenze Ren
Wei-Chih Chen
Guan-Ting Lin
Yu Tsao
Shao-Hua Sun
Hung-yi Lee
James R. Glass
AuLLM
297
7
0
30 Sep 2025
Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap
Yueqian Lin
Zhengmian Hu
Qinsi Wang
Yudong Liu
H. Zhang
Jayakumar Subramanian
N. Vlassis
Hai Helen Li
Yiran Chen
LRM
125
3
0
30 Sep 2025
V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
Zhengpeng Shi
Hengli Li
Yanpeng Zhao
Jianqun Zhou
Yuxuan Wang
Qinrong Cui
Wei Bi
Songchun Zhu
Bo Zhao
Zilong Zheng
VLM
139
0
0
30 Sep 2025
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Chengyao Wang
Zhisheng Zhong
Bohao Peng
Senqiao Yang
Yuqi Liu
Haokun Gui
Bin Xia
Jingyao Li
Bei Yu
Jiaya Jia
MLLM
AuLLM
VLM
178
3
0
29 Sep 2025
Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey
Yuntao Shou
Tao Meng
Wei Ai
Keqin Li
LRM
216
7
0
29 Sep 2025
FreeRet: MLLMs as Training-Free Retrievers
Yuhan Zhu
Xiangyu Zeng
Chenting Wang
Xinhao Li
Yicheng Xu
Ziang Yan
Yi Wang
Limin Wang
OffRL
VLM
LRM
201
2
0
29 Sep 2025
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Yixuan Zhou
Guoyang Zeng
Xin Liu
Xiang Li
Renjie Yu
...
Weiyue Sun
Jiancheng Gui
Kehan Li
Z. Wu
Zhiyuan Liu
177
7
0
29 Sep 2025
Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition
Jiacheng Shi
Hongfei Du
Y. Alicia Hong
Ye Gao
ReLM
LRM
73
0
0
29 Sep 2025
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
Chao Wang
Rui Zheng
Yang Ai
Zhen-Hua Ling
103
0
0
28 Sep 2025
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
Guanghao Li
Zhihui Fu
Min Fang
Qibin Zhao
Ming Tang
Chun Yuan
Jun Wang
160
6
0
28 Sep 2025
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
Guojian Li
C. Wang
Hongfei Xue
Shuiyuan Wang
Dehui Gao
...
Yuke Lin
W. Li
Longshuai Xiao
Zhonghua Fu
Lei Xie
116
2
0
28 Sep 2025
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Yucheng Wang
Yifan Hou
Aydin Javadov
Mubashara Akhtar
Mrinmaya Sachan
LRM
151
0
0
28 Sep 2025
Page 2 of 5