2503.20215
Qwen2.5-Omni Technical Report
26 March 2025
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
S. Bai
Keqin Chen
Jialin Wang
Yang Fan
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
HuggingFace (164 upvotes)
Papers citing
"Qwen2.5-Omni Technical Report"
50 / 243 papers shown
Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment
Zhiyu Lin
Jingwen Yang
Jiale Zhao
Meng Liu
Sunzhu Li
Benyou Wang
112
0
0
23 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
140
0
0
22 Oct 2025
Adaptive Divergence Regularized Policy Optimization for Fine-tuning Generative Models
Jiajun Fan
Tong Wei
Chaoran Cheng
Yuxin Chen
Ge Liu
106
1
0
20 Oct 2025
Investigating Safety Vulnerabilities of Large Audio-Language Models Under Speaker Emotional Variations
Bo-Han Feng
Chien-Feng Liu
Yu-Hsuan Li Liang
Chih-Kai Yang
Szu-Wei Fu
...
Sung-Feng Huang
Chao-Han Huck Yang
Y. Wang
Yun-Nung Chen
Hung-yi Lee
136
0
0
19 Oct 2025
Hallucination Benchmark for Speech Foundation Models
Alkis Koudounas
Moreno La Quatra
Manuel Giollo
Sabato Marco Siniscalchi
Elena Baralis
HILM
275
1
0
18 Oct 2025
MCA: Modality Composition Awareness for Robust Composed Multimodal Retrieval
Qiyu Wu
Shuyang Cui
Satoshi Hayakawa
Wei-Yao Wang
Hiromi Wakaki
Yuki Mitsufuji
104
0
0
17 Oct 2025
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM
Hanrong Ye
Chao-Han Huck Yang
Arushi Goel
Wei Huang
Ligeng Zhu
...
Andrew Tao
Song Han
Jan Kautz
Hongxu Yin
Pavlo Molchanov
194
5
0
17 Oct 2025
XModBench: Benchmarking Cross-Modal Capabilities and Consistency in Omni-Language Models
Xingrui Wang
Jiang Liu
Chao Huang
X. Yu
Ze Wang
Ximeng Sun
Jialian Wu
Alan Yuille
Emad Barsoum
Zicheng Liu
VLM
101
0
0
16 Oct 2025
AudioEval: Automatic Dual-Perspective and Multi-Dimensional Evaluation of Text-to-Audio-Generation
Hui Wang
J. Zhao
Cheng Liu
Yuhang Jia
Haoqin Sun
Jiaming Zhou
Yong Qin
152
2
0
16 Oct 2025
UniMoE-Audio: Unified Speech and Music Generation with Dynamic-Capacity MoE
Zhenyu Liu
Yunxin Li
Xuanyu Zhang
Qixun Teng
Shenyuan Jiang
...
Mingjun Zhao
Yu-Syuan Xu
Yancheng He
Baotian Hu
Min Zhang
AuLLM
MoE
267
1
0
15 Oct 2025
InteractiveOmni: A Unified Omni-modal Model for Audio-Visual Multi-turn Dialogue
Wenwen Tong
Hewei Guo
Dongchuan Ran
Jiangnan Chen
Jiefan Lu
...
Dinghao Zhou
Guiping Zhong
Ken Zheng
Shiyin Kang
Lewei Lu
MLLM
AuLLM
VGen
VLM
433
4
0
15 Oct 2025
Omni-Captioner: Data Pipeline, Models, and Benchmark for Omni Detailed Perception
Ziyang Ma
Ruiyang Xu
Zhenghao Xing
Yunfei Chu
Yuping Wang
...
Pheng-Ann Heng
Kai Yu
Junyang Lin
Eng Siong Chng
Xie Chen
VLM
98
3
0
14 Oct 2025
UALM: Unified Audio Language Model for Understanding, Generation and Reasoning
Jinchuan Tian
Sang-gil Lee
Zhifeng Kong
Sreyan Ghosh
Arushi Goel
...
Shinji Watanabe
Mohammad Shoeybi
Bryan Catanzaro
Rafael Valle
Wei Ping
AuLLM
LRM
290
1
0
13 Oct 2025
Scaling Language-Centric Omnimodal Representation Learning
Chenghao Xiao
Hou Pong Chan
Hao Zhang
Weiwen Xu
Mahani Aljunied
Yu Rong
152
0
0
13 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro
AIFin
AI4TS
LRM
AI4CE
250
6
0
13 Oct 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen
Yue Ding
Weihong Lin
Jingyun Hua
Linli Yao
...
Yuanxing Zhang
Qiang Liu
Pengfei Wan
Liang Wang
Tieniu Tan
265
3
0
12 Oct 2025
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Caorui Li
Yu Chen
Yiyan Ji
Jin Xu
Zhenyu Cui
...
Zili Wang
Minghao Liu
Junran Peng
Zhaoxiang Zhang
Jiaheng Liu
AuLLM
LRM
162
9
0
12 Oct 2025
MatryoshkaThinking: Recursive Test-Time Scaling Enables Efficient Reasoning
Hongwei Chen
Yishu Lei
Dan Zhang
Bo Ke
Danxiang Zhu
...
Shikun Feng
Jingzhou He
Yu Sun
Hua Wu
Haifeng Wang
ReLM
LRM
140
0
0
11 Oct 2025
Mind-Paced Speaking: A Dual-Brain Approach to Real-Time Reasoning in Spoken Language Models
Donghang Wu
H. Zhang
Jun Chen
Xiangyu Zhang
...
Fei Tian
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
ReLM
LRM
121
3
0
10 Oct 2025
CS3-Bench: Evaluating and Enhancing Speech-to-Speech LLMs for Mandarin-English Code-Switching
Heyang Liu
Yuhao Wang
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Y. Wang
Yu Wang
104
2
0
09 Oct 2025
Kelp: A Streaming Safeguard for Large Models via Latent Dynamics-Guided Risk Detection
Xiaodan Li
Mengjie Wu
Yao Zhu
Yunna Lv
YueFeng Chen
Cen Chen
Jianmei Guo
H. Xue
KELM
179
0
0
09 Oct 2025
SciVideoBench: Benchmarking Scientific Video Reasoning in Large Multimodal Models
Andong Deng
Taojiannan Yang
S. Yu
Lincoln Spencer
Mohit Bansal
Chen Chen
Serena Yeung-Levy
Xiaohan Wang
LRM
135
3
0
09 Oct 2025
AudioMarathon: A Comprehensive Benchmark for Long-Context Audio Understanding and Efficiency in Audio LLMs
Peize He
Zichen Wen
Yubo Wang
Y. Wang
Xiaoqian Liu
...
Zhifei Liu
Weijia Li
C. Wang
Conghui He
Linfeng Zhang
AuLLM
195
3
0
08 Oct 2025
Presenting a Paper is an Art: Self-Improvement Aesthetic Agents for Academic Presentations
Chengzhi Liu
Yuzhe Yang
Kaiwen Zhou
Zhen Zhang
Yue Fan
Yannan Xie
Peng Qi
Xin Eric Wang
147
1
0
07 Oct 2025
Local MAP Sampling for Diffusion Models
Shaorong Zhang
Rob Brekelmans
Greg Ver Steeg
153
1
0
07 Oct 2025
Human Behavior Atlas: Benchmarking Unified Psychological and Social Behavior Understanding
Keane Ong
Wei Dai
Carol Li
Dewei Feng
Hengzhi Li
...
Jiaee Cheong
Rui Mao
G. Mengaldo
Erik Cambria
Paul Pu Liang
173
2
0
06 Oct 2025
Robustness assessment of large audio language models in multiple-choice evaluation
F. López
Santosh Kesiraju
Jordi Luque
AuLLM
ELM
181
0
0
06 Oct 2025
Improving Multimodal Brain Encoding Model with Dynamic Subject-awareness Routing
Xuanhua Yin
Runkai Zhao
Weidong Cai
AI4CE
183
0
0
06 Oct 2025
AudioToolAgent: An Agentic Framework for Audio-Language Models
Gijs Wijngaard
Elia Formisano
M. Dumontier
LLMAG
AuLLM
140
0
0
03 Oct 2025
Self-Improvement in Multimodal Large Language Models: A Survey
Shijian Deng
Kai Wang
Tianyu Yang
Harsh Singh
Yapeng Tian
LRM
154
0
0
03 Oct 2025
Taming Text-to-Sounding Video Generation via Advanced Modality Condition and Interaction
Kaisi Guan
Xihua Wang
Zhengfeng Lai
Xin Cheng
Peng Zhang
Xiaojiang Liu
Ruihua Song
Meng Cao
DiffM
287
4
0
03 Oct 2025
Chain-of-Thought Reasoning in Streaming Full-Duplex End-to-End Spoken Dialogue Systems
Siddhant Arora
Jinchuan Tian
Hayato Futami
Jiatong Shi
Yosuke Kashiwagi
E. Tsunoo
Shinji Watanabe
LRM
110
2
0
02 Oct 2025
Stream RAG: Instant and Accurate Spoken Dialogue Systems with Streaming Tool Usage
Siddhant Arora
Haidar Khan
Kai Sun
Xin Luna Dong
Sajal Choudhary
...
Anuj Kumar
Ahmed Aly
Yue Liu
Florian Metze
Zhaojiang Lin
164
2
0
02 Oct 2025
From Scores to Preferences: Redefining MOS Benchmarking for Speech Quality Reward Modeling
Yifei Cao
Changhao Jiang
Jiabao Zhuang
Jiajun Sun
Ming Zhang
...
Yunke Zhang
Man Lan
Tao Gui
Qi Zhang
Xuanjing Huang
84
2
0
01 Oct 2025
Hearing the Order: Investigating Selection Bias in Large Audio-Language Models
Yu-Xiang Lin
Chen-An Li
Sheng-Lun Wei
Po-Chun Chen
Hsin-Hsi Chen
Hung-yi Lee
135
0
0
01 Oct 2025
When Silence Matters: The Impact of Irrelevant Audio on Text Reasoning in Large Audio-Language Models
Chen-An Li
Tzu-Han Lin
Hung-yi Lee
AuLLM
155
2
0
01 Oct 2025
Voice Evaluation of Reasoning Ability: Diagnosing the Modality-Induced Performance Gap
Yueqian Lin
Zhengmian Hu
Qinsi Wang
Yudong Liu
H. Zhang
Jayakumar Subramanian
N. Vlassis
Hai Helen Li
Yiran Chen
LRM
116
3
0
30 Sep 2025
Game-Time: Evaluating Temporal Dynamics in Spoken Language Models
Kai-Wei Chang
En-Pei Hu
Chun-Yi Kuan
Wenze Ren
Wei-Chih Chen
Guan-Ting Lin
Yu Tsao
Shao-Hua Sun
Hung-yi Lee
James R. Glass
AuLLM
214
7
0
30 Sep 2025
V-HUB: A Visual-Centric Humor Understanding Benchmark for Video LLMs
Zhengpeng Shi
Hengli Li
Yanpeng Zhao
Jianqun Zhou
Yuxuan Wang
Qinrong Cui
Wei Bi
Songchun Zhu
Bo Zhao
Zilong Zheng
VLM
122
0
0
30 Sep 2025
FreeRet: MLLMs as Training-Free Retrievers
Yuhan Zhu
Xiangyu Zeng
Chenting Wang
Xinhao Li
Yicheng Xu
Ziang Yan
Yi Wang
Limin Wang
OffRL
VLM
LRM
188
2
0
29 Sep 2025
Multimodal Large Language Models Meet Multimodal Emotion Recognition and Reasoning: A Survey
Yuntao Shou
Tao Meng
Wei Ai
Keqin Li
LRM
215
7
0
29 Sep 2025
Plug-and-Play Emotion Graphs for Compositional Prompting in Zero-Shot Speech Emotion Recognition
Jiacheng Shi
Hongfei Du
Y. Alicia Hong
Ye Gao
ReLM
LRM
60
0
0
29 Sep 2025
MGM-Omni: Scaling Omni LLMs to Personalized Long-Horizon Speech
Chengyao Wang
Zhisheng Zhong
Bohao Peng
Senqiao Yang
Yuqi Liu
Haokun Gui
Bin Xia
Jingyao Li
Bei Yu
Jiaya Jia
MLLM
AuLLM
VLM
172
2
0
29 Sep 2025
VoxCPM: Tokenizer-Free TTS for Context-Aware Speech Generation and True-to-Life Voice Cloning
Yixuan Zhou
Guoyang Zeng
Xin Liu
Xiang Li
Renjie Yu
...
Weiyue Sun
Jiancheng Gui
Kehan Li
Z. Wu
Zhiyuan Liu
143
5
0
29 Sep 2025
Compose and Fuse: Revisiting the Foundational Bottlenecks in Multimodal Reasoning
Yucheng Wang
Yifan Hou
Aydin Javadov
Mubashara Akhtar
Mrinmaya Sachan
LRM
142
0
0
28 Sep 2025
Understanding Textual Capability Degradation in Speech LLMs via Parameter Importance Analysis
Chao Wang
Rui Zheng
Yang Ai
Zhen-Hua Ling
93
0
0
28 Sep 2025
DiffuSpec: Unlocking Diffusion Language Models for Speculative Decoding
Guanghao Li
Zhihui Fu
Min Fang
Qibin Zhao
Ming Tang
Chun Yuan
Jun Wang
136
5
0
28 Sep 2025
Easy Turn: Integrating Acoustic and Linguistic Modalities for Robust Turn-Taking in Full-Duplex Spoken Dialogue Systems
Guojian Li
C. Wang
Hongfei Xue
Shuiyuan Wang
Dehui Gao
...
Yuke Lin
W. Li
Longshuai Xiao
Zhonghua Fu
Lei Xie
113
1
0
28 Sep 2025
XGC-AVis: Towards Audio-Visual Content Understanding with a Multi-Agent Collaborative System
Yuqin Cao
Xiongkuo Min
Yixuan Gao
Wei Sun
Zicheng Zhang
J. N. Han
Guangtao Zhai
94
2
0
27 Sep 2025
Think Smart, Not Hard: Difficulty Adaptive Reasoning for Large Audio Language Models
Zhichao Sheng
Shilin Zhou
Chen Gong
Zhenghua Li
AuLLM
LRM
315
0
0
26 Sep 2025
Page 2 of 5