2503.20215
Cited By
Qwen2.5-Omni Technical Report
26 March 2025
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
S. Bai
Keqin Chen
Jialin Wang
Yang Fan
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
HuggingFace (164 upvotes)
Papers citing
"Qwen2.5-Omni Technical Report"
50 / 239 papers shown
Title
ViSAudio: End-to-End Video-Driven Binaural Spatial Audio Generation
Mengchen Zhang
Qi Chen
Tong Wu
Zihan Liu
Dahua Lin
VGen
58
0
0
02 Dec 2025
OmniGuard: Unified Omni-Modal Guardrails with Deliberate Reasoning
Boyu Zhu
Xiaofei Wen
Wenjie Mo
Tinghui Zhu
Yanan Xie
Peng Qi
Muhao Chen
48
0
0
02 Dec 2025
Generative Multi-modal Feedback for Singing Voice Synthesis Evaluation
Xueyan Li
Y. Wang
Mengjie Jiang
Qingzi Zhu
Jiang Zhang
Zoey Kim
Yazhe Niu
EGVM
64
0
0
02 Dec 2025
YingVideo-MV: Music-Driven Multi-Stage Video Generation
Jiahui Chen
Weida Wang
Runhua Shi
Huan Yang
Chaofan Ding
Zihao Chen
DiffM
VGen
101
0
0
02 Dec 2025
MAViD: A Multimodal Framework for Audio-Visual Dialogue Understanding and Generation
Youxin Pang
Jiajun Liu
L. Tan
Yong Zhang
Feng Gao
Xiang Deng
Zhuoliang Kang
Xiaoming Wei
Y. Liu
VGen
64
0
0
02 Dec 2025
Script: Graph-Structured and Query-Conditioned Semantic Token Pruning for Multimodal Large Language Models
Zhongyu Yang
Dannong Xu
Wei Pang
Yingfang Yuan
VLM
88
0
0
01 Dec 2025
MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages
Yexing Du
Kaiyuan Liu
Youcheng Pan
B. Yang
Keqi Deng
Xie Chen
Yang Xiang
Ming Liu
Bin Qin
Y. Wang
LRM
36
0
0
01 Dec 2025
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen
Zhuoran Yu
Samuel Low Yu Hang
Subin An
J. Lee
...
SeungEun Chung
Thanh-Huy Nguyen
JuWan Maeng
Soochahn Lee
Yong Jae Lee
AuLLM
VLM
161
0
0
01 Dec 2025
EvalTalker: Learning to Evaluate Real-Portrait-Driven Multi-Subject Talking Humans
Yingjie Zhou
Xilei Zhu
Siyu Ren
Ziyi Zhao
Z. Wang
...
Fengjiao Chen
Xiaoyu Li
Xuezhi Cao
Guangtao Zhai
Xiaohong Liu
EGVM
200
0
0
01 Dec 2025
MAC-SLU: Multi-Intent Automotive Cabin Spoken Language Understanding Benchmark
Yuezhang Peng
Chonghao Cai
Ziang Liu
Shuai Fan
Sheng Jiang
...
Kele Xu
Y. Li
S. Wang
L. Qin
Xie Chen
AuLLM
96
0
0
01 Dec 2025
RecruitView: A Multimodal Dataset for Predicting Personality and Interview Performance for Human Resources Applications
Amit Kumar Gupta
Farhan Sheth
Hammad Shaikh
Dheeraj Kumar
Angkul Puniya
Deepak Panwar
Sandeep Chaurasia
Priya Mathur
21
0
0
29 Nov 2025
Multi-Modal Scene Graph with Kolmogorov-Arnold Experts for Audio-Visual Question Answering
Z. Fu
Changsheng Lv
Mengshi Qi
Huadong Ma
100
0
0
28 Nov 2025
SingingSDS: A Singing-Capable Spoken Dialogue System for Conversational Roleplay Applications
Jionghao Han
Jiatong Shi
Masao Someki
Yuxun Tang
Lan Liu
Yiwen Zhao
Wenhao Feng
Shinji Watanabe
VLM
120
0
0
26 Nov 2025
Towards Audio Token Compression in Large Audio Language Models
Saurabhchand Bhati
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
233
0
0
26 Nov 2025
Unsupervised Memorability Modeling from Tip-of-the-Tongue Retrieval Queries
Sree Bhattacharyya
Yaman Kumar Singla
Sudhir Yarram
Somesh Singh
Harini S I
James Z. Wang
92
0
0
25 Nov 2025
A Reason-then-Describe Instruction Interpreter for Controllable Video Generation
Shengqiong Wu
Weicai Ye
Y. Zhang
Jiahao Wang
Quande Liu
Xintao Wang
Pengfei Wan
Kun Gai
Hao Fei
Tat-Seng Chua
VGen
LRM
160
0
0
25 Nov 2025
It Hears, It Sees too: Multi-Modal LLM for Depression Detection By Integrating Visual Understanding into Audio Language Models
Xiangyu Zhao
Yaling Shen
Yiwen Jiang
Z. Wang
Jiahe Liu
Maxmartwell H Cheng
Guilherme C Oliveira
Robert Desimone
Dominic Dwyer
Zongyuan Ge
106
1
0
25 Nov 2025
StereoDETR: Stereo-based Transformer for 3D Object Detection
Shiyi Mu
Zichong Gu
Zhiqi Ai
Anqi Liu
Yilin Gao
Shugong Xu
ViT
3DPC
132
0
0
24 Nov 2025
SkinGPT-R1: Adapter-Only Dual Distillation for Efficient Dermatology Reasoning
Yuhao Shen
Jiahe Qian
Zhangtianyi Chen
Yuanhao He
Juexiao Zhou
LRM
204
0
0
19 Nov 2025
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao
Kele Shao
Bohan Yu
Weiqiang Wang
Jian Liu
Huan Wang
VLM
229
0
0
18 Nov 2025
FxSearcher: gradient-free text-driven audio transformation
Hojoon Ki
Jongsuk Kim
Minchan Kwon
Junmo Kim
103
0
0
18 Nov 2025
Spatial Blind Spot: Auditory Motion Perception Deficits in Audio LLMs
Zhe Sun
Yujun Cai
Jiayu Yao
Yiwei Wang
AuLLM
LRM
332
0
0
17 Nov 2025
Uni-MoE-2.0-Omni: Scaling Language-Centric Omnimodal Large Model with Advanced MoE, Training and Data
Yunxin Li
Xinyu Chen
Shenyuan Jiang
Haoyuan Shi
Zhenyu Liu
...
Zhenran Xu
Yicheng Ma
Meishan Zhang
Baotian Hu
Min Zhang
MLLM
MoE
OSLM
VLM
543
1
0
16 Nov 2025
Learning to Hear by Seeing: It's Time for Vision Language Models to Understand Artistic Emotion from Sight and Sound
Dengming Zhang
W. You
Jingxiong Li
Weishen Lin
Wenda Shi
Xue Zhao
H. Zuo
Junxian Wu
Lingyun Sun
VLM
104
0
0
15 Nov 2025
Constructing and Interpreting Digital Twin Representations for Visual Reasoning via Reinforcement Learning
Yiqing Shen
Mathias Unberath
LRM
93
0
0
15 Nov 2025
EgoCogNav: Cognition-aware Human Egocentric Navigation
Zhiwen Qiu
Ziang Liu
Wenqian Niu
Tapomayukh Bhattacharjee
Saleh Kalantari
EgoV
188
0
0
15 Nov 2025
TopoPerception: A Shortcut-Free Evaluation of Global Visual Perception in Large Vision-Language Models
Wenhao Zhou
Hao Zheng
R. Zhao
MLLM
VLM
LRM
156
0
0
14 Nov 2025
AlignSurvey: A Comprehensive Benchmark for Human Preferences Alignment in Social Surveys
Chenxi Lin
Weikang Yuan
Zhuoren Jiang
Biao Huang
Ruitao Zhang
Jianan Ge
Yueqian Xu
Jianxing Yu
ALM
521
0
0
11 Nov 2025
MedVoiceBias: A Controlled Study of Audio LLM Behavior in Clinical Decision-Making
Zhi Rui Tam
Yun-Nung Chen
64
0
0
10 Nov 2025
Omni-AVSR: Towards Unified Multimodal Speech Recognition with Large Language Models
Umberto Cappellazzo
Xubo Liu
Pingchuan Ma
Stavros Petridis
Maja Pantic
AuLLM
283
0
0
10 Nov 2025
SAR-LM: Symbolic Audio Reasoning with Large Language Models
Termeh Taheri
Yinghao Ma
Emmanouil Benetos
AuLLM
LRM
154
0
0
09 Nov 2025
Long Grounded Thoughts: Distilling Compositional Visual Reasoning Chains at Scale
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
David Acuna
Chao-Han Huck Yang
Yuntian Deng
Jaehun Jung
Ximing Lu
Prithviraj Ammanabrolu
Hyunwoo J. Kim
Yuan-Hong Liao
Yejin Choi
ReLM
OffRL
LRM
311
1
0
07 Nov 2025
MIDI-LLM: Adapting Large Language Models for Text-to-MIDI Music Generation
Shih-Lun Wu
Yoon Kim
Cheng-Zhi Anna Huang
242
0
0
06 Nov 2025
Feedback-driven Retrieval-augmented Audio Generation with Large Audio Language Models
Junqi Zhao
Chenxing Li
Jinzheng Zhao
Rilin Chen
Dong Yu
Mark D. Plumbley
Wenwu Wang
90
0
0
02 Nov 2025
MULTI-Bench: A Multi-Turn Interactive Benchmark for Assessing Emotional Intelligence ability of Spoken Dialogue Models
Y. Deng
Guoqiang Hu
Haiyang Sun
X. Zhang
H. Zhang
Fei Tian
Xuerui Yang
Gang Yu
Eng Siong Chng
76
0
0
02 Nov 2025
ALMGuard: Safety Shortcuts and Where to Find Them as Guardrails for Audio-Language Models
Weifei Jin
Yuxin Cao
Junjie Su
Minhui Xue
Jie Hao
Ke Xu
Jin Song Dong
Derui Wang
AAML
102
0
0
30 Oct 2025
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Z. He
MLLM
MoE
VLM
286
2
0
28 Oct 2025
TeleEgo: Benchmarking Egocentric AI Assistants in the Wild
Jiaqi Yan
Ruilong Ren
J. Liu
Shuning Xu
Ling Wang
...
Dell Zhang
Hao Sun
Chi Zhang
Xuelong Li
266
0
0
28 Oct 2025
STAR-Bench: Probing Deep Spatio-Temporal Reasoning as Audio 4D Intelligence
Zihan Liu
Zhikang Niu
Qiuyang Xiao
Zhisheng Zheng
Ruoqi Yuan
...
Jianze Liang
Xie Chen
Leilei Sun
Dahua Lin
Jiaqi Wang
AuLLM
LRM
431
2
0
28 Oct 2025
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Shijian Wang
Jiarui Jin
Xingjian Wang
L. Song
Runhao Fu
H. Wang
Zongyuan Ge
Yuan Lu
Xuelian Cheng
ReLM
LRM
104
5
0
27 Oct 2025
RoboOmni: Proactive Robot Manipulation in Omni-modal Context
Siyin Wang
Jinlan Fu
Feihong Liu
Xinzhe He
Huangxuan Wu
...
Z. F. Wu
Yugang Jiang
See-Kiong Ng
Tat-Seng Chua
Xipeng Qiu
LM&Ro
230
1
0
27 Oct 2025
Omni-Reward: Towards Generalist Omni-Modal Reward Modeling with Free-Form Preferences
Zhuoran Jin
Hongbang Yuan
Kejian Zhu
Jiachun Li
Pengfei Cao
Yubo Chen
Kang Liu
Jun Zhao
113
0
0
27 Oct 2025
ISA-Bench: Benchmarking Instruction Sensitivity for Large Audio Language Models
Bohan Li
Wenbin Huang
Yuhang Qiu
Yiwei Guo
Hankun Wang
Zhihan Li
Jing Peng
Ziyang Ma
Xie Chen
Kai Yu
AuLLM
201
0
0
27 Oct 2025
UltraVoice: Scaling Fine-Grained Style-Controlled Speech Conversations for Spoken Dialogue Models
Wenming Tu
Guanrou Yang
Ruiqi Yan
Wenxi Chen
Ziyang Ma
Yipeng Kang
Kai Yu
Xie Chen
Zilong Zheng
104
0
0
26 Oct 2025
Ming-UniAudio: Speech LLM for Joint Understanding, Generation and Editing with Unified Representation
C. Yan
Chunxiang Jin
Dawei Huang
Haibing Yu
Han Peng
...
Yongjie Lyu
Z. He
Zhihao Qiu
Zhiqiang Fang
Ziyuan Huang
AuLLM
329
3
0
26 Oct 2025
DeepfakeBench-MM: A Comprehensive Benchmark for Multimodal Deepfake Detection
Kangran Zhao
Yupeng Chen
Xiaoyu Zhang
Yize Chen
Weinan Guan
...
Chengzhe Sun
Soumyya Kanti Datta
Qingshan Liu
Siwei Lyu
Baoyuan Wu
92
1
0
26 Oct 2025
Evaluating Multimodal Large Language Models on Core Music Perception Tasks
Brandon James Carone
Iran R. Roman
Pablo Ripollés
LRM
129
1
0
25 Oct 2025
Incentivizing Consistent, Effective and Scalable Reasoning Capability in Audio LLMs via Reasoning Process Rewards
Jiajun Fan
Roger Ren
Jingyuan Li
R. Pandey
Prashanth Gurunath Shivakumar
I. Bulyko
Ankur Gandhe
Ge Liu
Yile Gu
LRM
98
1
0
23 Oct 2025
Decoding the Ear: A Framework for Objectifying Expressiveness from Human Preference Through Efficient Alignment
Zhiyu Lin
Jingwen Yang
Jiale Zhao
Meng Liu
Sunzhu Li
Benyou Wang
88
0
0
23 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
124
0
0
22 Oct 2025