Qwen2.5-Omni Technical Report (arXiv:2503.20215)

26 March 2025
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
S. Bai
Keqin Chen
Jialin Wang
Yang Fan
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen, AuLLM
ArXiv (abs) | PDF | HTML | HuggingFace (164 upvotes)

Papers citing "Qwen2.5-Omni Technical Report"

50 / 242 papers shown
Kwai Keye-VL 1.5 Technical Report
Biao Yang
Bin Wen
Boyang Ding
Changyi Liu
Chenglong Chu
...
S. Wang
X. Luo
Yan Li
Yuhang Hu
Zixing Zhang
VLM
325
15
0
01 Sep 2025
WoW-Bench: Evaluating Fine-Grained Acoustic Perception in Audio-Language Models via Marine Mammal Vocalizations
J. Kim
Heeseung Yun
Sang Hoon Woo
Chao-Han Huck Yang
Gunhee Kim
AuLLM
114
0
0
28 Aug 2025
ChipChat: Low-Latency Cascaded Conversational Agent in MLX
Tatiana Likhomanenko
Luke Carlson
Richard He Bai
Zijin Gu
Han Tran
Zakaria Aldeneh
Yizhe Zhang
Ruixiang Zhang
Huangjie Zheng
Navdeep Jaitly
105
1
0
26 Aug 2025
SEAM: Semantically Equivalent Across Modalities Benchmark for Vision-Language Models
Zhenwei Tang
Difan Jiao
Blair Yang
Ashton Anderson
VLM, CoGe
142
1
0
25 Aug 2025
Speech-Based Cognitive Screening: A Systematic Evaluation of LLM Adaptation Strategies
Fatemeh Taherinezhad
Mohamad Javad Momeni Nezhad
Sepehr Karimi
Sina Rashidi
Ali Zolnour
Maryam Dadkhah
Yasaman Haghbin
Hossein Azadmaleki
Maryam Zolnoori
90
1
0
24 Aug 2025
TaDiCodec: Text-aware Diffusion Speech Tokenizer for Speech Language Modeling
Yuancheng Wang
Dekun Chen
Xueyao Zhang
Junan Zhang
Jiaqi Li
Zhizheng Wu
228
4
0
22 Aug 2025
Mini-Omni-Reasoner: Token-Level Thinking-in-Speaking in Large Speech Models
Zhifei Xie
Ziyang Ma
Zihang Liu
Kaiyu Pang
Hongyu Li
J. Zhang
Yue Liao
Deheng Ye
Chunyan Miao
Shuicheng Yan
AuLLM, LRM
264
7
0
18 Aug 2025
RadarQA: Multi-modal Quality Analysis of Weather Radar Forecasts
Xuming He
Zhiyuan You
Junchao Gong
Couhua Liu
Xiaoyu Yue
Peiqin Zhuang
Wenlong Zhang
Wenlong Zhang
92
3
0
17 Aug 2025
Audio Flamingo Sound-CoT Technical Report: Improving Chain-of-Thought Reasoning in Sound Understanding
Zhifeng Kong
Arushi Goel
J. F. Santos
Sreyan Ghosh
Rafael Valle
Wei Ping
Bryan Catanzaro
ReLM, AuLLM, LRM
178
2
0
15 Aug 2025
Seeing, Listening, Remembering, and Reasoning: A Multimodal Agent with Long-Term Memory
Lin Long
Yexiao He
Wentao Ye
Yiyuan Pan
Yuan Lin
Hang Li
Junbo Zhao
Wei Li
346
8
0
13 Aug 2025
MME-Emotion: A Holistic Evaluation Benchmark for Emotional Intelligence in Multimodal Large Language Models
Fan Zhang
Minghan Li
Chong Deng
Xue Yang
Zheng Lian
...
Xian Wu
Kun Wang
Xiangang Li
Jieping Ye
Pheng-Ann Heng
AI4MH
153
3
0
11 Aug 2025
Omni-SafetyBench: A Benchmark for Safety Evaluation of Audio-Visual Large Language Models
Leyi Pan
Zheyu Fu
Yunpeng Zhai
Shuchang Tao
Sheng Guan
...
Zhaoyang Liu
Bolin Ding
Felix Henry
Lijie Wen
Aiwei Liu
MLLM, ELM
197
1
0
10 Aug 2025
LLMCARE: early detection of cognitive impairment via transformer models enhanced by LLM-generated synthetic data
Frontiers in Artificial Intelligence (Front. Artif. Intell.), 2025
Ali Zolnour
Hossein Azadmaleki
Yasaman Haghbin
Fatemeh Taherinezhad
Mohamad Javad Momeni Nezhad
...
Suzanne Bakken
Yadollah Yaghoobzadeh
Abdol-Hossein Vahabie
Masoud Rouhizadeh
Maryam Zolnoori
LM&MA
143
0
0
08 Aug 2025
Training-Free Multimodal Large Language Model Orchestration
Tianyu Xie
Yuhang Wu
Yongdong Luo
Jinfa Huang
Xiawu Zheng
137
0
0
06 Aug 2025
OmniPlay: Benchmarking Omni-Modal Models on Omni-Modal Game Playing
Fuqing Bie
Shiyu Huang
Xijia Tao
Zhiqin Fang
Leyi Pan
Junzhe Chen
Min Ren
Liuyu Xiang
Zhaofeng He
189
0
0
06 Aug 2025
RealTalk-CN: A Realistic Chinese Speech-Text Dialogue Benchmark With Cross-Modal Interaction Analysis
Enzhi Wang
Qicheng Li
Shiwan Zhao
Aobo Kong
Jiaming Zhou
X. Yang
Yequan Wang
Yonghua Lin
Yong Qin
71
0
0
06 Aug 2025
ESDD 2026: Environmental Sound Deepfake Detection Challenge Evaluation Plan
Han Yin
Yang Xiao
Rohan Kumar Das
Jisheng Bai
Ting Dang
119
5
0
06 Aug 2025
MiDashengLM: Efficient Audio Understanding with General Audio Captions
Heinrich Dinkel
Gang Li
Jizhong Liu
Jian Luan
Yadong Niu
Xingwei Sun
Tianzi Wang
Qiyang Xiao
Junbo Zhang
Jiahao Zhou
AuLLM, AI4TS, VLM
422
13
0
06 Aug 2025
AVATAR: Reinforcement Learning to See, Hear, and Reason Over Video
Yogesh Kulkarni
Pooyan Fazli
OffRL, LRM
280
4
0
05 Aug 2025
SpeechRole: A Large-Scale Dataset and Benchmark for Evaluating Speech Role-Playing Agents
C. Jiang
Jiajun Sun
Yifei Cao
Jiabao Zhuang
Hui Li
Xiaoran Fan
Ming-bo Wen
Junjie Ye
Jiajun Sun
299
0
0
04 Aug 2025
VeOmni: Scaling Any Modality Model Training with Model-Centric Distributed Recipe Zoo
Qianli Ma
Yaowei Zheng
Zhelun Shi
Zhongkai Zhao
Bin Jia
...
Y. Li
Jiacheng Yang
Yanghua Peng
Zhi-Li Zhang
Xin Liu
MoE, VLM
349
3
0
04 Aug 2025
Multimodal Large Language Models for End-to-End Affective Computing: Benchmarking and Boosting with Generative Knowledge Prompting
Miaosen Luo
Jiesen Long
Zequn Li
Yunying Yang
Yuncheng Jiang
Sijie Mai
198
1
0
04 Aug 2025
From Contrast to Commonality: Audio Commonality Captioning for Enhanced Audio-Text Cross-modal Understanding in Multimodal LLMs
Yuhang Jia
Xu Zhang
Yong Qin
Yang Chen
Shiwan Zhao
VLM
203
0
0
03 Aug 2025
Multi-Agent Game Generation and Evaluation via Audio-Visual Recordings
Alexia Jolicoeur-Martineau
VGen
116
0
0
01 Aug 2025
AudioGen-Omni: A Unified Multimodal Diffusion Transformer for Video-Synchronized Audio, Speech, and Song Generation
L. Wang
Jun Wang
Feng Deng
Feng Deng
Chen Zhang
Di Zhang
Kun Gai
DiffM, VGen
746
7
0
01 Aug 2025
ARC-Hunyuan-Video-7B: Structured Video Comprehension of Real-World Shorts
Yuying Ge
Yixiao Ge
Chen Li
Teng Wang
Junfu Pu
...
Xiaojing Zhang
Yangyu Tao
Han Hu
Di Wang
Mingyu Ding
151
13
0
28 Jul 2025
JWB-DH-V1: Benchmark for Joint Whole-Body Talking Avatar and Speech Generation Version 1
Xinhan Di
Kristin Qi
Pengqian Yu
DiffM, VGen
214
0
0
28 Jul 2025
When Tokens Talk Too Much: A Survey of Multimodal Long-Context Token Compression across Images, Videos, and Audios
Kele Shao
Keda Tao
Kejia Zhang
Sicheng Feng
Mu Cai
Yuzhang Shang
Haoxuan You
Can Qin
Yang Sui
Huan Wang
508
11
0
27 Jul 2025
Predicting Brain Responses To Natural Movies With Multimodal LLMs
Cesar Kadir Torrico Villanueva
Jiaxin Cindy Tu
Mihir Tripathy
Connor Lane
Rishab Iyer
Paul S. Scotti
128
3
0
26 Jul 2025
DIFFA: Large Language Diffusion Models Can Listen and Understand
Jiaming Zhou
Hongjie Chen
Shiwan Zhao
Jian Kang
Jie Li
...
Haoqin Sun
Hui Wang
Aobo Kong
Yong Qin
X. Li
208
3
0
24 Jul 2025
GOAT-SLM: A Spoken Language Model with Paralinguistic and Speaker Characteristic Awareness
Hongjie Chen
Zehan Li
Yaodong Song
Wenming Deng
Yitong Yao
...
Chao Wang
Shuangyong Song
Yongxiang Li
Zhongjiang He
Xuelong Li
AuLLM, VLM
255
3
0
24 Jul 2025
VIBE: Video-Input Brain Encoder for fMRI Response Modeling
Daniel Carlstrom Schad
Shrey Dixit
Janis Keck
Viktor Studenyak
Aleksandr Shpilevoi
Andrej Bicanski
240
2
0
23 Jul 2025
STITCH: Simultaneous Thinking and Talking with Chunked Reasoning for Spoken Language Models
Cheng-Han Chiang
Xiaofei Wang
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
S. Liu
Zhendong Wang
Zhengyuan Yang
Hung-yi Lee
Lijuan Wang
ReLM, LRM
140
10
0
21 Jul 2025
Pixels, Patterns, but No Poetry: To See The World like Humans
Hongcheng Gao
Longxiang Zhang
Lin Xu
Jingyi Tang
X. Li
...
Xinlong Yang
Ge Wu
Balong Bi
Hongyu Chen
Wentao Zhang
MLLM, LRM, VLM
158
3
0
21 Jul 2025
BusterX++: Towards Unified Cross-Modal AI-Generated Content Detection and Explanation with MLLM
Haiquan Wen
Tianxiao Li
Zhenglin Huang
Yiwei He
Guangliang Cheng
301
2
0
19 Jul 2025
AnyCap Project: A Unified Framework, Dataset, and Benchmark for Controllable Omni-modal Captioning
Yiming Ren
Zhiqiang Lin
Yu Li
Gao Meng
Weiyun Wang
...
Zicheng Lin
Jifeng Dai
Yujiu Yang
Wenhai Wang
Ruihang Chu
176
3
0
17 Jul 2025
UGC-VideoCaptioner: An Omni UGC Video Detail Caption Model and New Benchmarks
Peiran Wu
Yunze Liu
Zhengdong Zhu
Enmin Zhou
Junxiao Shen
209
2
0
15 Jul 2025
DeepOmni: Towards Seamless and Smart Speech Interaction with Adaptive Modality-Specific MoE
Hang Shao
Heting Gao
Yunhang Shen
Jiawei Chen
Zuwei Long
Dong Yang
Ke Li
Xing Sun
AuLLM, MoE
218
2
0
27 Jun 2025
WildSpeech-Bench: Benchmarking End-to-End SpeechLLMs in the Wild
Jian Zhang
Linhao Zhang
Bokai Lei
Chuhan Wu
Aiwei Liu
Wei Jia
Xiao-bin Zhou
AuLLM, LM&MA
243
2
0
27 Jun 2025
RePIC: Reinforced Post-Training for Personalizing Multi-Modal Language Models
Yeongtak Oh
J. Mok
Juhyeon Shin
Juhyeon Shin
Sangha Park
J. Mok
Sungroh Yoon
VLM
388
1
0
23 Jun 2025
video-SALMONN 2: Caption-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
377
2
0
18 Jun 2025
AviationLLM: An LLM-based Knowledge System for Aviation Training
Jia'ang Wan
Feng Shen
Fujuan Li
Yanjin Sun
Yan Li
Shiwen Zhang
204
1
0
17 Jun 2025
SoundMind: RL-Incentivized Logic Reasoning for Audio-Language Models
Xingjian Diao
Chunhui Zhang
Keyi Kong
Weiyi Wu
Chiyu Ma
Z. Ouyang
Peijun Qing
Soroush Vosoughi
Jiang Gui
AuLLM, OffRL, ReLM, LRM
211
8
0
15 Jun 2025
NoLoCo: No-all-reduce Low Communication Training Method for Large Models
Jari Kolehmainen
Nikolay Blagoev
John Donaghy
Oğuzhan Ersoy
Christopher Nies
278
0
0
12 Jun 2025
VAT-KG: Knowledge-Intensive Multimodal Knowledge Graph Dataset for Retrieval-Augmented Generation
Hyeongcheol Park
Jiyoung Seo
MinHyuk Jang
Hogun Park
Ha Dam Baek
Gyusam Chang
Hyeonsoo Im
Sangpil Kim
305
2
0
11 Jun 2025
Step-Audio-AQAA: a Fully End-to-End Expressive Large Audio Language Model
Ailin Huang
B. Li
Bruce Wang
Boyong Wu
Chao Yan
...
X. Zhang
Yibo Zhu
Daxin Jiang
Shuchang Zhou
Chen-Hao Hu
AuLLM
345
7
0
10 Jun 2025
UITron-Speech: Towards Automated GUI Agents Based on Speech Instructions
Wenkang Han
Zhixiong Zeng
Jing Huang
Shu Jiang
Liming Zheng
Longrong Yang
Haibo Qiu
Chang Yao
Jingyuan Chen
Lin Ma
LM&Ro
266
2
0
10 Jun 2025
DEBATE: A Dataset for Disentangling Textual Ambiguity in Mandarin Through Speech
Haotian Guo
Jing Han
Yongfeng Tu
Shihao Gao
Shengfan Shen
Wulong Xiang
Weihao Gan
Zixing Zhang
137
0
0
09 Jun 2025
Movie Facts and Fibs (MF$^2$): A Benchmark for Long Movie Understanding
Emmanouil Zaranis
António Farinhas
Saul Santos
Beatriz Canaverde
Miguel Moura Ramos
...
Raffaella Bernardi
Raquel Fernández
Sandro Pezzelle
Vlad Niculae
Andre F. T. Martins
231
3
0
06 Jun 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
Lidong Lu
Guo Chen
Ruoyao Xiao
Yicheng Liu
Tong Lu
VLM, LRM
339
7
0
05 Jun 2025