Qwen3-Omni Technical Report

arXiv: 2509.17765
22 September 2025
Jin Xu
Zhifang Guo
Hangrui Hu
Yunfei Chu
Xiong Wang
Jinzheng He
Yuxuan Wang
Xian Shi
Ting He
Xinfa Zhu
Yuanjun Lv
Y. Wang
D. Guo
He Wang
Linhan Ma
Pei Zhang
Xinyu Zhang
Hongkun Hao
Zishan Guo
Baosong Yang
Bin Zhang
Ziyang Ma
X. Wei
S. Bai
Keqin Chen
Xuejing Liu
Liang Luo
Mingkun Yang
Dayiheng Liu
Xingzhang Ren
Bo Zheng
Rui Men
Fan Zhou
Bowen Yu
Jianxin Yang
Le Yu
Jingren Zhou
Junyang Lin
AuLLM, VGen, VLM
ArXiv (abs) | PDF | HTML | HuggingFace (119 upvotes) | GitHub (1014★)

Papers citing "Qwen3-Omni Technical Report"

20 / 20 papers shown
See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models
Le Thien Phuc Nguyen
Zhuoran Yu
Samuel Low Yu Hang
Subin An
J. Lee
...
SeungEun Chung
Thanh-Huy Nguyen
JuWan Maeng
Soochahn Lee
Yong Jae Lee
AuLLM, VLM
199
1
0
01 Dec 2025
MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages
Yexing Du
Kaiyuan Liu
Youcheng Pan
B. Yang
Keqi Deng
Xie Chen
Yang Xiang
Ming Liu
Bin Qin
Y. Wang
LRM
104
0
0
01 Dec 2025
AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert
Yuting Gao
Wang Lan
Hengyuan Zhao
Linjiang Huang
Si Liu
Q. Guo
MoE
168
0
0
23 Nov 2025
VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment
Ziheng Jia
Linhan Cao
J. N. Han
Zicheng Zhang
Jiaying Qian
Jiarui Wang
Z. Chen
Guangtao Zhai
Xiongkuo Min
MLLM
184
0
0
22 Nov 2025
Step-Audio-R1 Technical Report
Fei Tian
Xiangyu Zhang
Y. Zhang
Haoyang Zhang
Yuxin Li
...
Eng Siong Chng
Xuerui Yang
Xiangyu Zhang
Daxin Jiang
Gang Yu
AuLLM, LRM
351
0
0
19 Nov 2025
OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models
Keda Tao
Kele Shao
Bohan Yu
Weiqiang Wang
Jian Liu
Huan Wang
VLM
255
2
0
18 Nov 2025
ArchMap: Arch-Flattening and Knowledge-Guided Vision Language Model for Tooth Counting and Structured Dental Understanding
Bohan Zhang
Yiyi Miao
Taoyu Wu
Tong Chen
Ji Jiang
Zhuoxiao Li
Zhe Tang
Limin Yu
Jionglong Su
129
0
0
18 Nov 2025
LongCat-Flash-Omni Technical Report
M-A-P Team
Bairui Wang
Bayan
Bin Xiao
Bo Zhang
...
Xin Pan
Xin Chen
Xiusong Sun
Xu Xiang
X. Xing
MLLM, VLM
589
4
0
31 Oct 2025
Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation
Inclusion AI
Bowen Ma
Cheng Zou
C. Yan
Chunxiang Jin
...
Zhiqiang Fang
Zhihao Qiu
Ziyuan Huang
Zizheng Yang
Z. He
MLLM, MoE, VLM
350
2
0
28 Oct 2025
M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models
Yejin Kwon
Taewoo Kang
Hyunsoo Yoon
Changouk Kim
AuLLM, ELM, LRM
217
0
0
22 Oct 2025
Data-Centric Lessons To Improve Speech-Language Pretraining
Vishaal Udandarao
Zhiyun Lu
Xuankai Chang
Yongqiang Wang
Violet Z. Yao
Albin Madapally Jose
Fartash Faghri
Josh Gardner
Chung-Cheng Chiu
140
0
0
22 Oct 2025
SegTune: Structured and Fine-Grained Control for Song Generation
Pengfei Cai
Joanna Wang
Haorui Zheng
X. Li
Zihao Ji
Teng Ma
Zhongliang Liu
Chen Zhang
Pengfei Wan
197
1
0
21 Oct 2025
MSRBench: A Benchmarking Dataset for Music Source Restoration
Yongyi Zang
Jiarui Hai
Wanying Ge
Qiuqiang Kong
Zheqi Dai
Helin Wang
Yuki Mitsufuji
Mark D. Plumbley
156
1
0
13 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro, AIFin, AI4TS, LRM, AI4CE
250
5
0
13 Oct 2025
AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration
Xinlong Chen
Yue Ding
Weihong Lin
Jingyun Hua
Linli Yao
...
Yuanxing Zhang
Qiang Liu
Pengfei Wan
Liang Wang
Tieniu Tan
255
2
0
12 Oct 2025
OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs
Caorui Li
Yu Chen
Yiyan Ji
Jin Xu
Zhenyu Cui
...
Zili Wang
Minghao Liu
Junran Peng
Zhaoxiang Zhang
Jiaheng Liu
AuLLM, LRM
155
8
0
12 Oct 2025
Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI
Kun Xiang
Terry Jingchen Zhang
Yinya Huang
Jixi He
Zirong Liu
...
J. N. Han
Hang Xu
Han Li
Bin Dong
Xiaodan Liang
PINN, AI4CE
376
1
0
06 Oct 2025
BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs
Yue Wang
Ruotian Ma
Xingyu Chen
Zhengliang Shi
Wanshun Chen
...
Juntao Li
Min Zhang
Zhaopeng Tu
Xiaolong Li
Linus
144
0
0
30 Sep 2025
AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook
Yihao Chen
Kai Hu
Long Zhou
Shulin Feng
Xusheng Yang
Hangting Chen
Xie Chen
162
2
0
26 Sep 2025
Prevailing Research Areas for Music AI in the Era of Foundation Models
Megan Wei
M. Modrzejewski
Aswin Sivaraman
Dorien Herremans
MedIm
430
3
0
14 Sep 2024