arXiv: 2509.17765

Qwen3-Omni Technical Report

22 September 2025

ArXiv (abs) | PDF | HTML | HuggingFace (119 upvotes) | GitHub (1014★)

Papers citing "Qwen3-Omni Technical Report"

20 / 20 papers shown

See, Hear, and Understand: Benchmarking Audiovisual Human Speech Understanding in Multimodal Large Language Models

Le Thien Phuc Nguyen

Samuel Low Yu Hang

...

Thanh-Huy Nguyen

199

1

0

01 Dec 2025

MCAT: Scaling Many-to-Many Speech-to-Text Translation with MLLMs to 70 Languages

104

0

0

01 Dec 2025

AnyExperts: On-Demand Expert Allocation for Multimodal Language Models with Mixture of Expert

168

0

0

23 Nov 2025

VITAL: Vision-Encoder-centered Pre-training for LMMs in Visual Quality Assessment

184

0

0

22 Nov 2025

Step-Audio-R1 Technical Report

...

351

0

0

19 Nov 2025

OmniZip: Audio-Guided Dynamic Token Compression for Fast Omnimodal Large Language Models

255

2

0

18 Nov 2025

ArchMap: Arch-Flattening and Knowledge-Guided Vision Language Model for Tooth Counting and Structured Dental Understanding

129

0

0

18 Nov 2025

LongCat-Flash-Omni Technical Report

...

589

4

0

31 Oct 2025

Ming-Flash-Omni: A Sparse, Unified Architecture for Multimodal Perception and Generation

...

350

2

0

28 Oct 2025

M3-SLU: Evaluating Speaker-Attributed Reasoning in Multimodal Large Language Models

217

0

0

22 Oct 2025

Data-Centric Lessons To Improve Speech-Language Pretraining

Vishaal Udandarao

Albin Madapally Jose

Chung-Cheng Chiu

140

0

0

22 Oct 2025

SegTune: Structured and Fine-Grained Control for Song Generation

197

1

0

21 Oct 2025

MSRBench: A Benchmarking Dataset for Music Source Restoration

Mark D. Plumbley

156

1

0

13 Oct 2025

A Survey on Agentic Multimodal Large Language Models

...

LM&Ro AIFin AI4TS LRM AI4CE

250

5

0

13 Oct 2025

AVoCaDO: An Audiovisual Video Captioner Driven by Temporal Orchestration

...

255

2

0

12 Oct 2025

OmniVideoBench: Towards Audio-Visual Understanding Evaluation for Omni MLLMs

...

Zhaoxiang Zhang

155

8

0

12 Oct 2025

Aligning Perception, Reasoning, Modeling and Interaction: A Survey on Physical AI

Terry Jingchen Zhang

...

376

1

0

06 Oct 2025

BatonVoice: An Operationalist Framework for Enhancing Controllable Speech Synthesis with Linguistic Intelligence from LLMs

...

144

0

0

30 Sep 2025

AUV: Teaching Audio Universal Vector Quantization with Single Nested Codebook

162

2

0

26 Sep 2025

Prevailing Research Areas for Music AI in the Era of Foundation Models

M. Modrzejewski

Aswin Sivaraman

Dorien Herremans

430

3

0

14 Sep 2024