MiMo-VL Technical Report

4 June 2025
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
Shuhuai Ren
S. Gu
Shicheng Li
Peidian Li
Liang Zhao
Lei Li
Kainan Bao
Hao Tian
Hailin Zhang
G. Wang
D. Zhu
Cici
Chenhong He
Bowen Ye
Bowen Shen
Zihan Zhang
Zihan Jiang
Zhixian Zheng
Zhichao Song
Zhenbo Luo
Yue Yu
Y. X. R. Wang
Yuanyuan Tian
Yu Tu
Y. Yan
Yi Huang
X. Wang
Xinzhe Xu
Xingchen Song
Xing Zhang
Xing Yong
Xin Zhang
X. Deng
Wenyu Yang
Wenhan Ma
Weiwei Lv
Weiji Zhuang
Wei Liu
Sirui Deng
Shuo Liu
Shimao Chen
S. Yu
Shaohui Liu
S. Wang
Rui Ma
Qiantong Wang
Peng Wang
Nuo Chen
Menghang Zhu
Kangyang Zhou
Kang Zhou
Kai Fang
Jun Shi
Jinhao Dong
Jiebao Xiao
Jiaming Xu
Huaqiu Liu
Hongshen Xu
Heng Qu
Haochen Zhao
Hanglong Lv
G. Wang
Duo Zhang
Dong Zhang
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL, MoE, VLM, LRM
Main: 18 Pages
14 Figures
Bibliography: 6 Pages
6 Tables
Appendix: 8 Pages
Abstract

We open-source MiMo-VL-7B-SFT and MiMo-VL-7B-RL, two powerful vision-language models delivering state-of-the-art performance in both general visual understanding and multimodal reasoning. MiMo-VL-7B-RL outperforms Qwen2.5-VL-7B on 35 out of 40 evaluated tasks, and scores 59.4 on OlympiadBench, surpassing models with up to 78B parameters. For GUI grounding applications, it sets a new standard with 56.1 on OSWorld-G, even outperforming specialized models such as UI-TARS. Our training combines four-stage pre-training (2.4 trillion tokens) with Mixed On-policy Reinforcement Learning (MORL) integrating diverse reward signals. We identify the importance of incorporating high-quality reasoning data with long Chain-of-Thought into pre-training stages, and the benefits of mixed RL despite challenges in simultaneous multi-domain optimization. We also contribute a comprehensive evaluation suite covering 50+ tasks to promote reproducibility and advance the field. The model checkpoints and full evaluation suite are available at this https URL.
