MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining

12 May 2025
Xiaomi LLM-Core Team
Bingquan Xia, Bowen Shen, Cici, Dawei Zhu, Di Zhang, Gang Wang, Hailin Zhang, Huaqiu Liu, Jiebao Xiao, Jinhao Dong, Liang Zhao, Peidian Li, Peng Wang, Shihua Yu, Shimao Chen, Weikun Wang, Wenhan Ma, Xiangwei Deng, Yi Huang, Yifan Song, Zihan Jiang, Bowen Ye, Can Cai, Chenhong He, Dong Zhang, Duo Zhang, Guoan Wang, Hao Tian, Haochen Zhao, Heng Qu, Hongshen Xu, Jun Shi, Kainan Bao, Qingkai Fang, Kang Zhou, Kangyang Zhou, Lei Li, Menghang Zhu, Nuo Chen, Qiantong Wang, Shaohui Liu, Shicheng Li, Shuhao Gu, Shuhuai Ren, Shuo Liu, Sirui Deng, Weiji Zhuang, Weiwei Lv, Wenyu Yang, Xin Zhang, Xing Yong, Xing Zhang, Xingchen Song, Xinzhe Xu, Xu Wang, Yihan Yan, Yu Tu, Yuanyuan Tian, Yudong Wang, Yue Yu, Zhenru Lin, Zhichao Song, Zihao Yue
Topics: MoE, ReLM, LRM, AI4CE
Abstract

We present MiMo-7B, a large language model born for reasoning tasks, with optimization across both pre-training and post-training stages. During pre-training, we enhance the data preprocessing pipeline and employ a three-stage data mixing strategy to strengthen the base model's reasoning potential. MiMo-7B-Base is pre-trained on 25 trillion tokens, with an additional Multi-Token Prediction objective for enhanced performance and accelerated inference speed. During post-training, we curate a dataset of 130K verifiable mathematics and programming problems for reinforcement learning, integrating a test-difficulty-driven code-reward scheme to alleviate sparse-reward issues and employing strategic data resampling to stabilize training. Extensive evaluations show that MiMo-7B-Base possesses exceptional reasoning potential, outperforming even much larger 32B models. The final RL-tuned model, MiMo-7B-RL, achieves superior performance on mathematics, code, and general reasoning tasks, surpassing the performance of OpenAI o1-mini. The model checkpoints are available at this https URL.
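The abstract names a test-difficulty-driven code reward for reinforcement learning on programming problems but does not spell out its exact form. The snippet below is only a minimal, hypothetical sketch of the general idea, assuming per-test difficulty weights exist: rather than the sparse all-or-nothing signal of "reward 1 only if every unit test passes," a generated program earns partial credit for each test it passes, weighted by that test's difficulty.

```python
# Illustrative sketch only: the paper's abstract names a "test-difficulty-driven
# code reward" but does not give its formula; the weighting below is an assumption.

def code_reward(test_results, test_difficulties):
    """Dense reward for one generated program.

    test_results: list of bools, True if the corresponding unit test passed.
    test_difficulties: list of positive floats, larger = harder test.

    Each passed test contributes credit proportional to its difficulty, so a
    partially correct program still receives a nonzero learning signal instead
    of the sparse pass-all-or-nothing reward.
    """
    assert len(test_results) == len(test_difficulties)
    total = sum(test_difficulties)
    if total == 0:
        return 0.0
    earned = sum(d for passed, d in zip(test_results, test_difficulties) if passed)
    return earned / total


# Example: a solution passing the two easier tests but failing the hard one
# earns partial reward (~0.357) instead of zero.
print(code_reward([True, True, False], [0.2, 0.3, 0.9]))
```

Under a sparse scheme this example would return 0, giving the policy no gradient signal to distinguish it from a completely wrong program; the difficulty-weighted variant preserves that distinction.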

@article{team2025_2505.07608,
  title={ MiMo: Unlocking the Reasoning Potential of Language Model -- From Pretraining to Posttraining },
  author={ Xiaomi LLM-Core Team and Bingquan Xia and Bowen Shen and Cici and Dawei Zhu and Di Zhang and Gang Wang and Hailin Zhang and Huaqiu Liu and Jiebao Xiao and Jinhao Dong and Liang Zhao and Peidian Li and Peng Wang and Shihua Yu and Shimao Chen and Weikun Wang and Wenhan Ma and Xiangwei Deng and Yi Huang and Yifan Song and Zihan Jiang and Bowen Ye and Can Cai and Chenhong He and Dong Zhang and Duo Zhang and Guoan Wang and Hao Tian and Haochen Zhao and Heng Qu and Hongshen Xu and Jun Shi and Kainan Bao and QingKai Fang and Kang Zhou and Kangyang Zhou and Lei Li and Menghang Zhu and Nuo Chen and Qiantong Wang and Shaohui Liu and Shicheng Li and Shuhao Gu and Shuhuai Ren and Shuo Liu and Sirui Deng and Weiji Zhuang and Weiwei Lv and Wenyu Yang and Xin Zhang and Xing Yong and Xing Zhang and Xingchen Song and Xinzhe Xu and Xu Wang and Yihan Yan and Yu Tu and Yuanyuan Tian and Yudong Wang and Yue Yu and Zhenru Lin and Zhichao Song and Zihao Yue },
  journal={arXiv preprint arXiv:2505.07608},
  year={ 2025 }
}