Kimi k1.5: Scaling Reinforcement Learning with LLMs

22 January 2025
Kimi Team: Angang Du, Bofei Gao, Bowei Xing, Changjiu Jiang, Cheng Chen, Cheng Li, Chenjun Xiao, Chenzhuang Du, Chonghua Liao, Chuning Tang, Congcong Wang, Dehao Zhang, Enming Yuan, Enzhe Lu, Fengxiang Tang, Flood Sung, Guangda Wei, Guokun Lai, Haiqing Guo, Han Zhu, Hao Ding, Hao Hu, Hao Yang, Hao Zhang, Haotian Yao, Haotian Zhao, Haoyu Lu, Haoze Li, Haozhen Yu, Hongcheng Gao, Huabin Zheng, Huan Yuan, Jia Chen, Jianhang Guo, Jianlin Su, Jianzhou Wang, Jie Zhao, Jin Zhang, Jingyuan Liu, Junjie Yan, Junyan Wu, Lidong Shi, Ling Ye, Longhui Yu, Mengnan Dong, Neo Zhang, Ningchen Ma, Qiwei Pan, Qucheng Gong, Shaowei Liu, Shengling Ma, Shupeng Wei, Sihan Cao, Siying Huang, Tao Jiang, Weihao Gao, Weimin Xiong, Weiran He, Weixiao Huang, Wenhao Wu, Wenyang He, Xianghui Wei, Xianqing Jia, Xingzhe Wu, Xinran Xu, Xinxing Zu, Xinyu Zhou, Xuehai Pan, Y. Charles, Yang Li, Yangyang Hu, Yangyang Liu, Yanru Chen, Yejie Wang, Yibo Liu, Yidao Qin, Yifeng Liu, Ying Yang, Yiping Bao, Yulun Du, Yuxin Wu, Yuzhi Wang, Zaida Zhou, Zhaoji Wang, Zhaowei Li, Zhen Zhu, Zheng Zhang, Zhexu Wang, Zhilin Yang, Zhiqi Huang, Zihao Huang, Ziyao Xu, Zonghan Yang
Topics: VLM, ALM, OffRL, AI4TS, LRM
Abstract

Language model pretraining with next-token prediction has proved effective for scaling compute but is limited by the amount of available training data. Scaling reinforcement learning (RL) unlocks a new axis for the continued improvement of artificial intelligence, with the promise that large language models (LLMs) can scale their training data by learning to explore with rewards. However, prior published work has not produced competitive results. In light of this, we report on the training practice of Kimi k1.5, our latest multi-modal LLM trained with RL, including its RL training techniques, multi-modal data recipes, and infrastructure optimization. Long context scaling and improved policy optimization methods are key ingredients of our approach, which establishes a simple, effective RL framework without relying on more complex techniques such as Monte Carlo tree search, value functions, and process reward models. Notably, our system achieves state-of-the-art reasoning performance across multiple benchmarks and modalities -- e.g., 77.5 on AIME, 96.2 on MATH 500, 94th percentile on Codeforces, 74.9 on MathVista -- matching OpenAI's o1. Moreover, we present effective long2short methods that use long-CoT techniques to improve short-CoT models, yielding state-of-the-art short-CoT reasoning results -- e.g., 60.8 on AIME, 94.6 on MATH 500, 47.3 on LiveCodeBench -- outperforming existing short-CoT models such as GPT-4o and Claude Sonnet 3.5 by a large margin (up to +550%).
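The abstract's central technical claim is that strong reasoning can be trained with outcome rewards and policy optimization alone, without a value function or process reward model. The Python sketch below illustrates that general recipe only; it is not the authors' actual algorithm, and `policy`, its `generate` interface, and `verify_answer` are hypothetical stand-ins for a trainable LLM and a rule-based answer checker.

# A minimal sketch, NOT the Kimi k1.5 algorithm: REINFORCE-style policy
# optimization driven only by an outcome reward, with a group-mean
# baseline standing in for the value function the abstract says the
# framework avoids. `policy` and `verify_answer` are hypothetical.
import torch

def rl_step(policy, optimizer, prompts, verify_answer, group_size=8):
    """One policy-optimization step using only an outcome-level reward."""
    losses = []
    for prompt in prompts:
        # Sample several chains of thought per prompt. In practice the
        # completions' token log-probabilities must be (re)computed with
        # a differentiable forward pass under the current policy.
        samples = [policy.generate(prompt) for _ in range(group_size)]
        rewards = torch.tensor(
            [float(verify_answer(prompt, s.text)) for s in samples]
        )
        baseline = rewards.mean()  # no learned critic, just a group mean
        for s, r in zip(samples, rewards):
            # Reinforce completions whose reward beats the group average.
            losses.append(-(r - baseline) * s.log_prob)
    loss = torch.stack(losses).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

The group-mean baseline is one common way to reduce gradient variance without training a critic; the actual k1.5 recipe (long-context rollouts, its specific policy-optimization objective, and the long2short transfer methods) is detailed in the paper itself.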

View on arXiv
@article{team2025_2501.12599,
  title={Kimi k1.5: Scaling Reinforcement Learning with LLMs},
  author={Kimi Team and Angang Du and Bofei Gao and Bowei Xing and Changjiu Jiang and Cheng Chen and Cheng Li and Chenjun Xiao and Chenzhuang Du and Chonghua Liao and Chuning Tang and Congcong Wang and Dehao Zhang and Enming Yuan and Enzhe Lu and Fengxiang Tang and Flood Sung and Guangda Wei and Guokun Lai and Haiqing Guo and Han Zhu and Hao Ding and Hao Hu and Hao Yang and Hao Zhang and Haotian Yao and Haotian Zhao and Haoyu Lu and Haoze Li and Haozhen Yu and Hongcheng Gao and Huabin Zheng and Huan Yuan and Jia Chen and Jianhang Guo and Jianlin Su and Jianzhou Wang and Jie Zhao and Jin Zhang and Jingyuan Liu and Junjie Yan and Junyan Wu and Lidong Shi and Ling Ye and Longhui Yu and Mengnan Dong and Neo Zhang and Ningchen Ma and Qiwei Pan and Qucheng Gong and Shaowei Liu and Shengling Ma and Shupeng Wei and Sihan Cao and Siying Huang and Tao Jiang and Weihao Gao and Weimin Xiong and Weiran He and Weixiao Huang and Wenhao Wu and Wenyang He and Xianghui Wei and Xianqing Jia and Xingzhe Wu and Xinran Xu and Xinxing Zu and Xinyu Zhou and Xuehai Pan and Y. Charles and Yang Li and Yangyang Hu and Yangyang Liu and Yanru Chen and Yejie Wang and Yibo Liu and Yidao Qin and Yifeng Liu and Ying Yang and Yiping Bao and Yulun Du and Yuxin Wu and Yuzhi Wang and Zaida Zhou and Zhaoji Wang and Zhaowei Li and Zhen Zhu and Zheng Zhang and Zhexu Wang and Zhilin Yang and Zhiqi Huang and Zihao Huang and Ziyao Xu and Zonghan Yang},
  journal={arXiv preprint arXiv:2501.12599},
  year={2025}
}