LongCat-Flash-Thinking-2601 Technical Report

Meituan LongCat Team
Anchun Gui
Bei Li
Bingyang Tao
Bole Zhou
Borun Chen
Chao Zhang
Chao Zhang
Chen Gao
Chen Zhang
Chengcheng Han
Chenhui Yang
Chuyu Zhang
Cong Chen
Cunguang Wang
Daoru Pan
Defei Bu
Dengchang Zhao
Di Xiu
Dishan Liu
Dongyu Ru
Dunwei Tu
Fan Wu
Fengcheng Yuan
Fengcun Li
Gang Xu
Guanyu Wu
Guoyuan Lin
Haibin Wang
Hansi Yang
Hao Yang
Haonan Yan
Haoxiang Ma
Haoxing Wen
Hongyan Hao
Hongyin Tang
Hongyu Zang
Hongzhi Ni
Hui Su
Jiacheng Zhang
Jiahong Zhou
Jiahuan Li
Jiaming Wang
Jian Yang
Jianfei Zhang
Jianhao Xu
Jianing Wang
Jiapeng Zhu
Jiaqi Sun
Jiarong Shi
Jiarui Zhao
Jingang Wang
Jinluan Yang
Jinrui Ding
Jinwei Xiao
Jiyuan He
Juncan Xu
Kefeng Zhang
Keheng Wang
Li Wei
Lianhui Ma
Lin Qiu
Lingbing Kong
Lingchuan Liu
Linsen Guo
Mengshen Zhu
Mengxia Shen
Mingyang Zhu
Peiguang Li
Peng Pei
Peng Zhao
Pengcheng Jia
Pengtao Zhang
Ping Liu
Qi Gu
Qiong Huang
Qiyuan Duan
Quanchi Weng
Rongxiang Weng
Rongzhi Zhang
Rumei Li
Shanglin Lei
Shengnan An
Shijun Dai
Shizhe Wu
Shuaikang Liu
Shuang Zhou
Shuo Wang
Songyuan Zhao
Tao Liang
Tianhao Hu
Tianze Chen
Wei Liu
Wei Shi
Wei Wang
Weifeng Tang
Wenjie Shi
Wenlong Zhu
Wentao Chen
Wentao Shi
Main text: 21 pages, 15 figures, 3 tables; bibliography: 5 pages; appendix: 1 page.
Abstract

We introduce LongCat-Flash-Thinking-2601, a 560-billion-parameter open-source Mixture-of-Experts (MoE) reasoning model with superior agentic reasoning capability. LongCat-Flash-Thinking-2601 achieves state-of-the-art performance among open-source models on a wide range of agentic benchmarks, including agentic search, agentic tool use, and tool-integrated reasoning. Beyond benchmark performance, the model demonstrates strong generalization to complex tool interactions and robust behavior in noisy real-world environments. Its advanced capabilities stem from a unified training framework that combines domain-parallel expert training with subsequent fusion, together with an end-to-end co-design of data construction, environments, algorithms, and infrastructure spanning pre-training to post-training. In particular, the model's strong generalization in complex tool use is driven by our in-depth exploration of environment scaling and principled task construction. To optimize long-tailed, skewed generation and multi-turn agentic interactions, and to enable stable training across over 10,000 environments spanning more than 20 domains, we systematically extend our asynchronous reinforcement learning framework, DORA, for stable and efficient large-scale multi-environment training. Furthermore, recognizing that real-world tasks are inherently noisy, we conduct a systematic analysis and decomposition of real-world noise patterns and design targeted training procedures that explicitly incorporate such imperfections into the training process, yielding improved robustness in real-world applications. To further enhance performance on complex reasoning tasks, we introduce a Heavy Thinking mode that enables effective test-time scaling by jointly expanding reasoning depth and width through intensive parallel thinking.
