RoboBrain 2.0 Technical Report

We introduce RoboBrain 2.0, our latest generation of embodied vision-language foundation models, designed to unify perception, reasoning, and planning for complex embodied tasks in physical environments. It comes in two variants, a lightweight 7B model and a full-scale 32B model, both featuring a heterogeneous architecture that couples a vision encoder with a language model. Despite its compact size, RoboBrain 2.0 achieves strong performance across a wide spectrum of embodied reasoning tasks. On both spatial and temporal benchmarks, the 32B variant achieves leading results, surpassing prior open-source and proprietary models. In particular, it supports key real-world embodied AI capabilities, including spatial understanding (e.g., affordance prediction, spatial referring, trajectory forecasting) and temporal decision-making (e.g., closed-loop interaction, multi-agent long-horizon planning, and scene graph updating). This report details the model architecture, data construction, multi-stage training strategies, infrastructure, and practical applications. We hope RoboBrain 2.0 advances embodied AI research and serves as a practical step toward building generalist embodied agents. The code, checkpoints, and benchmarks are available at this https URL.
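
As a quick illustration of how a released checkpoint might be queried for one of the abstract-level capabilities (here, spatial referring), the sketch below uses the generic Hugging Face Transformers vision-to-text interface. This is a minimal sketch under stated assumptions: the repository ID, image path, and prompt are illustrative, not details taken from this report; the released code should be consulted for the exact inference recipe.

# Minimal inference sketch. Assumptions: a Hugging Face-style checkpoint with a
# generic vision-to-text processor; the repo ID and prompt are illustrative only.
from PIL import Image
import torch
from transformers import AutoModelForVision2Seq, AutoProcessor

MODEL_ID = "BAAI/RoboBrain2.0-7B"  # hypothetical repo ID; see the project page

processor = AutoProcessor.from_pretrained(MODEL_ID)
model = AutoModelForVision2Seq.from_pretrained(
    MODEL_ID, torch_dtype=torch.bfloat16, device_map="auto"
)

image = Image.open("tabletop_scene.jpg")            # example camera observation
prompt = "Point to a graspable region on the mug."  # spatial referring query

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=128)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])

Under the same assumptions, other capabilities mentioned above (e.g., affordance prediction or trajectory forecasting) would differ only in the prompt, with the model's text output parsed by a downstream planner.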
@article{team2025_2507.02029,
  title   = {RoboBrain 2.0 Technical Report},
  author  = {BAAI RoboBrain Team and Mingyu Cao and Huajie Tan and Yuheng Ji and Minglan Lin and Zhiyu Li and Zhou Cao and Pengwei Wang and Enshen Zhou and Yi Han and Yingbo Tang and Xiangqi Xu and Wei Guo and Yaoxu Lyu and Yijie Xu and Jiayu Shi and Cheng Chi and Mengdi Zhao and Xiaoshuai Hao and Shanyu Rong and Zhengliang Cai and Bolun Zhang and Shuyi Zhang and Huaihai Lyu and Mengfei Du and Lingfeng Zhang and Xi Feng and Xiaodan Liu and Yance Jiao and Chenrui He and Mengsi Lyu and Zhuo Chen and Yulong Ao and Xue Sun and Zheqi He and Jingshu Zheng and Xi Yang and Donghai Shi and Kunchang Xie and Bochao Zhang and Shaokai Nie and Chunlei Men and Yonghua Lin and Zhongyuan Wang and Tiejun Huang and Shanghang Zhang},
  journal = {arXiv preprint arXiv:2507.02029},
  year    = {2025}
}