RynnBrain: Open Embodied Foundation Models

Ronghao Dang
Jiayan Guo
Bohan Hou
Sicong Leng
Kehan Li
Xin Li
Jiangpin Liu
Yunxuan Mao
Zhikai Wang
Yuqian Yuan
Minghao Zhu
Xiao Lin
Yang Bai
Qian Jiang
Yaxi Zhao
Minghua Zeng
Junlong Gao
Yuming Jiang
Jun Cen
Siteng Huang
Liuyi Wang
Wenqiao Zhang
Chengju Liu
Jianfei Yang
Shijian Lu
Deli Zhao
Main: 23 pages
18 figures
Bibliography: 8 pages
9 tables
Appendix: 17 pages
Abstract

Despite rapid progress in multimodal foundation models, the embodied intelligence community still lacks a unified, physically grounded foundation model that integrates perception, reasoning, and planning within real-world spatial-temporal dynamics. We introduce RynnBrain, an open-source spatiotemporal foundation model for embodied intelligence. RynnBrain strengthens four core capabilities in a unified framework: comprehensive egocentric understanding, diverse spatiotemporal localization, physically grounded reasoning, and physics-aware planning. The RynnBrain family comprises three foundation model scales (2B, 8B, and 30B-A3B MoE) and four post-trained variants tailored for downstream embodied tasks (i.e., RynnBrain-Nav, RynnBrain-Plan, and RynnBrain-VLA) or complex spatial reasoning tasks (i.e., RynnBrain-CoP). In extensive evaluations on 20 embodied benchmarks and 8 general vision understanding benchmarks, our RynnBrain foundation models outperform existing embodied foundation models by a significant margin. The post-trained model suite further substantiates two key potentials of the RynnBrain foundation model: (i) enabling physically grounded reasoning and planning, and (ii) serving as a strong pretrained backbone that can be efficiently adapted to diverse embodied tasks.
