GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

18 March 2025
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
Runyu Ding
Linxi Fan
Yu Fang
Dieter Fox
Fengyuan Hu
Spencer Huang
Joel Jang
Zhenyu Jiang
Jan Kautz
Kaushil Kundalia
Lawrence Lao
Zhiqi Li
Zongyu Lin
Kevin Lin
Guilin Liu
Edith Llontop
Loic Magne
Ajay Mandlekar
Avnish Narayan
Soroush Nasiriany
Scott Reed
You Liang Tan
Guanzhi Wang
Zu Wang
Jing Wang
Qi Wang
Jiannan Xiang
Yuqi Xie
Yinzhen Xu
Zhenjia Xu
Seonghyeon Ye
Zhiding Yu
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
    VLM
Abstract

General-purpose robots need a versatile body and an intelligent mind. Recent advancements in humanoid robots have shown great promise as a hardware platform for building generalist autonomy in the human world. A robot foundation model, trained on massive and diverse data sources, is essential for enabling the robots to reason about novel situations, robustly handle real-world variability, and rapidly learn new tasks. To this end, we introduce GR00T N1, an open foundation model for humanoid robots. GR00T N1 is a Vision-Language-Action (VLA) model with a dual-system architecture. The vision-language module (System 2) interprets the environment through vision and language instructions. The subsequent diffusion transformer module (System 1) generates fluid motor actions in real time. Both modules are tightly coupled and jointly trained end-to-end. We train GR00T N1 with a heterogeneous mixture of real-robot trajectories, human videos, and synthetically generated datasets. We show that our generalist robot model GR00T N1 outperforms the state-of-the-art imitation learning baselines on standard simulation benchmarks across multiple robot embodiments. Furthermore, we deploy our model on the Fourier GR-1 humanoid robot for language-conditioned bimanual manipulation tasks, achieving strong performance with high data efficiency.
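The dual-system flow described in the abstract can be illustrated with a deliberately tiny sketch. Everything here is an assumption made for illustration: `system2_encode`, `toy_target`, `system1_generate`, and `act` are hypothetical names, the "encoder" is simple arithmetic rather than a vision-language model, and the refinement loop is a hand-written stand-in for the diffusion transformer, not the paper's actual method or training procedure. It only shows the shape of the pipeline: a context is computed from vision and language, then an action chunk is refined from noise while conditioned on that context.

```python
import math
import random


def system2_encode(image, instruction, dim=4):
    """Toy stand-in for System 2: fold an image and an instruction into a context vector."""
    mean_pixel = sum(image) / len(image)
    text_code = (sum(ord(c) for c in instruction) % 97) / 97.0
    return [mean_pixel * (i + 1) + text_code for i in range(dim)]


def toy_target(context, horizon, action_dim):
    """Hypothetical 'clean' action chunk implied by the context (not a learned model)."""
    return [
        [math.tanh(context[d % len(context)] + 0.1 * t) for d in range(action_dim)]
        for t in range(horizon)
    ]


def system1_generate(context, horizon=8, action_dim=3, steps=4, seed=0):
    """Toy stand-in for System 1: start from Gaussian noise and refine step by step."""
    rng = random.Random(seed)
    target = toy_target(context, horizon, action_dim)
    chunk = [[rng.gauss(0.0, 1.0) for _ in range(action_dim)] for _ in range(horizon)]
    for k in range(steps):
        # Move a growing fraction of the remaining distance; the last step uses 1.0.
        frac = 1.0 / (steps - k)
        chunk = [
            [x + frac * (g - x) for x, g in zip(row, goal)]
            for row, goal in zip(chunk, target)
        ]
    return chunk


def act(image, instruction):
    """Run the two stages in sequence: context first, then the action chunk."""
    context = system2_encode(image, instruction)
    return system1_generate(context)
```

Calling `act([0.2, 0.4, 0.6], "pick up the cup")` returns a list of 8 time steps with 3 action values each. In a real system the target would not be known in closed form; a trained network would predict the denoising direction at every step.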

@article{nvidia2025_2503.14734,
  title={ GR00T N1: An Open Foundation Model for Generalist Humanoid Robots },
  author={ NVIDIA and Johan Bjorck and Fernando Castañeda and Nikita Cherniadev and Xingye Da and Runyu Ding and Linxi "Jim" Fan and Yu Fang and Dieter Fox and Fengyuan Hu and Spencer Huang and Joel Jang and Zhenyu Jiang and Jan Kautz and Kaushil Kundalia and Lawrence Lao and Zhiqi Li and Zongyu Lin and Kevin Lin and Guilin Liu and Edith Llontop and Loic Magne and Ajay Mandlekar and Avnish Narayan and Soroush Nasiriany and Scott Reed and You Liang Tan and Guanzhi Wang and Zu Wang and Jing Wang and Qi Wang and Jiannan Xiang and Yuqi Xie and Yinzhen Xu and Zhenjia Xu and Seonghyeon Ye and Zhiding Yu and Ao Zhang and Hao Zhang and Yizhou Zhao and Ruijie Zheng and Yuke Zhu },
  journal={arXiv preprint arXiv:2503.14734},
  year={ 2025 }
}