RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation

17 February 2025
Kun Wu
Chengkai Hou
Jiaming Liu
Zhengping Che
Xiaozhu Ju
Zhuqin Yang
Meng-Jie Li
Yinuo Zhao
Z. Xu
Guang Yang
Zhen Zhao
G. Li
Zhao Jin
Lecheng Wang
Jilei Mao
X. Wang
Shichao Fan
Ning Liu
Pei Ren
Qiang Zhang
Yaoxu Lyu
Mengzhen Liu
Jingyang He
Yulin Luo
Z. Gao
C. Li
Chenyang Gu
Y. Fu
Di Wu
X. Wang
Sixiang Chen
S. Chen
Zhenyu Wang
Pengju An
Siyuan Qian
S. Zhang
Jian Tang
Abstract

In this paper, we introduce RoboMIND (Multi-embodiment Intelligence Normative Data for Robot Manipulation), a dataset containing 107k demonstration trajectories across 479 diverse tasks involving 96 object classes. RoboMIND is collected through human teleoperation and encompasses comprehensive robot-related information, including multi-view observations, proprioceptive robot state information, and linguistic task descriptions. To ensure data consistency and reliability for imitation learning, RoboMIND is built on a unified data collection platform and a standardized protocol, covering four distinct robotic embodiments: the Franka Emika Panda, the UR5e, the AgileX dual-arm robot, and a humanoid robot with dual dexterous hands. Our dataset also includes 5k real-world failure demonstrations, each accompanied by detailed causes, enabling failure reflection and correction during policy learning. Additionally, we created a digital twin environment in the Isaac Sim simulator, replicating the real-world tasks and assets, which facilitates the low-cost collection of additional training data and enables efficient evaluation. To demonstrate the quality and diversity of our dataset, we conducted extensive experiments using various imitation learning methods for single-task settings and state-of-the-art Vision-Language-Action (VLA) models for multi-task scenarios. By leveraging RoboMIND, the VLA models achieved high manipulation success rates and demonstrated strong generalization capabilities. To the best of our knowledge, RoboMIND is the largest multi-embodiment teleoperation dataset collected on a unified platform, providing large-scale and high-quality robotic training data. Our project is at this https URL.
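To make the abstract's description of the data concrete, below is a minimal Python sketch of how one teleoperated demonstration trajectory could be represented, with multi-view observations, proprioceptive state, a language task description, the embodiment label, and a failure annotation. The class names, field names, and array shapes here are illustrative assumptions, not RoboMIND's actual schema or file format.

from dataclasses import dataclass, field
from typing import Dict, List, Optional

import numpy as np

# Embodiments named in the abstract; these identifiers are hypothetical labels,
# not the dataset's real metadata keys.
EMBODIMENTS = ("franka_panda", "ur5e", "agilex_dual_arm", "humanoid_dex_hands")


@dataclass
class Step:
    """One timestep of a teleoperated demonstration (assumed layout)."""
    images: Dict[str, np.ndarray]        # multi-view RGB frames, keyed by camera name
    proprio: np.ndarray                  # proprioceptive robot state (e.g., joints, gripper)
    action: np.ndarray                   # commanded action at this timestep


@dataclass
class Trajectory:
    """A RoboMIND-style demonstration trajectory (assumed layout)."""
    embodiment: str                      # one of EMBODIMENTS
    task_description: str                # linguistic task description
    steps: List[Step] = field(default_factory=list)
    success: bool = True                 # ~5k trajectories are labeled as failures
    failure_cause: Optional[str] = None  # annotated cause for failed demonstrations

    def __post_init__(self) -> None:
        if self.embodiment not in EMBODIMENTS:
            raise ValueError(f"unknown embodiment: {self.embodiment}")
        if not self.success and self.failure_cause is None:
            raise ValueError("failure demonstrations need an annotated cause")


# Usage example: a short successful demo on the Franka Panda with two cameras.
demo = Trajectory(
    embodiment="franka_panda",
    task_description="pick up the red mug and place it on the tray",
    steps=[
        Step(
            images={"wrist": np.zeros((224, 224, 3), np.uint8),
                    "front": np.zeros((224, 224, 3), np.uint8)},
            proprio=np.zeros(8, np.float32),
            action=np.zeros(7, np.float32),
        )
        for _ in range(2)
    ],
)
print(demo.embodiment, len(demo.steps), demo.success)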

View on arXiv
@article{wu2025_2412.13877,
  title={RoboMIND: Benchmark on Multi-embodiment Intelligence Normative Data for Robot Manipulation},
  author={Kun Wu and Chengkai Hou and Jiaming Liu and Zhengping Che and Xiaozhu Ju and Zhuqin Yang and Meng Li and Yinuo Zhao and Zhiyuan Xu and Guang Yang and Shichao Fan and Xinhua Wang and Fei Liao and Zhen Zhao and Guangyu Li and Zhao Jin and Lecheng Wang and Jilei Mao and Ning Liu and Pei Ren and Qiang Zhang and Yaoxu Lyu and Mengzhen Liu and Jingyang He and Yulin Luo and Zeyu Gao and Chenxuan Li and Chenyang Gu and Yankai Fu and Di Wu and Xingyu Wang and Sixiang Chen and Zhenyu Wang and Pengju An and Siyuan Qian and Shanghang Zhang and Jian Tang},
  journal={arXiv preprint arXiv:2412.13877},
  year={2025}
}