
AoE: Always-on Egocentric Human Video Collection for Embodied AI

Bowen Yang
Zishuo Li
Yang Sun
Changtao Miao
Yifan Yang
Man Luo
Xiaotong Yan
Feng Jiang
Jinchuan Shi
Yankai Fu
Ning Chen
Junkai Zhao
Pengwei Wang
Guocai Yao
Shanghang Zhang
Hao Chen
Zhe Li
Kai Zhu
Main: 11 pages · 9 figures · 2 tables · Bibliography: 4 pages · Appendix: 9 pages
Abstract

Embodied foundation models require large-scale, high-quality real-world interaction data for pre-training and scaling. However, existing data collection methods suffer from high infrastructure costs, complex hardware dependencies, and limited interaction scope, making scalable expansion challenging. Humans themselves are ideal physically embodied agents, so obtaining egocentric real-world interaction data from globally distributed "human agents" offers the advantages of low cost and sustainability. To this end, we propose the Always-on Egocentric (AoE) data collection system, which simplifies hardware dependencies by leveraging humans and their smartphones, enabling low-cost, highly efficient, and scene-agnostic real-world interaction data collection to address the challenge of data scarcity. Specifically, we first employ an ergonomic neck-mounted smartphone holder to enable low-barrier, large-scale egocentric data collection through a cloud-edge collaborative architecture. Second, we develop a cross-platform mobile app that leverages on-device compute for real-time processing, while the cloud hosts automated labeling and filtering pipelines that transform raw videos into high-quality training data. Finally, the AoE system supports distributed egocentric video collection by anyone, anytime, and anywhere. We evaluate AoE on data preprocessing quality and downstream tasks, demonstrating that high-quality egocentric data significantly boosts real-world generalization.
