
- Theia: Distilling Diverse Vision Foundation Models for Robot Learning. Conference on Robot Learning (CoRL), 2024.
- Mixture of Nested Experts: Adaptive Processing of Visual Tokens. Neural Information Processing Systems (NeurIPS), 2024.
- Towards Event-oriented Long Video Understanding. Yifan Du, Kun Zhou, Yuqi Huo, Yifan Li, Wayne Xin Zhao, Haoyu Lu, Zijia Zhao, Bingning Wang, Weipeng Chen, Ji-Rong Wen.
- HumanPlus: Humanoid Shadowing and Imitation from Humans. Conference on Robot Learning (CoRL), 2024.
- Pandora: Towards General World Model with Natural Language Actions and Video States. Jiannan Xiang, Guangyi Liu, Yi Gu, Qiyue Gao, Yuting Ning, ..., Shibo Hao, Yemin Shi, Zhengzhong Liu, Eric P. Xing, Zhiting Hu.
- Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives. Annual Meeting of the Association for Computational Linguistics (ACL), 2024.