Structured World Models from Human Videos
arXiv: 2308.10901

21 August 2023
Russell Mendonca
Shikhar Bahl
Deepak Pathak
    LM&Ro

Papers citing "Structured World Models from Human Videos"

49 / 99 papers shown
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
Neural Information Processing Systems (NeurIPS), 2024
Jianchao Tan
Pengzhen Ren
Bingqian Lin
Junfan Lin
Shikui Ma
Hang Xu
Xiaodan Liang
315
7
0
14 Oct 2024
EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos referring to Procedural Texts
Yuto Haneji
Taichi Nishimura
Hirotaka Kameko
Keisuke Shirai
Tomoya Yoshida
Keiya Kajimura
Koki Yamamoto
Taiyu Cui
Tomohiro Nishimoto
Shinsuke Mori
EgoV
278
0
0
07 Oct 2024
IoT-LLM: a framework for enhancing Large Language Model reasoning from real-world sensor data
Patterns (Patterns), 2024
Tuo An
Yunjiao Zhou
Han Zou
Jianfei Yang
LRM
417
20
0
03 Oct 2024
AVID: Adapting Video Diffusion Models to World Models
Marc Rigter
Tarun Gupta
Agrin Hilmkil
Chao Ma
VGen
295
18
0
01 Oct 2024
World Model-based Perception for Visual Legged Locomotion
IEEE International Conference on Robotics and Automation (ICRA), 2024
Hang Lai
Jiahang Cao
Jiafeng Xu
Hongtao Wu
Yunfeng Lin
Tao Kong
Yong Yu
Weinan Zhang
VGen
187
15
0
25 Sep 2024
Embodiment-Agnostic Action Planning via Object-Part Scene Flow
IEEE International Conference on Robotics and Automation (ICRA), 2024
Weiliang Tang
Jia-Hui Pan
Wei Zhan
Jianshu Zhou
Huaxiu Yao
Yun-Hui Liu
Masayoshi Tomizuka
Mingyu Ding
Chi-Wing Fu
229
5
0
16 Sep 2024
Hand-Object Interaction Pretraining from Videos
IEEE International Conference on Robotics and Automation (ICRA), 2024
Himanshu Gaurav Singh
Antonio Loquercio
Carmelo Sferrazza
Jane Wu
Haozhi Qi
Pieter Abbeel
Jitendra Malik
213
35
0
12 Sep 2024
Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance
Conference on Robot Learning (CoRL), 2024
Renming Huang
Shaochong Liu
Yunqiang Pei
Peng Wang
Guoqing Wang
Yang Yang
Hengtao Shen
OffRL
264
0
0
06 Sep 2024
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
IEEE Robotics and Automation Letters (RA-L), 2024
Peiyan Li
Hongtao Wu
Yan Huang
Chilam Cheang
Liang Wang
Tao Kong
VGen
220
31
0
26 Aug 2024
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
Conference on Robot Learning (CoRL), 2024
Ria Doshi
Homer Walke
Oier Mees
Sudeep Dasari
Sergey Levine
382
98
0
21 Aug 2024
Flow as the Cross-Domain Manipulation Interface
Mengda Xu
Zhenjia Xu
Yinghao Xu
Cheng Chi
Gordon Wetzstein
Manuela Veloso
Shuran Song
AI4CE
310
103
0
21 Jul 2024
TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach
Weikun Peng
Jun Lv
Yuwei Zeng
Haonan Chen
Siheng Zhao
Jichen Sun
Cewu Lu
Lin Shao
280
6
0
03 Jul 2024
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
Xuxin Cheng
Jialong Li
Shiqi Yang
Ge Yang
Xiaolong Wang
460
208
0
01 Jul 2024
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro, VLM
597
1,350
0
13 Jun 2024
Scaling Manipulation Learning with Visual Kinematic Chain Prediction
Xinyu Zhang
Yuhan Liu
Haonan Chang
Abdeslam Boularias
224
2
0
12 Jun 2024
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
International Conference on Machine Learning (ICML), 2024
Donghu Kim
Hojoon Lee
Kyungmin Lee
Dongyoon Hwang
Jaegul Choo
OffRL
258
3
0
10 Jun 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
391
38
0
01 Jun 2024
World Models for General Surgical Grasping
Hongbin Lin
Bin Li
Chun Wai Wong
Juan Rojas
Xiangyu Chu
K. W. S. Au
232
7
0
28 May 2024
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Shenyuan Gao
Jiazhi Yang
Li Chen
Kashyap Chitta
Yihang Qiu
Andreas Geiger
Jun Zhang
Hongyang Li
453
209
0
27 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Haifeng Zhang
Mingsheng Long
VGen
290
84
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
885
166
0
23 May 2024
One-Shot Imitation Learning with Invariance Matching for Robotic Manipulation
Xinyu Zhang
Abdeslam Boularias
387
19
0
21 May 2024
Octo: An Open-Source Generalist Robot Policy
Octo Model Team
Dibya Ghosh
Homer Walke
Karl Pertsch
Kevin Black
...
Quan Vuong
Ted Xiao
Dorsa Sadigh
Chelsea Finn
Sergey Levine
554
876
0
20 May 2024
Bidirectional Progressive Transformer for Interaction Intention Anticipation
European Conference on Computer Vision (ECCV), 2024
Zichen Zhang
Hongcheng Luo
Wei Zhai
Yang Cao
Yu Kang
322
8
0
09 May 2024
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos
Junyi Ma
Jingyi Xu
Xieyuanli Chen
Hesheng Wang
VGen
501
19
0
07 May 2024
ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection
Arpit Bahety
Priyanka Mandikal
Ben Abbatematteo
Roberto Martín-Martín
292
26
0
06 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen, LM&Ro
362
81
0
06 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation: A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Yong A
Hongze Yu
...
Huaping Liu
Gang Hua
F. Sun
Jianwei Zhang
Bin Fang
AI4CE, LM&Ro
900
26
0
28 Apr 2024
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Vidhi Jain
Maria Attarian
Nikhil J. Joshi
Ayzaan Wahid
Danny Driess
...
Stefan Welker
Christine Chan
Igor Gilitschenski
Yonatan Bisk
Debidatta Dwibedi
326
48
0
19 Mar 2024
AD3: Implicit Action is the Key for World Models to Distinguish the Diverse Visual Distractors
International Conference on Machine Learning (ICML), 2024
Yucen Wang
Shenghua Wan
Le Gan
Shuai Feng
De-Chuan Zhan
VGen
263
7
0
15 Mar 2024
ManiGaussian: Dynamic Gaussian Splatting for Multi-task Robotic Manipulation
European Conference on Computer Vision (ECCV), 2024
Guanxing Lu
Shiyi Zhang
Ziwei Wang
Changliu Liu
Jiwen Lu
Yansong Tang
343
106
0
13 Mar 2024
Spatiotemporal Predictive Pre-training for Robotic Motor Control
Jiange Yang
Bei Liu
Jianlong Fu
Bocheng Pan
Gangshan Wu
Limin Wang
369
20
0
08 Mar 2024
Reconciling Reality through Simulation: A Real-to-Sim-to-Real Approach for Robust Manipulation
M. Torné
Anthony Simeonov
Zechu Li
April Chan
Tao Chen
Abhishek Gupta
Pulkit Agrawal
274
119
0
06 Mar 2024
World Models for Autonomous Driving: An Initial Survey
Yanchen Guan
Haicheng Liao
Zhenning Li
Jia Hu
Runze Yuan
Yunjian Li
Guohui Zhang
Chengzhong Xu
428
79
0
05 Mar 2024
DecisionNCE: Embodied Multimodal Representations via Implicit Preference Learning
Jianxiong Li
Jinliang Zheng
Yinan Zheng
Liyuan Mao
Xiaoming Hu
...
Jihao Liu
Yu Liu
Jingjing Liu
Ya Zhang
Xianyuan Zhan
LM&Ro, OffRL
280
14
0
28 Feb 2024
Learning by Watching: A Review of Video-based Learning Approaches for Robot Manipulation
IEEE Access (IEEE Access), 2024
Chrisantus Eze
Christopher Crick
SSL
466
16
0
11 Feb 2024
A Survey on Robotics with Foundation Models: toward Embodied AI
Zhiyuan Xu
Kun Wu
Junjie Wen
Jinming Li
Ning Liu
Zhengping Che
Jian Tang
AI4CE, LRM, LM&Ro
280
63
0
04 Feb 2024
Adaptive Mobile Manipulation for Articulated Objects In the Open World
Haoyu Xiong
Russell Mendonca
Kenneth Shaw
Deepak Pathak
318
65
0
25 Jan 2024
General Flow as Foundation Affordance for Scalable Robot Learning
Conference on Robot Learning (CoRL), 2024
Chengbo Yuan
Chuan Wen
Tong Zhang
Yang Gao
AI4CE
330
69
0
21 Jan 2024
Visual Robotic Manipulation with Depth-Aware Pretraining
IEEE International Conference on Robotics and Biomimetics (ROBIO), 2024
Wanying Wang
Jinming Li
Yichen Zhu
Zhiyuan Xu
Zhengping Che
Chaomin Shen
Yaxin Peng
Dong Liu
Feifei Feng
Jian Tang
MDE
304
12
0
17 Jan 2024
Robo-ABC: Affordance Generalization Beyond Categories via Semantic Correspondence for Robot Manipulation
European Conference on Computer Vision (ECCV), 2024
Yuanchen Ju
Kaizhe Hu
Guowei Zhang
Gu Zhang
Mingrun Jiang
Huazhe Xu
276
77
0
15 Jan 2024
Any-point Trajectory Modeling for Policy Learning
Chuan Wen
Xingyu Lin
John So
Kai-xiang Chen
Qi Dou
Yang Gao
Pieter Abbeel
PINN, VGen
544
170
0
28 Dec 2023
Unleashing Large-Scale Video Generative Pre-training for Visual Robot Manipulation
Hongtao Wu
Ya Jing
Chi-Hou Cheang
Guangzeng Chen
Jiafeng Xu
Xinghang Li
Minghuan Liu
Hang Li
Tao Kong
466
232
0
20 Dec 2023
EgoPlan-Bench: Benchmarking Multimodal Large Language Models for Human-Level Planning
Yi Chen
Yuying Ge
Yixiao Ge
Mingyu Ding
Bohao Li
Rui Wang
Rui-Lan Xu
Ying Shan
Xihui Liu
LLMAG, ELM, LRM
355
30
0
11 Dec 2023
Applications of Large Scale Foundation Models for Autonomous Driving
Yu Huang
Yue Chen
Zhu Li
ELM, AI4CE, LRM, ALM, LM&Ro
641
21
0
20 Nov 2023
DreamSmooth: Improving Model-based Reinforcement Learning via Reward Smoothing
International Conference on Learning Representations (ICLR), 2023
Vint Lee
Pieter Abbeel
Youngwoon Lee
221
7
0
02 Nov 2023
Model-Based Runtime Monitoring with Interactive Imitation Learning
IEEE International Conference on Robotics and Automation (ICRA), 2023
Huihan Liu
Shivin Dass
Roberto Martín-Martín
Yuke Zhu
228
32
0
26 Oct 2023
Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2023
Jialong Wu
Haoyu Ma
Chao Deng
Mingsheng Long
OffRL
289
45
0
29 May 2023
Pretrained Language Models as Visual Planners for Human Assistance
IEEE International Conference on Computer Vision (ICCV), 2023
Dhruvesh Patel
H. Eghbalzadeh
Nitin Kamra
Michael L. Iuzzolino
Unnat Jain
Ruta Desai
LM&Ro
326
35
0
17 Apr 2023