Structured World Models from Human Videos

21 August 2023
Russell Mendonca
Shikhar Bahl
Deepak Pathak
    LM&Ro

Papers citing "Structured World Models from Human Videos"

50 / 77 papers shown
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
Qingwen Bu
Y. Yang
Jisong Cai
Shenyuan Gao
Guanghui Ren
Maoqing Yao
Ping Luo
Hongyang Li
39
0
0
09 May 2025
PIN-WM: Learning Physics-INformed World Models for Non-Prehensile Manipulation
Wenxuan Li
Hang Zhao
Zhiyuan Yu
Yu Du
Qin Zou
Ruizhen Hu
K. Xu
SSL
71
1
0
23 Apr 2025
Novel Diffusion Models for Multimodal 3D Hand Trajectory Prediction
Junyi Ma
Wentao Bao
Jingyi Xu
Guanzhong Sun
Xieyuanli Chen
Hesheng Wang
30
0
0
10 Apr 2025
ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos
Junyao Shi
Zhuolun Zhao
Tianyou Wang
Ian Pedroza
Amy Luo
Jie Wang
Jason Ma
Dinesh Jayaraman
LM&Ro
43
0
0
31 Mar 2025
PartRM: Modeling Part-Level Dynamics with Large Cross-State Reconstruction Model
Mingju Gao
Yike Pan
Huan-ang Gao
Zongzheng Zhang
Wenyi Li
Hao Dong
Hao Tang
Li Yi
Hao Zhao
VGen
37
0
0
25 Mar 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao
Siyuan Zhou
Yilun Du
Jun Zhang
Chuang Gan
VGen
54
3
0
24 Mar 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
S. Zhang
60
5
0
13 Mar 2025
LuciBot: Automated Robot Policy Learning from Generated Videos
Xiaowen Qiu
Yian Wang
Jiting Cai
Zhehuan Chen
Chunru Lin
Tsun-Hsuan Wang
Chuang Gan
LM&Ro
VGen
67
0
0
12 Mar 2025
Other Vehicle Trajectories Are Also Needed: A Driving World Model Unifies Ego-Other Vehicle Trajectories in Video Latent Space
Jian Zhu
Zhengyu Jia
Tian Gao
Jiaxin Deng
Shidi Li
Fu Liu
Peng Jia
Xianpeng Lang
Xiaolong Sun
VGen
80
0
0
12 Mar 2025
Toward Stable World Models: Measuring and Addressing World Instability in Generative Environments
Soonwoo Kwon
Jin-Young Kim
Hyojun Go
Kyungjune Baek
50
0
0
11 Mar 2025
Cross-Embodiment Robotic Manipulation Synthesis via Guided Demonstrations through CycleVAE and Human Behavior Transformer
Apan Dastider
Hao Fang
Mingjie Lin
36
0
0
11 Mar 2025
Four Principles for Physically Interpretable World Models
Jordan Peper
Zhenjiang Mao
Yuang Geng
Siyuan Pan
Ivan Ruchkin
105
1
0
04 Mar 2025
Exo-ViHa: A Cross-Platform Exoskeleton System with Visual and Haptic Feedback for Efficient Dexterous Skill Learning
Xintao Chao
Shilong Mu
Yushan Liu
Shoujie Li
Chuqiao Lyu
Xiao-Ping Zhang
Wenbo Ding
77
1
0
03 Mar 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
104
8
0
18 Feb 2025
Learning from Massive Human Videos for Universal Humanoid Pose Control
Jiageng Mao
Siheng Zhao
Siqi Song
Tianheng Shi
Junjie Ye
Mingtong Zhang
Haoran Geng
Jitendra Malik
Vitor Campagnolo Guizilini
Yue Wang
85
5
0
18 Dec 2024
Reinforcement Learning from Wild Animal Videos
Elliot Chane-Sane
Constant Roux
O. Stasse
Nicolas Mansard
89
0
0
05 Dec 2024
CogACT: A Foundational Vision-Language-Action Model for Synergizing Cognition and Action in Robotic Manipulation
Qixiu Li
Yaobo Liang
Zeyu Wang
Lin Luo
Xi Chen
...
Jianmin Bao
Dong Chen
Yuanchun Shi
Jiaolong Yang
B. Guo
LM&Ro
74
20
0
29 Nov 2024
Grounding Video Models to Actions through Goal Conditioned Exploration
Yunhao Luo
Yilun Du
LM&Ro
VGen
77
1
0
11 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
M. Zhang
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
46
9
0
08 Nov 2024
DINO-WM: World Models on Pre-trained Visual Features enable Zero-shot Planning
G. Zhou
Hengkai Pan
Yann LeCun
Lerrel Pinto
VGen
LM&Ro
OffRL
35
12
0
07 Nov 2024
STEER: Flexible Robotic Manipulation via Dense Language Grounding
Laura Smith
A. Irpan
Montserrat Gonzalez Arenas
Sean Kirmani
Dmitry Kalashnikov
Dhruv Shah
Ted Xiao
LLMSV
32
1
0
05 Nov 2024
Pre-trained Visual Dynamics Representations for Efficient Policy Learning
Hao Luo
Bohan Zhou
Zongqing Lu
26
0
0
05 Nov 2024
PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation
K. Zhang
Pengzhen Ren
Bingqian Lin
Junfan Lin
Shikui Ma
Hang Xu
Xiaodan Liang
18
0
0
14 Oct 2024
EgoOops: A Dataset for Mistake Action Detection from Egocentric Videos Referring to Procedural Texts
Yuto Haneji
Taichi Nishimura
Hirotaka Kameko
Keisuke Shirai
Tomoya Yoshida
Keiya Kajimura
Koki Yamamoto
Taiyu Cui
Tomohiro Nishimoto
Shinsuke Mori
EgoV
44
0
0
07 Oct 2024
IoT-LLM: Enhancing Real-World IoT Task Reasoning with Large Language Models
Tuo An
Yunjiao Zhou
Han Zou
Jianfei Yang
LRM
26
4
0
03 Oct 2024
AVID: Adapting Video Diffusion Models to World Models
Marc Rigter
Tarun Gupta
Agrin Hilmkil
Chao Ma
VGen
17
2
0
01 Oct 2024
World Model-based Perception for Visual Legged Locomotion
Hang Lai
Jiahang Cao
Jiafeng Xu
Hongtao Wu
Yunfeng Lin
Tao Kong
Yong Yu
Weinan Zhang
VGen
19
2
0
25 Sep 2024
Embodiment-Agnostic Action Planning via Object-Part Scene Flow
Weiliang Tang
Jia-Hui Pan
Wei Zhan
Jianshu Zhou
Huaxiu Yao
Yun-Hui Liu
M. Tomizuka
Mingyu Ding
Chi-Wing Fu
41
0
0
16 Sep 2024
Goal-Reaching Policy Learning from Non-Expert Observations via Effective Subgoal Guidance
Renming Huang
Shaochong Liu
Yunqiang Pei
Peng Wang
Guoqing Wang
Yang Yang
Hengtao Shen
OffRL
19
0
0
06 Sep 2024
GR-MG: Leveraging Partially Annotated Data via Multi-Modal Goal Conditioned Policy
Peiyan Li
Hongtao Wu
Yan Huang
Chilam Cheang
Liang Wang
Tao Kong
VGen
46
11
0
26 Aug 2024
Scaling Cross-Embodied Learning: One Policy for Manipulation, Navigation, Locomotion and Aviation
Ria Doshi
Homer Walke
Oier Mees
Sudeep Dasari
Sergey Levine
37
45
0
21 Aug 2024
Flow as the Cross-Domain Manipulation Interface
Mengda Xu
Zhenjia Xu
Yinghao Xu
Cheng Chi
Gordon Wetzstein
Manuela Veloso
Shuran Song
AI4CE
26
32
0
21 Jul 2024
TieBot: Learning to Knot a Tie from Visual Demonstration through a Real-to-Sim-to-Real Approach
Weikun Peng
Jun Lv
Yuwei Zeng
Haonan Chen
Siheng Zhao
Jichen Sun
Cewu Lu
Lin Shao
26
1
0
03 Jul 2024
Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
Xuxin Cheng
Jialong Li
Shiqi Yang
Ge Yang
Xiaolong Wang
59
93
0
01 Jul 2024
OpenVLA: An Open-Source Vision-Language-Action Model
Moo Jin Kim
Karl Pertsch
Siddharth Karamcheti
Ted Xiao
Ashwin Balakrishna
...
Russ Tedrake
Dorsa Sadigh
Sergey Levine
Percy Liang
Chelsea Finn
LM&Ro
VLM
37
348
0
13 Jun 2024
Scaling Manipulation Learning with Visual Kinematic Chain Prediction
Xinyu Zhang
Yuhan Liu
Haonan Chang
Abdeslam Boularias
44
1
0
12 Jun 2024
Investigating Pre-Training Objectives for Generalization in Vision-Based Reinforcement Learning
Donghu Kim
Hojoon Lee
Kyungmin Lee
Dongyoon Hwang
Jaegul Choo
OffRL
29
1
0
10 Jun 2024
Learning Manipulation by Predicting Interaction
Jia Zeng
Qingwen Bu
Bangjun Wang
Wenke Xia
Li Chen
...
Heming Cui
Bin Zhao
Xuelong Li
Yu Qiao
Hongyang Li
48
19
0
01 Jun 2024
World Models for General Surgical Grasping
Hongbin Lin
Bin Li
Chun Wai Wong
Juan Rojas
X. Chu
K. W. S. Au
19
3
0
28 May 2024
Vista: A Generalizable Driving World Model with High Fidelity and Versatile Controllability
Shenyuan Gao
Jiazhi Yang
Li Chen
Kashyap Chitta
Yihang Qiu
Andreas Geiger
Jun Zhang
Hongyang Li
60
75
0
27 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Jianye Hao
Mingsheng Long
VGen
32
22
0
24 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
64
38
0
23 May 2024
One-Shot Imitation Learning with Invariance Matching for Robotic Manipulation
Xinyu Zhang
Abdeslam Boularias
28
9
0
21 May 2024
Octo: An Open-Source Generalist Robot Policy
Octo Model Team
Dibya Ghosh
Homer Walke
Karl Pertsch
Kevin Black
...
Quan Vuong
Ted Xiao
Dorsa Sadigh
Chelsea Finn
Sergey Levine
55
333
0
20 May 2024
Bidirectional Progressive Transformer for Interaction Intention Anticipation
Zichen Zhang
Hongcheng Luo
Wei Zhai
Yang Cao
Yu Kang
22
5
0
09 May 2024
Diff-IP2D: Diffusion-Based Hand-Object Interaction Prediction on Egocentric Videos
Junyi Ma
Jingyi Xu
Xieyuanli Chen
Hesheng Wang
VGen
27
7
0
07 May 2024
ScrewMimic: Bimanual Imitation from Human Videos with Screw Space Projection
Arpit Bahety
Priyanka Mandikal
Ben Abbatematteo
Roberto Martín-Martín
25
13
0
06 May 2024
Is Sora a World Simulator? A Comprehensive Survey on General World Models and Beyond
Zheng Zhu
Xiaofeng Wang
Wangbo Zhao
Chen Min
Nianchen Deng
...
Dawei Zhao
Liang Xiao
Jian-jun Zhao
Jiwen Lu
Guan Huang
VGen
LM&Ro
76
35
0
06 May 2024
What Foundation Models can Bring for Robot Learning in Manipulation : A Survey
Dingzhe Li
Yixiang Jin
A. Yong
Hongze Yu
Jun Shi
Xiaoshuai Hao
Peng Hao
Huaping Liu
Fuchun Sun
Bin Fang
AI4CE
LM&Ro
64
12
0
28 Apr 2024
Vid2Robot: End-to-end Video-conditioned Policy Learning with Cross-Attention Transformers
Vidhi Jain
Maria Attarian
Nikhil J. Joshi
Ayzaan Wahid
Danny Driess
...
Stefan Welker
Christine Chan
Igor Gilitschenski
Yonatan Bisk
Debidatta Dwibedi
68
27
0
19 Mar 2024