2406.09455
Pandora: Towards General World Model with Natural Language Actions and Video States
12 June 2024
Jiannan Xiang
Guangyi Liu
Yi Gu
Qiyue Gao
Yuting Ning
Yuheng Zha
Zeyu Feng
Tianhua Tao
Shibo Hao
Yemin Shi
Zhengzhong Liu
Eric P. Xing
Zhiting Hu
VGen
HuggingFace (15 upvotes)
Papers citing
"Pandora: Towards General World Model with Natural Language Actions and Video States"
47 / 47 papers shown
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
Zeqing Wang
Keze Wang
Lei Zhang
VGen
108
0
0
01 Dec 2025
SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds
J. Ren
Yan Zhuang
Xiaokang Ye
Lingjun Mao
Xuhong He
...
Xianrui Zhong
Ziqiao Ma
Tianmin Shu
Zhiting Hu
Lianhui Qin
LLMAG
VGen
216
1
0
30 Nov 2025
In-Video Instructions: Visual Signals as Generative Control
Gongfan Fang
Xinyin Ma
Xinchao Wang
VGen
88
0
0
24 Nov 2025
Counterfactual World Models via Digital Twin-conditioned Video Diffusion
Yiqing Shen
Aiza Maksutova
Chenjia Li
Mathias Unberath
DiffM
VGen
165
0
0
21 Nov 2025
Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos
Taiyi Su
Jian Zhu
Yaxuan Li
Chong Ma
Zitai Huang
Yichen Zhu
Hanli Wang
VGen
250
0
0
17 Nov 2025
Simulating the Visual World with Artificial Intelligence: A Roadmap
Jingtong Yue
Z. Huang
Z. Chen
Xintao Wang
Pengfei Wan
Ziwei Liu
VGen
LM&Ro
387
0
0
11 Nov 2025
A Step Toward World Models: A Survey on Robotic Manipulation
Peng-Fei Zhang
Ying Cheng
Xiaofan Sun
S. Wang
Lei Zhu
Heng Tao Shen
LM&Ro
734
2
0
31 Oct 2025
A Comprehensive Survey on World Models for Embodied AI
Xinqing Li
Xin He
Le Zhang
Yun-Hai Liu
Xiaoli Li
Yun-Hai Liu
VGen
LM&Ro
SyDa
244
3
0
19 Oct 2025
Terra: Explorable Native 3D World Model with Point Latents
Yuanhui Huang
Weiliang Chen
Wenzhao Zheng
Xin Tao
Pengfei Wan
Jie Zhou
Jiwen Lu
VGen
122
0
0
16 Oct 2025
MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator
Xuehai He
Shijie Zhou
Thivyanth Venkateswaran
Kaizhi Zheng
Ziyu Wan
A. Kadambi
Xin Eric Wang
VGen
SyDa
AI4CE
160
0
0
05 Oct 2025
VIVA+: Human-Centered Situational Decision-Making
Zhe Hu
Yixiao Ren
Guanzhong Liu
Jing Li
Yu Yin
LRM
111
0
0
28 Sep 2025
Learning Primitive Embodied World Models: Towards Scalable Robotic Learning
Qiao Sun
Liujia Yang
Wei Tang
Wei Huang
Kaixin Xu
...
Tong He
Yilun Chen
Xili Dai
Nanyang Ye
Qinying Gu
VGen
LM&Ro
405
1
0
28 Aug 2025
Critiques of World Models
Eric P. Xing
Mingkai Deng
Jinyu Hou
Zhiting Hu
SyDa
214
6
0
07 Jul 2025
GenWorld: Towards Detecting AI-generated Real-world Simulation Videos
Weiliang Chen
Wenzhao Zheng
Yu Zheng
Lei Chen
Jie Zhou
Jiwen Lu
Yueqi Duan
VGen
308
3
0
12 Jun 2025
Long-Context State-Space Video World Models
Ryan Po
Yotam Nitzan
Richard Zhang
Berlin Chen
Tri Dao
Eli Shechtman
Gordon Wetzstein
Xun Huang
312
25
0
26 May 2025
DreamGen: Unlocking Generalization in Robot Learning through Video World Models
Joel Jang
Seonghyeon Ye
Zongyu Lin
Jiannan Xiang
Johan Bjorck
...
Dieter Fox
Jan Kautz
Scott Reed
Yuke Zhu
Linxi Fan
VGen
OffRL
AI4TS
382
0
0
19 May 2025
AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction
Junhao Cheng
Yuying Ge
Yixiao Ge
Jing Liao
Mingyu Ding
VGen
AI4CE
412
5
0
01 Apr 2025
WorldScore: A Unified Evaluation Benchmark for World Generation
Haoyi Duan
Hong-Xing Yu
Sirui Chen
L. Fei-Fei
Jiajun Wu
VGen
389
42
0
01 Apr 2025
VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior
Xindi Yang
Baolu Li
Yanzhe Zhang
Zhenfei Yin
Lei Bai
...
Zhiyong Wang
Jianfei Cai
Tien-Tsin Wong
Huchuan Lu
Xu Jia
DiffM
VGen
492
0
0
30 Mar 2025
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Computer Vision and Pattern Recognition (CVPR), 2025
Qingqing Zhao
Yao Lu
Moo Jin Kim
Zipeng Fu
Zhuoyang Zhang
...
Ankur Handa
Xuan Li
Donglai Xiang
Gordon Wetzstein
Nayeon Lee
LM&Ro
LRM
335
193
0
27 Mar 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao
Siyuan Zhou
Yilun Du
Jun Zhang
Chuang Gan
VGen
533
33
0
24 Mar 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
545
381
0
18 Mar 2025
WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation
Jing Wang
Ao Ma
Ke Cao
Jun Zheng
Zhanjie Zhang
...
Yuhang Ma
Bo Cheng
Dawei Leng
Yuhui Yin
Xiaodan Liang
VGen
324
29
0
11 Mar 2025
WorldModelBench: Judging Video Generation Models As World Models
Dacheng Li
Yunhao Fang
Yukang Chen
Shuo Yang
Shiyi Cao
...
Hongxu Yin
Alfons Kemper
Ion Stoica
Enze Xie
Yaojie Lu
VGen
229
26
0
28 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
390
1
0
12 Feb 2025
DMWM: Dual-Mind World Model with Long-Term Imagination
Lingyi Wang
Rashed Shelim
Walid Saad
Naren Ramakrishnan
LRM
1.0K
4
0
11 Feb 2025
Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile
Hangliang Ding
Dacheng Li
Runlong Su
Peiyuan Zhang
Zhijie Deng
Eric Liang
Hao Zhang
VGen
369
16
0
10 Feb 2025
Pre-Trained Video Generative Models as World Simulators
Haoran He
Yang Zhang
Guanbin Li
Zhihao Xu
Ling Pan
VGen
368
21
0
10 Feb 2025
DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers
Yuntao Chen
Yuqi Wang
Rundong Wang
1.0K
42
0
24 Dec 2024
Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation
Computer Vision and Pattern Recognition (CVPR), 2025
Yiping Wang
Xuehai He
Kuan-Chieh Wang
Luyao Ma
Jianwei Yang
Shuohang Wang
Simon Shaolei Du
Yelong Shen
VGen
330
9
0
17 Dec 2024
GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control
Computer Vision and Pattern Recognition (CVPR), 2025
Mariam Hassan
Sebastian Stapf
Ahmad Rahimi
Pedro M B Rezende
Yasaman Haghighi
...
Mathieu Salzmann
Davide Scaramuzza
Marc Pollefeys
Paolo Favaro
Alexandre Alahi
VLM
VGen
322
39
0
15 Dec 2024
The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control
Ruili Feng
Han Zhang
Zhantao Yang
Jie Xiao
Zhilei Shu
Zhiheng Liu
Andy Zheng
Yukun Huang
Yu Liu
Han Zhang
VGen
274
45
0
04 Dec 2024
ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration
Computer Vision and Pattern Recognition (CVPR), 2025
Chaojun Ni
Guosheng Zhao
Xiaofeng Wang
Zheng Hua Zhu
Wenkang Qin
...
Kun Zhan
Fu Liu
Xianpeng Lang
Xingang Wang
Wenjun Mei
VGen
822
52
0
29 Nov 2024
Understanding World or Predicting Future? A Comprehensive Survey of World Models
ACM Computing Surveys (ACM CSUR), 2024
Jingtao Ding
Yunke Zhang
Yu Shang
Yuheng Zhang
Zefang Zong
...
Fengli Xu
Yong Li
Chen Gao
Fengli Xu
Yong Li
VGen
SyDa
501
17
0
21 Nov 2024
Autoregressive Models in Vision: A Survey
Jing Xiong
Gongye Liu
Lun Huang
Chengyue Wu
Taiqiang Wu
...
Hao Fei
Guillermo Sapiro
Jiebo Luo
Ping Luo
Ngai Wong
VGen
486
38
0
08 Nov 2024
GameGen-X: Interactive Open-world Game Video Generation
International Conference on Learning Representations (ICLR), 2025
Haoxuan Che
Xuanhua He
Quande Liu
Cheng Jin
Hao Chen
VGen
393
64
0
01 Nov 2024
SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation
International Conference on Learning Representations (ICLR), 2025
Yining Hong
Beide Liu
Maxine Wu
Yuanhao Zhai
Kai-Wei Chang
...
Chung-Ching Lin
Jianfeng Wang
Zhiyong Yang
Yingnian Wu
Lijuan Wang
VGen
276
17
0
30 Oct 2024
Multi-Task Interactive Robot Fleet Learning with Visual World Models
Conference on Robot Learning (CoRL), 2024
Huihan Liu
Yu Zhang
Vaarij Betala
Evan Zhang
James Liu
Crystal Ding
Yinlin Zhu
322
18
0
30 Oct 2024
DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation
Computer Vision and Pattern Recognition (CVPR), 2025
Guosheng Zhao
Chaojun Ni
Xiaofeng Wang
Zheng Zhu
Xinming Zhang
...
Xinze Chen
Boyuan Wang
Youyi Zhang
Wenjun Mei
Xingang Wang
VGen
541
80
0
17 Oct 2024
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation
Fanqing Meng
Jiaqi Liao
Xinyu Tan
Wenqi Shao
Quanfeng Lu
Kaipeng Zhang
Yu Cheng
Dianqi Li
Yu Qiao
Ping Luo
VGen
EGVM
246
66
0
07 Oct 2024
ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction
Hyungjin Chung
Dohun Lee
Jong Chul Ye
VGen
DiffM
195
2
0
07 Oct 2024
AVID: Adapting Video Diffusion Models to World Models
Marc Rigter
Tarun Gupta
Agrin Hilmkil
Chao Ma
VGen
290
18
0
01 Oct 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
289
8
0
30 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Zehua Wang
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
617
181
0
09 Jul 2024
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations
Benno Krojer
Dheeraj Vattikonda
Luis Lara
Varun Jampani
Eva Portelance
Christopher Pal
Siva Reddy
EGVM
VGen
346
17
0
03 Jul 2024
MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos
Xuehai He
Weixi Feng
Kaizhi Zheng
Yujie Lu
Wanrong Zhu
...
Zhengyuan Yang
Kevin Lin
William Yang Wang
Lijuan Wang
Xin Eric Wang
VGen
LRM
598
33
0
12 Jun 2024
COMBO: Compositional World Models for Embodied Multi-Agent Cooperation
Hongxin Zhang
Zeyuan Wang
Qiushi Lyu
Zheyuan Zhang
Sunli Chen
Tianmin Shu
Yilun Du
Kwonjoon Lee
Yilun Du
Chuang Gan
405
32
0
16 Apr 2024