Pandora: Towards General World Model with Natural Language Actions and Video States

12 June 2024

Guangyi Liu

Zhengzhong Liu

Eric P. Xing

Zhiting Hu

VGen

Papers citing "Pandora: Towards General World Model with Natural Language Actions and Video States"

47 / 47 papers shown

PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

124

01 Dec 2025

SimWorld: An Open-ended Realistic Simulator for Autonomous Agents in Physical and Social Worlds

...

232

30 Nov 2025

In-Video Instructions: Visual Signals as Generative Control

24 Nov 2025

Counterfactual World Models via Digital Twin-conditioned Video Diffusion

165

21 Nov 2025

Towards High-Consistency Embodied World Model with Multi-View Trajectory Videos

251

17 Nov 2025

Simulating the Visual World with Artificial Intelligence: A Roadmap

460

11 Nov 2025

A Step Toward World Models: A Survey on Robotic Manipulation

742

31 Oct 2025

A Comprehensive Survey on World Models for Embodied AI

248

19 Oct 2025

Terra: Explorable Native 3D World Model with Point Latents

122

16 Oct 2025

MorphoSim: An Interactive, Controllable, and Editable Language-guided 4D World Simulator

Xuehai He

Shijie Zhou

Thivyanth Venkateswaran

160

05 Oct 2025

VIVA+: Human-Centered Situational Decision-Making

115

28 Sep 2025

Learning Primitive Embodied World Models: Towards Scalable Robotic Learning

...

405

28 Aug 2025

Critiques of World Models

218

07 Jul 2025

GenWorld: Towards Detecting AI-generated Real-world Simulation Videos

317

12 Jun 2025

Long-Context State-Space Video World Models

312

26 May 2025

DreamGen: Unlocking Generalization in Robot Learning through Video World Models

...

386

19 May 2025

AnimeGamer: Infinite Anime Life Simulation with Next Game State Prediction

412

01 Apr 2025

WorldScore: A Unified Evaluation Benchmark for World Generation

393

01 Apr 2025

VLIPP: Towards Physically Plausible Video Generation with Vision and Language Informed Physical Prior

...

492

30 Mar 2025

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Computer Vision and Pattern Recognition (CVPR), 2025

...

335

198

27 Mar 2025

AdaWorld: Learning Adaptable World Models with Latent Actions

554

24 Mar 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

...

545

381

18 Mar 2025

WISA: World Simulator Assistant for Physics-Aware Text-to-Video Generation

...

328

11 Mar 2025

WorldModelBench: Judging Video Generation Models As World Models

...

237

28 Feb 2025

Learning Human Skill Generators at Key-Step Levels

390

12 Feb 2025

DMWM: Dual-Mind World Model with Long-Term Imagination

1.0K

11 Feb 2025

Efficient-vDiT: Efficient Video Diffusion Transformers With Attention Tile

377

10 Feb 2025

Pre-Trained Video Generative Models as World Simulators

372

10 Feb 2025

DrivingGPT: Unifying Driving World Modeling and Planning with Multi-modal Autoregressive Transformers

Yuntao Chen

Yuqi Wang

Rundong Wang

1.0K

24 Dec 2024

Is Your World Simulator a Good Story Presenter? A Consecutive Events-Based Benchmark for Future Long Video Generation

Computer Vision and Pattern Recognition (CVPR), 2024

330

17 Dec 2024

GEM: A Generalizable Ego-Vision Multimodal World Model for Fine-Grained Ego-Motion, Object Dynamics, and Scene Composition Control

Computer Vision and Pattern Recognition (CVPR), 2024

...

322

15 Dec 2024

The Matrix: Infinite-Horizon World Generation with Real-Time Moving Control

286

04 Dec 2024

ReconDreamer: Crafting World Models for Driving Scene Reconstruction via Online Restoration

Computer Vision and Pattern Recognition (CVPR), 2024

...

822

29 Nov 2024

Understanding World or Predicting Future? A Comprehensive Survey of World Models

ACM Computing Surveys (ACM CSUR), 2024

...

Chen Gao

Fengli Xu

Yong Li

VGen SyDa

517

21 Nov 2024

Autoregressive Models in Vision: A Survey

...

486

08 Nov 2024

GameGen-X: Interactive Open-world Game Video Generation

International Conference on Learning Representations (ICLR), 2024

393

01 Nov 2024

SlowFast-VGen: Slow-Fast Learning for Action-Driven Long Video Generation

International Conference on Learning Representations (ICLR), 2024

...

284

30 Oct 2024

Multi-Task Interactive Robot Fleet Learning with Visual World Models

Conference on Robot Learning (CoRL), 2024

322

30 Oct 2024

DriveDreamer4D: World Models Are Effective Data Machines for 4D Driving Scene Representation

Computer Vision and Pattern Recognition (CVPR), 2024

...

549

17 Oct 2024

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

Kaipeng Zhang

Yu Cheng

Dianqi Li

Yu Qiao

Ping Luo

VGen EGVM

254

07 Oct 2024

ACDC: Autoregressive Coherent Multimodal Generation using Diffusion Correction

Hyungjin Chung

Dohun Lee

Jong Chul Ye

VGen DiffM

195

07 Oct 2024

AVID: Adapting Video Diffusion Models to World Models

291

01 Oct 2024

MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions

...

Wenhan Luo

Qifeng Chen

Shanghang Zhang

Qi-fei Liu

Yi-Ting Guo

293

30 Jul 2024

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Xiaodan Liang

Liang Lin

617

185

09 Jul 2024

Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

346

03 Jul 2024

MMWorld: Towards Multi-discipline Multi-faceted World Model Evaluation in Videos

...

Zhengyuan Yang

Kevin Lin

William Yang Wang

Lijuan Wang

Xin Eric Wang

VGen LRM

613

12 Jun 2024

COMBO: Compositional World Models for Embodied Multi-Agent Cooperation

Kwonjoon Lee

Yilun Du

Chuang Gan

409

16 Apr 2024