3D-VLA: A 3D Vision-Language-Action Generative World Model

International Conference on Machine Learning (ICML), 2024

14 March 2024

Chuang Gan

Papers citing "3D-VLA: A 3D Vision-Language-Action Generative World Model"

41 / 141 papers shown

System 0/1/2/3: Quad-process theory for multi-timescale embodied collective cognitive systems

344

08 Mar 2025

Integrating Chain-of-Thought for Multimodal Alignment: A Study on 3D Vision-Language Learning

362

08 Mar 2025

VLA Model-Expert Collaboration for Bi-directional Manipulation Learning

...

268

06 Mar 2025

Data-Efficient Multi-Agent Spatial Planning with LLMs

456

26 Feb 2025

Pre-training Auto-regressive Robotic Models with 4D Representations

417

18 Feb 2025

Understanding and Evaluating Hallucinations in 3D Visual Language Models

412

18 Feb 2025

SoFar: Language-Grounded Orientation Bridges Spatial Reasoning and Object Manipulation

...

445

18 Feb 2025

DMWM: Dual-Mind World Model with Long-Term Imagination

1.0K

11 Feb 2025

DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control

469

103

09 Feb 2025

Towards Generalist Robot Policies: What Matters in Building Vision-Language-Action Models

488

18 Dec 2024

LSceneLLM: Enhancing Large 3D Scene Understanding Using Adaptive Visual Preferences

Computer Vision and Pattern Recognition (CVPR), 2024

432

02 Dec 2024

ShowUI: One Vision-Language-Action Model for GUI Visual Agent

Computer Vision and Pattern Recognition (CVPR), 2024

343

123

26 Nov 2024

Understanding World or Predicting Future? A Comprehensive Survey of World Models

ACM Computing Surveys (ACM CSUR), 2024

...

Chen Gao

Fengli Xu

Yong Li

517

21 Nov 2024

Generalist Virtual Agents: A Survey on Autonomous Agents Across Digital Platforms

312

17 Nov 2024

Few-Shot Task Learning through Inverse Generative Modeling

Neural Information Processing Systems (NeurIPS), 2024

484

07 Nov 2024

VLASCD: A Visual Language Action Model for Simultaneous Chatting and Decision Making

359

21 Oct 2024

PIVOT-R: Primitive-Driven Waypoint-Aware World Model for Robotic Manipulation

Neural Information Processing Systems (NeurIPS), 2024

315

14 Oct 2024

Towards Synergistic, Generalized, and Efficient Dual-System for Robotic Manipulation

Qingwen Bu

Hongyang Li

Li Chen

399

10 Oct 2024

SPARTUN3D: Situated Spatial Understanding of 3D World in Large Language Models

International Conference on Learning Representations (ICLR), 2024

Yue Zhang

Zhiyang Xu

Ying Shen

Parisa Kordjamshidi

Lifu Huang

315

04 Oct 2024

Helpful DoggyBot: Open-World Object Fetching using Legged Robots and Vision-Language Models

Xiaolong Wang

270

30 Sep 2024

FoAM: Foresight-Augmented Multi-Task Imitation Policy for Robotic Manipulation

343

29 Sep 2024

ChatCam: Empowering Camera Control through Conversational AI

Neural Information Processing Systems (NeurIPS), 2024

Xinhang Liu

Yu-Wing Tai

Chi-Keung Tang

264

25 Sep 2024

MultiTalk: Introspective and Extrospective Dialogue for Human-Environment-LLM Alignment

IEEE International Conference on Robotics and Automation (ICRA), 2024

Venkata Naren Devarakonda

Ali Umut Kaypak

Shuaihang Yuan

Prashanth Krishnamurthy

Yi Fang

Farshad Khorrami

195

24 Sep 2024

Embodiment-Agnostic Action Planning via Object-Part Scene Flow

IEEE International Conference on Robotics and Automation (ICRA), 2024

Wei Zhan

Yun-Hui Liu

Mingyu Ding

228

16 Sep 2024

Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding

Neural Information Processing Systems (NeurIPS), 2024

526

05 Sep 2024

SafeEmbodAI: a Safety Framework for Mobile Robots in Embodied AI Systems

336

03 Sep 2024

Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI

Xiaodan Liang

Liang Lin

619

185

09 Jul 2024

LLaRA: Supercharging Robot Learning Data for Vision-Language Policy

...

592

28 Jun 2024

HumanVLA: Towards Vision-Language Directed Object Rearrangement by Physical Humanoid

Lei Han

261

28 Jun 2024

Mitigating the Human-Robot Domain Discrepancy in Visual Pre-training for Robotic Manipulation

430

20 Jun 2024

OpenVLA: An Open-Source Vision-Language-Action Model

...

Dorsa Sadigh

Percy Liang

Chelsea Finn

590

1,350

13 Jun 2024

Pandora: Towards General World Model with Natural Language Actions and Video States

Guangyi Liu

...

Zhengzhong Liu

Eric P. Xing

Zhiting Hu

302

12 Jun 2024

A Survey on Text-guided 3D Visual Grounding: Elements, Recent Advances, and Future Directions

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024

Wei Hu

357

09 Jun 2024

MeshXL: Neural Coordinate Field for Generative 3D Foundation Models

...

Jingyi Yu

Tao Chen

304

31 May 2024

A Survey on Vision-Language-Action Models for Embodied AI

885

166

23 May 2024

When LLMs step into the 3D World: A Survey and Meta-Analysis of 3D Tasks via Multi-modal Large Language Models

...

366

16 May 2024

What Foundation Models can Bring for Robot Learning in Manipulation : A Survey

...

900

28 Apr 2024

OAKINK2: A Dataset of Bimanual Hands-Object Manipulation in Complex Task Completion

221

28 Mar 2024

A Robotic Skill Learning System Built Upon Diffusion Policies and Foundation Models

Danica Kragic

208

25 Mar 2024

DROID: A Large-Scale In-The-Wild Robot Manipulation Dataset

...

Sergey Levine

Chelsea Finn

543

485

19 Mar 2024

An Interactive Agent Foundation Model

...

321

08 Feb 2024