3D-VLA: A 3D Vision-Language-Action Generative World Model

International Conference on Machine Learning (ICML), 2024

14 March 2024

Chuang Gan


Papers citing "3D-VLA: A 3D Vision-Language-Action Generative World Model"

50 / 141 papers shown

Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation

...

276

05 Jul 2025

Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding

211

01 Jul 2025

A Survey: Learning Embodied Intelligence from Physical Simulators and World Models

...

304

01 Jul 2025

Goal-VLA: Image-Generative VLMs as Object-Centric World Models Empowering Zero-shot Robot Manipulation

183

30 Jun 2025

4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration

...

122

27 Jun 2025

MR-COSMO: Visual-Text Memory Recall and Direct CrOSs-MOdal Alignment Method for Query-Driven 3D Segmentation

197

26 Jun 2025

CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling

...

203

24 Jun 2025

RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies

...

252

22 Jun 2025

DyNaVLM: Zero-Shot Vision-Language Navigation System with Dynamic Viewpoints and Self-Refining Graph Memory

174

18 Jun 2025

CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding

270

16 Jun 2025

AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making

323

14 Jun 2025

SAFE: Multitask Failure Detection for Vision-Language-Action Models

231

11 Jun 2025

BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models

262

09 Jun 2025

Real-Time Execution of Action Chunking Flow Policies

544

09 Jun 2025

Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics

268

29 May 2025

Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better

Danny Driess

Jost Tobias Springenberg

...

294

29 May 2025

ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning

187

25 May 2025

Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization

418

21 May 2025

SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning

525

18 May 2025

Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions

252

16 May 2025

EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation

341

15 May 2025

Depth Anything with Any Prior

264

15 May 2025

FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation

291

15 May 2025

VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation

285

14 May 2025

DataMIL: Selecting Data for Robot Imitation Learning with Datamodels

Roberto Martín-Martín

350

14 May 2025

CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding

287

13 May 2025

Training Strategies for Efficient Embodied Reasoning

432

13 May 2025

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding

International Conference on Learning Representations (ICLR), 2025

311

08 May 2025

Vision-Language-Action Models: Concepts, Progress, Applications and Challenges

Ranjan Sapkota

Yang Cao

Konstantinos I. Roumeliotis

Manoj Karkee

LM&Ro

997

07 May 2025

Task Reconstruction and Extrapolation for π_0 using Text Latent

Quanyi Li

648

06 May 2025

Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

...

374

04 May 2025

Robotic Visual Instruction

Computer Vision and Pattern Recognition (CVPR), 2025

393

01 May 2025

Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability

IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2025

395

26 Apr 2025

π_{0.5}: a Vision-Language-Action Model with Open-World Generalization

Physical Intelligence

...

8.1K

374

22 Apr 2025

SOPHY: Learning to Generate Simulation-Ready Objects with Physical Materials

Junyi Cao

Evangelos Kalogerakis

AI4CE

339

17 Apr 2025

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

...

625

17 Apr 2025

Diffusion Models for Robotic Manipulation: A Survey

Frontiers in Robotics and AI (Front. Robot. AI), 2025

527

11 Apr 2025

Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models

458

09 Apr 2025

The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?

...

363

06 Apr 2025

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision

Information Fusion (Inf. Fusion), 2025

...

445

03 Apr 2025

Embodied Long Horizon Manipulation with Closed-loop Code Generation and Incremental Few-shot Adaptation

322

27 Mar 2025

CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models

Computer Vision and Pattern Recognition (CVPR), 2025

...

354

201

27 Mar 2025

Boosting Robotic Manipulation Generalization with Minimal Costly Data

369

25 Mar 2025

AdaWorld: Learning Adaptable World Models with Latent Actions

558

24 Mar 2025

MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving

305

20 Mar 2025

Curiosity-Diffuser: Curiosity Guide Diffusion Models for Reliability

321

19 Mar 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

...

556

396

18 Mar 2025

HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding

375

17 Mar 2025

Towards Fast, Memory-based and Data-Efficient Vision-Language Policy

333

13 Mar 2025

PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability

Computer Vision and Pattern Recognition (CVPR), 2025

306

11 Mar 2025