3D-VLA: A 3D Vision-Language-Action Generative World Model

International Conference on Machine Learning (ICML), 2024

14 March 2024

Chuang Gan

Papers citing "3D-VLA: A 3D Vision-Language-Action Generative World Model"

50 / 141 papers shown

VLA Models Are More Generalizable Than You Think: Revisiting Physical and Spatial Modeling

194

02 Dec 2025

LISA-3D: Lifting Language-Image Segmentation to 3D via Multi-View Consistency

30 Nov 2025

SwiftVLA: Unlocking Spatiotemporal Dynamics for Lightweight VLA Models at Minimal Overhead

...

105

30 Nov 2025

IndustryNav: Exploring Spatial Reasoning of Embodied Agents in Dynamic Industrial Navigation

...

156

21 Nov 2025

RynnVLA-002: A Unified Vision-Language-Action and World Model

...

317

21 Nov 2025

VLA-4D: Embedding 4D Awareness into Vision-Language-Action Models for SpatioTemporally Coherent Robotic Manipulation

Hanyu Zhou

Chuanhao Ma

Gim Hee Lee

191

21 Nov 2025

BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections

160

16 Nov 2025

A Step Toward World Models: A Survey on Robotic Manipulation

745

31 Oct 2025

Multimodal Spatial Reasoning in the Large Model Era: A Survey and Benchmarks

...

712

29 Oct 2025

From Spatial to Actions: Grounding Vision-Language-Action Model in Spatial Foundation Priors

...

105

20 Oct 2025

QDepth-VLA: Quantized Depth Prediction as Auxiliary Supervision for Vision-Language-Action Models

Zhengtao Zhang

Dongbin Zhao

VLM

114

16 Oct 2025

DepthVLA: Enhancing Vision-Language-Action Models with Depth-Aware Spatial Reasoning

121

15 Oct 2025

HiMaCon: Discovering Hierarchical Manipulation Concepts from Unlabeled Multi-Modal Data

398

13 Oct 2025

X-VLA: Soft-Prompted Transformer as Scalable Cross-Embodiment Vision-Language-Action Model

...

232

11 Oct 2025

VITA-VLA: Efficiently Teaching Vision-Language Models to Act via Action Expert Distillation

...

164

10 Oct 2025

Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

IEEE Access, 2025

261

08 Oct 2025

Avi: Action from Volumetric Inference

Harris Song

Long Le

VGen LM&Ro

116

07 Oct 2025

NoTVLA: Narrowing of Dense Action Trajectories for Generalizable Robot Manipulation

...

105

04 Oct 2025

MLA: A Multisensory Language-Action Model for Multimodal Understanding and Forecasting in Robotic Manipulation

...

124

30 Sep 2025

dVLA: Diffusion Vision-Language-Action Model with Multimodal Chain-of-Thought

124

30 Sep 2025

Transferring Vision-Language-Action Models to Industry Applications: Architectures, Performance, and Challenges

117

27 Sep 2025

MoWM: Mixture-of-World-Models for Embodied Planning via Latent-to-Pixel Feature Modulation

171

26 Sep 2025

Pixel Motion Diffusion is What We Need for Robot Control

140

26 Sep 2025

Generalist Robot Manipulation beyond Action Labeled Data

124

24 Sep 2025

Pure Vision Language Action (VLA) Models: A Comprehensive Survey

295

23 Sep 2025

VLA-LPAF: Lightweight Perspective-Adaptive Fusion for Vision-Language-Action to Enable More Unconstrained Robotic Manipulation

124

18 Sep 2025

CRAFT: Coaching Reinforcement Learning Autonomously using Foundation Models for Multi-Robot Coordination Tasks

176

17 Sep 2025

Maps for Autonomous Driving: Full-process Survey and Frontiers

136

16 Sep 2025

Igniting VLMs toward the Embodied Space

...

195

15 Sep 2025

RoboChemist: Long-Horizon and Safety-Compliant Robotic Chemical Experimentation

148

10 Sep 2025

RoboMatch: A Unified Mobile-Manipulation Teleoperation Platform with Auto-Matching Network Architecture for Long-Horizon Tasks

...

148

10 Sep 2025

U-ARM: Ultra low-cost general teleoperation interface for robot manipulation

200

02 Sep 2025

Planning with Reasoning using Vision Language World Model

262

02 Sep 2025

Manipulation as in Simulation: Enabling Accurate Geometry Perception in Robots

...

129

02 Sep 2025

Robotic Manipulation via Imitation Learning: Taxonomy, Evolution, Benchmark, and Challenges

255

24 Aug 2025

Spatial Policy: Guiding Visuomotor Robotic Manipulation with Spatial-Aware Modeling and Reasoning

...

21 Aug 2025

Survey of Vision-Language-Action Models for Embodied Manipulation

466

21 Aug 2025

Grounding Actions in Camera Space: Observation-Centric Vision-Language-Action Policy

137

18 Aug 2025

Large VLM-based Vision-Language-Action Models for Robotic Manipulation: A Survey

247

18 Aug 2025

OVSegDT: Segmenting Transformer for Open-Vocabulary Object Goal Navigation

103

15 Aug 2025

ReconVLA: Reconstructive Vision-Language-Action Model as Effective Robot Perceiver

136

14 Aug 2025

Large Model Empowered Embodied AI: A Survey on Decision-Making and Embodied Learning

169

14 Aug 2025

OmniVTLA: Vision-Tactile-Language-Action Model with Semantic-Aligned Tactile Sensing

163

12 Aug 2025

GeoVLA: Empowering 3D Representations in Vision-Language-Action Models

146

12 Aug 2025

Perceiving and Acting in First-Person: A Dataset and Benchmark for Egocentric Human-Object-Human Interactions

...

245

06 Aug 2025

ActionSink: Toward Precise Robot Manipulation with Dynamic Integration of Action Flow

153

05 Aug 2025

H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

196

31 Jul 2025

Exploring the Link Between Bayesian Inference and Embodied Intelligence: Toward Open Physical-World Embodied AI Systems

Bin Liu

229

29 Jul 2025

Reconstructing 4D Spatial Intelligence: A Survey

...

349

28 Jul 2025

DreamVLA: A Vision-Language-Action Model Dreamed with Comprehensive World Knowledge

...

218

06 Jul 2025