OpenVLA: An Open-Source Vision-Language-Action Model

13 June 2024

Quan Vuong

Dorsa Sadigh

Percy Liang

Chelsea Finn

LM&Ro

VLM

Papers citing "OpenVLA: An Open-Source Vision-Language-Action Model"

50 / 723 papers shown

DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding

International Conference on Learning Representations (ICLR), 2025

311

08 May 2025

SITE: towards Spatial Intelligence Thorough Evaluation

293

08 May 2025

RobotxR1: Enabling Embodied Robotic Intelligence on Large Language Models through Closed-Loop Reinforcement Learning

Liam Boyle

Nicolas Baumann

Paviththiren Sivasothilingam

Michele Magno

Luca Benini

LM&Ro LRM

447

06 May 2025

RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration

...

412

06 May 2025

Task Reconstruction and Extrapolation for π_0 using Text Latent

Quanyi Li

653

06 May 2025

Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions

...

382

04 May 2025

CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation

...

276

04 May 2025

ReLI: A Language-Agnostic Approach to Human-Robot Interaction

556

03 May 2025

J-PARSE: Jacobian-based Projection Algorithm for Resolving Singularities Effectively in Inverse Kinematic Control of Serial Manipulators

382

01 May 2025

Anyprefer: An Agentic Framework for Preference Data Synthesis

International Conference on Learning Representations (ICLR), 2025

...

445

27 Apr 2025

RoboVerse: Towards a Unified Platform, Dataset and Benchmark for Scalable and Generalizable Robot Learning

...

434

26 Apr 2025

STDArm: Transferring Visuomotor Policies From Static Data Training to Dynamic Robot Manipulation

317

26 Apr 2025

Instrumentation for Better Demonstrations: A Case Study

Remko Proesmans

Thomas Lips

Francis Wyffels

293

25 Apr 2025

Few-Shot Vision-Language Action-Incremental Policy Learning

266

22 Apr 2025

Phoenix: A Motion-based Self-Reflection Framework for Fine-grained Robotic Action Correction

Computer Vision and Pattern Recognition (CVPR), 2025

297

20 Apr 2025

Latent Representations for Visual Proprioception in Inexpensive Robots

Sahara Sheikholeslami

Ladislau Bölöni

423

20 Apr 2025

Manipulating Multimodal Agents via Cross-Modal Prompt Injection

800

19 Apr 2025

Adversarial Locomotion and Motion Imitation for Humanoid Policy Learning

445

19 Apr 2025

Crossing the Human-Robot Embodiment Gap with Sim-to-Real RL using One Human Demonstration

414

17 Apr 2025

A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation

...

632

17 Apr 2025

Towards Forceful Robotic Foundation Models: a Literature Survey

William Xie

N. Correll

OffRL

329

16 Apr 2025

Joint Action Language Modelling for Transparent Policy Execution

248

14 Apr 2025

Diffusion Models for Robotic Manipulation: A Survey

Frontiers in Robotics and AI (Front. Robot. AI), 2025

545

11 Apr 2025

Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision

Information Fusion (Inf. Fusion), 2025

...

445

03 Apr 2025

Unified World Models: Coupling Video and Action Diffusion for Pretraining on Large Robotic Datasets

505

03 Apr 2025

Grounding Multimodal LLMs to Embodied Agents that Ask for Help with Reinforcement Learning

438

01 Apr 2025

Intrinsically-Motivated Humans and Agents in Open-World Exploration

391

31 Mar 2025

Sim-and-Real Co-Training: A Simple Recipe for Vision-Based Robotic Manipulation

Abhiram Maddukuri

Z. L. Jiang

Lawrence Yunliang Chen

...

383

31 Mar 2025

ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos

IEEE International Conference on Robotics and Automation (ICRA), 2025

296

31 Mar 2025

OpenDriveVLA: Towards End-to-end Autonomous Driving with Large Vision Language Action Model

440

30 Mar 2025

Empirical Analysis of Sim-and-Real Cotraining of Diffusion Policies for Planar Pushing from Pixels

279

28 Mar 2025

Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

...

537

27 Mar 2025

Boosting Robotic Manipulation Generalization with Minimal Costly Data

369

25 Mar 2025

Dita: Scaling Diffusion Transformer for Generalist Vision-Language-Action Policy

...

408

25 Mar 2025

LEGO-Puzzles: How Good Are MLLMs at Multi-Step Spatial Reasoning?

447

25 Mar 2025

Efficient Continual Adaptation of Pretrained Robotic Policy with Online Meta-Learned Adapters

349

24 Mar 2025

RoboEngine: Plug-and-Play Robot Data Augmentation with Semantic Robot Segmentation and Background Generation

336

24 Mar 2025

AdaWorld: Learning Adaptable World Models with Latent Actions

574

24 Mar 2025

SG-Tailor: Inter-Object Commonsense Relationship Reasoning for Scene Graph Manipulation

310

23 Mar 2025

GR00T N1: An Open Foundation Model for Generalist Humanoid Robots

...

559

413

18 Mar 2025

Cosmos-Reason1: From Physical Common Sense To Embodied Reasoning

Nvidia

A. Azzolini

Junjie Bai

Prithvijit Chattopadhyay

...

635

18 Mar 2025

Can Large Vision Language Models Read Maps Like a Human?

391

18 Mar 2025

Growing a Twig to Accelerate Large Vision-Language Models

369

18 Mar 2025

Being-0: A Humanoid Robotic Agent with Vision-Language Models and Modular Skills

487

16 Mar 2025

ReBot: Scaling Robot Learning with Real-to-Sim-to-Real Robotic Video Synthesis

268

15 Mar 2025

LIAM: Multimodal Transformer for Language Instructions, Images, Actions and Semantic Maps

210

15 Mar 2025

Adversarial Data Collection: Human-Collaborative Perturbations for Efficient and Robust Robotic Imitation Learning

368

14 Mar 2025

EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks

...

288

14 Mar 2025

Is Your Imitation Learning Policy Better than Mine? Policy Comparison with Near-Optimal Stopping

496

14 Mar 2025

Spatial-Temporal Graph Diffusion Policy with Kinematic Modeling for Bimanual Robotic Manipulation

Computer Vision and Pattern Recognition (CVPR), 2025

336

13 Mar 2025