3D-VLA: A 3D Vision-Language-Action Generative World Model

International Conference on Machine Learning (ICML), 2024
14 March 2024
Haoyu Zhen
Xiaowen Qiu
Peihao Chen
Jincheng Yang
Xin Yan
Yilun Du
Yining Hong
Chuang Gan
LM&Ro, VGen, PINN

Papers citing "3D-VLA: A 3D Vision-Language-Action Generative World Model"

50 / 141 papers shown
Move to Understand a 3D Scene: Bridging Visual Grounding and Exploration for Efficient and Versatile Embodied Navigation
Ziyu Zhu
Xilin Wang
Yixuan Li
Zhuofan Zhang
Xiaojian Ma
...
Wei Liang
Qian Yu
Zhidong Deng
Siyuan Huang
Qing Li
LM&Ro
276
23
0
05 Jul 2025
Evo-0: Vision-Language-Action Model with Implicit Spatial Understanding
Tao Lin
Gen Li
Yilei Zhong
Yanwen Zou
Yuxin Du
Jiting Liu
Encheng Gu
Bo Zhao
VLM
211
17
0
01 Jul 2025
A Survey: Learning Embodied Intelligence from Physical Simulators and World Models
Xiaoxiao Long
Qingrui Zhao
Kaiwen Zhang
Zihao Zhang
Dingrui Wang
...
Jia Pan
Qiu Shen
Ruigang Yang
X. Cao
Qionghai Dai
LM&Ro, AI4CE
304
22
0
01 Jul 2025
Goal-VLA: Image-Generative VLMs as Object-Centric World Models Empowering Zero-shot Robot Manipulation
Haonan Chen
Jingxiang Guo
Bangjun Wang
Tianrui Zhang
Xuchuan Huang
Boren Zheng
Yiwen Hou
Chenrui Tie
Jiajun Deng
Lin Shao
VGen, LM&Ro, SyDa
183
2
0
30 Jun 2025
4D-VLA: Spatiotemporal Vision-Language-Action Pretraining with Cross-Scene Calibration
J. Zhang
Yurui Chen
Yueming Xu
Ze Huang
Yanpeng Zhou
...
Xinyue Cai
Guowei Huang
Xingyue Quan
Hang Xu
Li Zhang
122
16
0
27 Jun 2025
MR-COSMO: Visual-Text Memory Recall and Direct CrOSs-MOdal Alignment Method for Query-Driven 3D Segmentation
Chade Li
Pengju Zhang
Yihong Wu
3DV
197
0
0
26 Jun 2025
CronusVLA: Towards Efficient and Robust Manipulation via Multi-Frame Vision-Language-Action Modeling
Hao Li
Shuai Yang
Yilun Chen
Xinyi Chen
Xiaoda Yang
...
Hanqing Wang
Tai Wang
Dahua Lin
Feng Zhao
Jiangmiao Pang
203
6
0
24 Jun 2025
RoboArena: Distributed Real-World Evaluation of Generalist Robot Policies
P. Atreya
Karl Pertsch
T. Lee
Moo Jin Kim
A. Jain
...
R. M. Martin
Youngwoon Lee
Percy Liang
Chelsea Finn
Sergey Levine
OffRL
252
16
0
22 Jun 2025
DyNaVLM: Zero-Shot Vision-Language Navigation System with Dynamic Viewpoints and Self-Refining Graph Memory
Zihe Ji
Huangxuan Lin
Yue Gao
VLM
174
3
0
18 Jun 2025
CEED-VLA: Consistency Vision-Language-Action Model with Early-Exit Decoding
Wenxuan Song
Jiayi Chen
Pengxiang Ding
Yuxin Huang
Han Zhao
Donglin Wang
Haoang Li
270
15
0
16 Jun 2025
AntiGrounding: Lifting Robotic Actions into VLM Representation Space for Decision Making
Wenbo Li
Shiyi Wang
Yiteng Chen
Huiping Zhuang
Qingyao Wu
323
0
0
14 Jun 2025
SAFE: Multitask Failure Detection for Vision-Language-Action Models
Qiao Gu
Yuanliang Ju
Shengxiang Sun
Igor Gilitschenski
Haruki Nishimura
Masha Itkina
Florian Shkurti
231
15
0
11 Jun 2025
BridgeVLA: Input-Output Alignment for Efficient 3D Manipulation Learning with Vision-Language Models
Peiyan Li
Yixiang Chen
Hongtao Wu
Xiao Ma
Xiangnan Wu
Y. Huang
Liang Wang
Tao Kong
Tieniu Tan
262
27
0
09 Jun 2025
Real-Time Execution of Action Chunking Flow Policies
Kevin Black
Manuel Y. Galliker
Sergey Levine
OffRL
544
37
0
09 Jun 2025
Robot-R1: Reinforcement Learning for Enhanced Embodied Reasoning in Robotics
Dongyoung Kim
S. Park
Huiwon Jang
Jinwoo Shin
Jaehyung Kim
Younggyo Seo
LRM
268
9
0
29 May 2025
Knowledge Insulating Vision-Language-Action Models: Train Fast, Run Fast, Generalize Better
Danny Driess
Jost Tobias Springenberg
Brian Ichter
Lili Yu
Adrian Li-Bell
...
Allen Z. Ren
Homer Walke
Quan Vuong
Lucy Xiaoyang Shi
Sergey Levine
294
46
0
29 May 2025
ReFineVLA: Reasoning-Aware Teacher-Guided Transfer Fine-Tuning
Tuan V. Vo
T. Nguyen
Khang Nguyen
Duy Ho Minh Nguyen
Minh Nhat Vu
LRM
187
4
0
25 May 2025
Exploring the Limits of Vision-Language-Action Manipulations in Cross-task Generalization
Jiaming Zhou
Ke Ye
Jiayi Liu
Teli Ma
Zifang Wang
Ronghe Qiu
Kun-Yu Lin
Zhilin Zhao
Junwei Liang
418
16
0
21 May 2025
SSR: Enhancing Depth Perception in Vision-Language Models via Rationale-Guided Spatial Reasoning
Yang Liu
Ming Ma
Xiaomin Yu
Pengxiang Ding
Han Zhao
Mingyang Sun
Siteng Huang
Xuetao Zhang
LRM
525
18
0
18 May 2025
Unveiling the Potential of Vision-Language-Action Models with Open-Ended Multimodal Instructions
Wei Zhao
Gongsheng Li
Zhefei Gong
Pengxiang Ding
Han Zhao
Donglin Wang
LM&Ro
252
9
0
16 May 2025
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
Zibin Dong
Fei Ni
Yifu Yuan
Yinchuan Li
Jianye Hao
341
3
0
15 May 2025
Depth Anything with Any Prior
Zehan Wang
Siyu Chen
Lihe Yang
Jialei Wang
Ziang Zhang
Hengshuang Zhao
Zhou Zhao
3DGS, VLM, MDE
264
6
0
15 May 2025
FlowDreamer: A RGB-D World Model with Flow-based Motion Representations for Robot Manipulation
Jun Guo
Xiaojian Ma
Yikai Wang
Min Yang
Huaping Liu
Qing Li
VGen
291
8
0
15 May 2025
VTLA: Vision-Tactile-Language-Action Model with Preference Learning for Insertion Manipulation
Chaofan Zhang
Peng Hao
Xiaoge Cao
Xiaoshuai Hao
Shaowei Cui
Shuo Wang
285
23
0
14 May 2025
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
Shivin Dass
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
Andrew Ilyas
Roberto Martín-Martín
350
8
0
14 May 2025
CLTP: Contrastive Language-Tactile Pre-training for 3D Contact Geometry Understanding
Wenxuan Ma
Xiaoge Cao
Yujiao Shi
Chaofan Zhang
Shaobo Yang
Peng Hao
Bin Fang
Yinghao Cai
Shaowei Cui
Shuo Wang
287
3
0
13 May 2025
Training Strategies for Efficient Embodied Reasoning
William Chen
Suneel Belkhale
Suvir Mirchandani
Oier Mees
Danny Driess
Karl Pertsch
Sergey Levine
OffRL, LRM
432
26
0
13 May 2025
DenseGrounding: Improving Dense Language-Vision Semantics for Ego-Centric 3D Visual Grounding
International Conference on Learning Representations (ICLR), 2025
Henry Zheng
Hao Shi
Qihang Peng
Yong Xien Chng
Rui Huang
Yepeng Weng
Peng Wang
Gao Huang
311
8
0
08 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I. Roumeliotis
Manoj Karkee
LM&Ro
997
42
0
07 May 2025
Task Reconstruction and Extrapolation for $π_0$ using Text Latent
Quanyi Li
648
2
0
06 May 2025
Interleave-VLA: Enhancing Robot Manipulation with Interleaved Image-Text Instructions
Cunxin Fan
Xiaosong Jia
Yihang Sun
Yixiao Wang
Jianglan Wei
...
Xiangyu Zhao
Masayoshi Tomizuka
Songyuan Li
Junchi Yan
Mingyu Ding
LM&Ro, VLM
374
25
0
04 May 2025
Robotic Visual Instruction
Computer Vision and Pattern Recognition (CVPR), 2025
Yuchen Ren
Ziyang Gong
Haoyang Li
Xiaoqi Huang
Haolan Kang
Guangping Bai
Xianzheng Ma
LM&Ro
393
9
0
01 May 2025
Generative AI in Embodied Systems: System-Level Analysis of Performance, Efficiency and Scalability
IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2025
Zishen Wan
Jiayi Qian
Yuhang Du
Jason J. Jabbour
Yilun Du
Yang Katie Zhao
A. Raychowdhury
Tushar Krishna
Vijay Janapa Reddi
LM&Ro
395
2
0
26 Apr 2025
$π_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
Physical Intelligence
Kevin Black
Noah Brown
James Darpinian
Karan Dhabalia
...
Homer Walke
Anna Walling
Haohuan Wang
Lili Yu
Ury Zhilinsky
LM&Ro, VLM
8.1K
374
0
22 Apr 2025
SOPHY: Learning to Generate Simulation-Ready Objects with Physical Materials
Junyi Cao
Evangelos Kalogerakis
AI4CE
339
0
0
17 Apr 2025
A0: An Affordance-Aware Hierarchical Model for General Robotic Manipulation
Rongtao Xu
Junxuan Zhang
Minghao Guo
Youpeng Wen
H. Yang
...
Liqiong Wang
Yuxuan Kuang
Meng Cao
Feng Zheng
Xiaodan Liang
625
31
0
17 Apr 2025
Diffusion Models for Robotic Manipulation: A Survey
Frontiers in Robotics and AI (Front. Robot. AI), 2025
Rosa Wolf
Yitian Shi
Sheng Liu
Rania Rayyes
527
28
0
11 Apr 2025
Decoupling Contrastive Decoding: Robust Hallucination Mitigation in Multimodal Large Language Models
Wei Chen
Xin Yan
Bin Wen
Fan Yang
Yan Li
Di Zhang
Long Chen
MLLM
458
0
0
09 Apr 2025
The Point, the Vision and the Text: Does Point Cloud Boost Spatial Reasoning of Large Language Models?
Weichen Zhang
Ruiying Peng
Chen Gao
Jianjie Fang
Xin Zeng
...
Liang Luo
Jinqiang Cui
Xin Wang
Xinlei Chen
Yongqian Li
LRM
363
4
0
06 Apr 2025
Multimodal Fusion and Vision-Language Models: A Survey for Robot Vision
Information Fusion (Inf. Fusion), 2025
Xiaofeng Han
Shunpeng Chen
Zenghuang Fu
Zhe Feng
Lue Fan
...
Li Guo
Weiliang Meng
Xiaopeng Zhang
Rongtao Xu
Shibiao Xu
445
40
0
03 Apr 2025
Embodied Long Horizon Manipulation with Closed-loop Code Generation and Incremental Few-shot Adaptation
Y. Meng
Xiangtong Yao
Haihui Ye
Yirui Zhou
Shengqiang Zhang
Zhenguo Sun
Alois C. Knoll
Zhenshan Bing
Alois Knoll
LM&Ro, VLM
322
2
0
27 Mar 2025
CoT-VLA: Visual Chain-of-Thought Reasoning for Vision-Language-Action Models
Computer Vision and Pattern Recognition (CVPR), 2025
Qingqing Zhao
Yao Lu
Moo Jin Kim
Zipeng Fu
Zhuoyang Zhang
...
Ankur Handa
Xuan Li
Donglai Xiang
Gordon Wetzstein
Nayeon Lee
LM&Ro, LRM
354
201
0
27 Mar 2025
Boosting Robotic Manipulation Generalization with Minimal Costly Data
Liming Zheng
Feng Yan
Fanfan Liu
C. Feng
Yufeng Zhong
Yiyang Huang
369
2
0
25 Mar 2025
AdaWorld: Learning Adaptable World Models with Latent Actions
Shenyuan Gao
Siyuan Zhou
Yilun Du
Jun Zhang
Chuang Gan
VGen
558
35
0
24 Mar 2025
MiLA: Multi-view Intensive-fidelity Long-term Video Generation World Model for Autonomous Driving
Haiguang Wang
Daqi Liu
Hongwei Xie
Haisong Liu
Enhui Ma
Kaicheng Yu
Limin Wang
Bing Wang
VGen
305
5
0
20 Mar 2025
Curiosity-Diffuser: Curiosity Guide Diffusion Models for Reliability
Zihao Liu
Xing Liu
Yizhai Zhang
Zhengxiong Liu
Panfeng Huang
321
0
0
19 Mar 2025
GR00T N1: An Open Foundation Model for Generalist Humanoid Robots
Nvidia
Johan Bjorck
Fernando Castañeda
Nikita Cherniadev
Xingye Da
...
Ao Zhang
Hao Zhang
Yizhou Zhao
Ruijie Zheng
Yuke Zhu
VLM
556
396
0
18 Mar 2025
HIS-GPT: Towards 3D Human-In-Scene Multimodal Understanding
Jiahe Zhao
Ruibing Hou
Zejie Tian
Hong Chang
Shiguang Shan
375
0
0
17 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yongqian Li
Xinggang Wang
LM&Ro
333
2
0
13 Mar 2025
PhysVLM: Enabling Visual Language Models to Understand Robotic Physical Reachability
Computer Vision and Pattern Recognition (CVPR), 2025
Weijie Zhou
Manli Tao
Honghui Dong
Haiyun Guo
Honghui Dong
Ming Tang
Jinqiao Wang
306
16
0
11 Mar 2025