ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2405.14093
  4. Cited By
A Survey on Vision-Language-Action Models for Embodied AI

A Survey on Vision-Language-Action Models for Embodied AI

23 May 2024
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
    LM&Ro
ArXivPDFHTML

Papers citing "A Survey on Vision-Language-Action Models for Embodied AI"

50 / 64 papers shown
Title
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Benchmarking Vision, Language, & Action Models in Procedurally Generated, Open Ended Action Environments
Pranav Guruprasad
Yangyue Wang
Sudipta Chowdhury
Harshvardhan Sikka
LM&Ro
VLM
45
0
0
08 May 2025
Multi-agent Embodied AI: Advances and Future Directions
Multi-agent Embodied AI: Advances and Future Directions
Zhaohan Feng
Ruiqi Xue
Lei Yuan
Yang Yu
Ning Ding
M. Liu
Bingzhao Gao
Jian-jun Sun
Gang Wang
AI4CE
38
0
0
08 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
63
0
0
07 May 2025
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
A Survey of Robotic Navigation and Manipulation with Physics Simulators in the Era of Embodied AI
Lik Hang Kenny Wong
Xueyang Kang
Kaixin Bai
Jianwei Zhang
43
0
0
01 May 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung
Qi Sun
Pengfei Hong
Amir Zadeh
Chuan Li
U-Xuan Tan
Navonil Majumder
Soujanya Poria
LM&Ro
37
1
0
28 Apr 2025
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
Yuting Huang
Leilei Ding
Zhipeng Tang
Tianfu Wang
Xinrui Lin
W. Zhang
Mingxiao Ma
Yanyong Zhang
LLMAG
25
0
0
20 Apr 2025
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Emre Can Acikgoz
Cheng Qian
Hongru Wang
Vardhan Dongre
X. Chen
Heng Ji
Dilek Hakkani-Tür
Gökhan Tür
LM&Ro
ELM
41
1
0
07 Apr 2025
RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics
RoboAct-CLIP: Video-Driven Pre-training of Atomic Action Understanding for Robotics
Zhiyuan Zhang
Yuxin He
Yong Sun
Junyu Shi
Lijiang Liu
Qiang Nie
VLM
44
0
0
02 Apr 2025
HACTS: a Human-As-Copilot Teleoperation System for Robot Learning
HACTS: a Human-As-Copilot Teleoperation System for Robot Learning
Z. Xu
Yinuo Zhao
Kun Wu
Ning Liu
Junjie Ji
Zhengping Che
C. Liu
Jian Tang
37
0
0
31 Mar 2025
Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
Physically Ground Commonsense Knowledge for Articulated Object Manipulation with Analytic Concepts
Jianhua Sun
Jiude Wei
Y. Li
Cewu Lu
LM&Ro
49
1
0
30 Mar 2025
Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback
Data-Agnostic Robotic Long-Horizon Manipulation with Vision-Language-Guided Closed-Loop Feedback
Y. Meng
Xiangtong Yao
Haihui Ye
Yirui Zhou
Shengqiang Zhang
Zhenshan Bing
Alois C. Knoll
LM&Ro
VLM
45
0
0
27 Mar 2025
When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making
When Words Outperform Vision: VLMs Can Self-Improve Via Text-Only Training For Human-Centered Decision Making
Zhe Hu
Jing Li
Yu Yin
VLM
53
0
0
21 Mar 2025
Aligning Multimodal LLM with Human Preference: A Survey
Aligning Multimodal LLM with Human Preference: A Survey
Tao Yu
Y. Zhang
Chaoyou Fu
Junkang Wu
Jinda Lu
...
Qingsong Wen
Z. Zhang
Yan Huang
Liang Wang
T. Tan
54
2
0
18 Mar 2025
VLA Model-Expert Collaboration for Bi-directional Manipulation Learning
Tian-Yu Xiang
Ao-Qun Jin
Xiao-Hu Zhou
Mei-Jiang Gui
Xiao-Liang Xie
...
Shuang-Yi Wang
Sheng-Bin Duang
Si-Cheng Wang
Zheng Lei
Z. Hou
48
1
0
06 Mar 2025
SafeVLA: Towards Safety Alignment of Vision-Language-Action Model via Safe Reinforcement Learning
Borong Zhang
Yuhao Zhang
Jiaming Ji
Yingshan Lei
Josef Dai
Yuanpei Chen
Yaodong Yang
55
3
0
05 Mar 2025
RoboDexVLM: Visual Language Model-Enabled Task Planning and Motion Control for Dexterous Robot Manipulation
Haichao Liu
Sikai Guo
Pengfei Mai
Jiahang Cao
Haoang Li
Jun Ma
34
0
0
03 Mar 2025
Action Tokenizer Matters in In-Context Imitation Learning
An Vuong
M. Vu
Dong An
Ian Reid
48
0
0
03 Mar 2025
From underwater to aerial: a novel multi-scale knowledge distillation approach for coral reef monitoring
From underwater to aerial: a novel multi-scale knowledge distillation approach for coral reef monitoring
Matteo Contini
Victor Illien
Julien Barde
Sylvain Poulain
Serge Bernard
Alexis Joly
Sylvain Bonhommeau
60
0
0
25 Feb 2025
Generative Multi-Agent Collaboration in Embodied AI: A Systematic Review
Generative Multi-Agent Collaboration in Embodied AI: A Systematic Review
Di Wu
Xian Wei
Guang Chen
Hao Shen
Xiangfeng Wang
Wenhao Li
Bo Jin
33
2
0
17 Feb 2025
Visual Large Language Models for Generalized and Specialized Applications
Yifan Li
Zhixin Lai
Wentao Bao
Zhen Tan
Anh Dao
Kewei Sui
Jiayi Shen
Dong Liu
Huan Liu
Yu Kong
VLM
83
10
0
06 Jan 2025
Vision-Language-Action Model and Diffusion Policy Switching Enables
  Dexterous Control of an Anthropomorphic Hand
Vision-Language-Action Model and Diffusion Policy Switching Enables Dexterous Control of an Anthropomorphic Hand
Cheng Pan
Kai Junge
Josie Hughes
36
1
0
17 Oct 2024
LADEV: A Language-Driven Testing and Evaluation Platform for
  Vision-Language-Action Models in Robotic Manipulation
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Zhijie Wang
Zhehua Zhou
Jiayang Song
Yuheng Huang
Zhan Shu
Lei Ma
16
0
0
07 Oct 2024
Active Fine-Tuning of Generalist Policies
Active Fine-Tuning of Generalist Policies
Marco Bagatella
Jonas Hübotter
Georg Martius
Andreas Krause
24
0
0
07 Oct 2024
Navigation with VLM framework: Go to Any Language
Navigation with VLM framework: Go to Any Language
Zecheng Yin
Chonghao Cheng
Lizhen
LM&Ro
27
0
0
18 Sep 2024
Logically Constrained Robotics Transformers for Enhanced
  Perception-Action Planning
Logically Constrained Robotics Transformers for Enhanced Perception-Action Planning
Parv Kapoor
Sai H. Vemprala
Ashish Kapoor
21
1
0
09 Aug 2024
GP-VLS: A general-purpose vision language model for surgery
GP-VLS: A general-purpose vision language model for surgery
Samuel Schmidgall
Joseph Cho
C. Zakka
W. Hiesinger
LM&MA
36
3
0
27 Jul 2024
Knowledge Overshadowing Causes Amalgamated Hallucination in Large
  Language Models
Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models
Yuji Zhang
Sha Li
Jiateng Liu
Pengfei Yu
Yi Ren Fung
Jing Li
Manling Li
Heng Ji
21
8
0
10 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on
  Embodied AI
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Y. Liu
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
37
27
0
09 Jul 2024
Investigating the Role of Instruction Variety and Task Difficulty in
  Robotic Manipulation Tasks
Investigating the Role of Instruction Variety and Task Difficulty in Robotic Manipulation Tasks
Amit Parekh
Nikolas Vitsakis
Alessandro Suglia
Ioannis Konstas
AAML
23
4
0
04 Jul 2024
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with
  Modality Collaboration
mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration
Qinghao Ye
Haiyang Xu
Jiabo Ye
Mingshi Yan
Anwen Hu
Haowei Liu
Qi Qian
Ji Zhang
Fei Huang
Jingren Zhou
MLLM
VLM
114
367
0
07 Nov 2023
Q-Transformer: Scalable Offline Reinforcement Learning via
  Autoregressive Q-Functions
Q-Transformer: Scalable Offline Reinforcement Learning via Autoregressive Q-Functions
Yevgen Chebotar
Q. Vuong
A. Irpan
Karol Hausman
F. Xia
...
Brianna Zitkovich
Tomas Jackson
Kanishka Rao
Chelsea Finn
Sergey Levine
OffRL
104
51
0
18 Sep 2023
mPLUG-Owl: Modularization Empowers Large Language Models with
  Multimodality
mPLUG-Owl: Modularization Empowers Large Language Models with Multimodality
Qinghao Ye
Haiyang Xu
Guohai Xu
Jiabo Ye
Ming Yan
...
Junfeng Tian
Qiang Qi
Ji Zhang
Feiyan Huang
Jingren Zhou
VLM
MLLM
198
883
0
27 Apr 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
136
97
0
02 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image
  Encoders and Large Language Models
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Real-World Robot Learning with Masked Visual Pre-training
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
135
181
0
06 Oct 2022
ReAct: Synergizing Reasoning and Acting in Language Models
ReAct: Synergizing Reasoning and Acting in Language Models
Shunyu Yao
Jeffrey Zhao
Dian Yu
Nan Du
Izhak Shafran
Karthik Narasimhan
Yuan Cao
LLMAG
ReLM
LRM
208
2,413
0
06 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data
Grounding Language with Visual Affordances over Unstructured Data
Oier Mees
Jessica Borja-Diaz
Wolfram Burgard
LM&Ro
111
106
0
04 Oct 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language
  Models
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
D. Fox
Jesse Thomason
Animesh Garg
LM&Ro
LLMAG
104
616
0
22 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
D. Fox
LM&Ro
138
449
0
12 Sep 2022
Instruction-driven history-aware policies for robotic manipulations
Instruction-driven history-aware policies for robotic manipulations
Pierre-Louis Guhur
Shizhe Chen
Ricardo Garcia Pinel
Makarand Tapaswi
Ivan Laptev
Cordelia Schmid
LM&Ro
89
101
0
11 Sep 2022
Masked World Models for Visual Control
Masked World Models for Visual Control
Younggyo Seo
Danijar Hafner
Hao Liu
Fangchen Liu
Stephen James
Kimin Lee
Pieter Abbeel
OffRL
71
95
0
28 Jun 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
VLP: A Survey on Vision-Language Pre-training
VLP: A Survey on Vision-Language Pre-training
Feilong Chen
Duzhen Zhang
Minglun Han
Xiuyi Chen
Jing Shi
Shuang Xu
Bo Xu
VLM
71
208
0
18 Feb 2022
Masked Autoencoders Are Scalable Vision Learners
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
255
5,353
0
11 Nov 2021
Skill Induction and Planning with Latent Language
Skill Induction and Planning with Latent Language
Pratyusha Sharma
Antonio Torralba
Jacob Andreas
LM&Ro
178
108
0
04 Oct 2021
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain
  Datasets
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
F. Ebert
Yanlai Yang
Karl Schmeckpeper
Bernadette Bucher
G. Georgakis
Kostas Daniilidis
Chelsea Finn
Sergey Levine
147
212
0
27 Sep 2021
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual,
  Interactive, and Ecological Environments
BEHAVIOR: Benchmark for Everyday Household Activities in Virtual, Interactive, and Ecological Environments
S. Srivastava
Chengshu Li
Michael Lingelbach
Roberto Martín-Martín
Fei Xia
...
C. Karen Liu
Silvio Savarese
H. Gweon
Jiajun Wu
Li Fei-Fei
LM&Ro
124
117
0
06 Aug 2021
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday
  Household Tasks
iGibson 2.0: Object-Centric Simulation for Robot Learning of Everyday Household Tasks
Chengshu Li
Fei Xia
Roberto Martín-Martín
Michael Lingelbach
S. Srivastava
...
Karen Liu
H. Gweon
Jiajun Wu
Li Fei-Fei
Silvio Savarese
LM&Ro
134
154
0
06 Aug 2021
A Persistent Spatial Semantic Representation for High-level Natural
  Language Instruction Execution
A Persistent Spatial Semantic Representation for High-level Natural Language Instruction Execution
Valts Blukis
Chris Paxton
D. Fox
Animesh Garg
Yoav Artzi
LM&Ro
201
114
0
12 Jul 2021
Open-vocabulary Object Detection via Vision and Language Knowledge
  Distillation
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
200
698
0
28 Apr 2021
12
Next