GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration

IEEE Robotics and Automation Letters (RA-L), 2023
20 November 2023
Naoki Wake
Atsushi Kanehira
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
    LM&Ro

Papers citing "GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration"

36 / 36 papers shown
ResponsibleRobotBench: Benchmarking Responsible Robot Manipulation using Multi-modal Large Language Models
Lei Zhang
Ju Dong
Kaixin Bai
Minheng Ni
Zoltán-Csaba Márton
Zhaopeng Chen
Jianwei Zhang
LM&Ro
233
0
0
03 Dec 2025
BOP-ASK: Object-Interaction Reasoning for Vision-Language Models
V. Bhat
Sungsu Kim
Valts Blukis
Greg Heinrich
Prashanth Krishnamurthy
Ramesh Karri
Stan Birchfield
Farshad Khorrami
Jonathan Tremblay
VLM
239
1
0
20 Nov 2025
Intuitive Programming, Adaptive Task Planning, and Dynamic Role Allocation in Human-Robot Collaboration
Annual Review of Control Robotics and Autonomous Systems (RCRAS), 2025
Marta Lagomarsino
Elena Merlo
Andrea Pupa
Timo Birr
F. Krebs
Cristian Secchi
Tamim Asfour
Arash Ajoudani
124
1
0
11 Nov 2025
EfficientNav: Towards On-Device Object-Goal Navigation with Navigation Map Caching and Retrieval
Zebin Yang
Sunjian Zheng
Tong Xie
Tianshi Xu
Bo Yu
Fan Wang
Jie Tang
Shaoshan Liu
Meng Li
118
0
0
21 Oct 2025
Score the Steps, Not Just the Goal: VLM-Based Subgoal Evaluation for Robotic Manipulation
Ramy ElMallah
Krish Chhajer
Chi-Guhn Lee
171
1
0
23 Sep 2025
IDfRA: Self-Verification for Iterative Design in Robotic Assembly
Nishka Khendry
Christos Margadji
Sebastian W. Pattinson
163
0
0
21 Sep 2025
DepthVision: Enabling Robust Vision-Language Models with GAN-Based LiDAR-to-RGB Synthesis for Autonomous Driving
Sven Kirchner
Nils Purschke
Ross Greer
Alois C. Knoll
3DV VLM
176
0
0
09 Sep 2025
Long-Horizon Visual Imitation Learning via Plan and Code Reflection
Quan Chen
Chenrui Shi
Qi Chen
Yuwei Wu
Zhi Gao
Xintong Zhang
Rui Gao
Kun Wu
Yunde Jia
161
1
0
04 Sep 2025
FMimic: Foundation Models are Fine-grained Action Learners from Human Videos
The International Journal of Robotics Research (IJRR), 2025
Guangyan Chen
Meiling Wang
Te Cui
Yao Mu
Haoyang Lu
...
Mengxiao Hu
Tianxing Zhou
M. Fu
Yi Yang
Yufeng Yue
LM&Ro VLM
158
5
0
28 Jul 2025
A Human-in-the-loop Approach to Robot Action Replanning through LLM Common-Sense Reasoning
IEEE Robotics and Automation Letters (IEEE RA-L), 2025
Elena Merlo
Marta Lagomarsino
Arash Ajoudani
LRM
209
1
0
28 Jul 2025
MapleGrasp: Mask-guided Feature Pooling for Language-driven Efficient Robotic Grasping
V. Bhat
Naman Patel
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
286
0
0
06 Jun 2025
OmniSpatial: Towards Comprehensive Spatial Reasoning Benchmark for Vision Language Models
Mengdi Jia
Zekun Qi
Shaochen Zhang
Wenyao Zhang
Xinqiang Yu
Jiawei He
He Wang
L. Yi
LRM VLM
331
28
0
03 Jun 2025
LA-RCS: LLM-Agent-Based Robot Control System
TaekHyun Park
YoungJun Choi
SeungHoon Shin
Kwangil Lee
232
2
0
23 May 2025
Multi-Agent Systems for Robotic Autonomy with LLMs
Junhong Chen
Ziqi Yang
Haoyuan G Xu
Dandan Zhang
George Mylonas
LLMAG
209
6
0
09 May 2025
Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks
Weinan Zhang
Mengna Wang
Gangao Liu
Xu Huixin
Yiwei Jiang
...
Hang Zhang
Xin Li
Weiming Lu
Peng Li
Yueting Zhuang
LM&Ro LRM
530
40
0
27 Mar 2025
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter
IEEE Transactions on Automation Science and Engineering (T-ASE), 2025
Kechun Xu
Xunlong Xia
Kaixuan Wang
Yifei Yang
Yunxuan Mao
Bing Deng
R. Xiong
Longji Xu
Yue Wang
OffRL
494
2
0
12 Mar 2025
Reflective Planning: Vision-Language Models for Multi-Stage Long-Horizon Robotic Manipulation
Yunhai Feng
Jiaming Han
Zhiyong Yang
Xiangyu Yue
Sergey Levine
Jianlan Luo
LM&Ro
371
25
0
23 Feb 2025
Don't Let Your Robot be Harmful: Responsible Robotic Manipulation via Safety-as-Policy
IEEE Robotics and Automation Letters (RA-L), 2024
Minheng Ni
Lei Zhang
Zhaoyu Chen
Guang Dai
Wangmeng Zuo
Jianwei Zhang
Lei Zhang
W. Zuo
423
1
0
27 Nov 2024
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Computer Vision and Pattern Recognition (CVPR), 2024
Xinhao Liu
Jiajian Li
Yichen Jiang
Niranjan Sujay
Zhiyong Yang
Juexiao Zhang
John Abanes
Jing Zhang
Chen Feng
537
25
0
26 Nov 2024
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Computer Vision and Pattern Recognition (CVPR), 2024
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
846
84
0
25 Nov 2024
Reflexive Guidance: Improving OoDD in Vision-Language Models via Self-Guided Image-Adaptive Concept Generation
Seulbi Lee
J. Kim
Sangheum Hwang
LRM
1.0K
3
0
19 Oct 2024
HiFi-CS: Towards Open Vocabulary Visual Grounding For Robotic Grasping Using Vision-Language Models
V. Bhat
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
485
10
0
16 Sep 2024
REVISION: Rendering Tools Enable Spatial Fidelity in Vision-Language Models
European Conference on Computer Vision (ECCV), 2024
Agneet Chatterjee
Yiran Luo
Tejas Gokhale
Yezhou Yang
Chitta Baral
LRM
328
10
0
05 Aug 2024
ReplanVLM: Replanning Robotic Tasks with Visual Language Models
Aoran Mei
Guo-Niu Zhu
Huaxiang Zhang
Zhongxue Gan
187
39
0
31 Jul 2024
Towards Open-World Grasping with Large Vision-Language Models
Georgios Tziafas
Hamidreza Kasaei
LM&Ro LRM
339
22
0
26 Jun 2024
Human-Object Interaction from Human-Level Instructions
Zhen Wu
Jiaman Li
Chenxi Liu
Chao Liu
LM&Ro
420
35
0
25 Jun 2024
Details Make a Difference: Object State-Sensitive Neurorobotic Task Planning
International Conference on Artificial Neural Networks (ICANN), 2024
Xiaowen Sun
Xufeng Zhao
Jae Hee Lee
Wenhao Lu
Matthias Kerzel
Stefan Wermter
LM&Ro
221
4
0
14 Jun 2024
Closed-Loop Open-Vocabulary Mobile Manipulation with GPT-4V
Peiyuan Zhi
Zhiyuan Zhang
Muzhi Han
Zeyu Zhang
Zhitian Li
Ziyuan Jiao
Siyuan Huang
LRM LM&Ro
322
51
0
16 Apr 2024
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen
Zhelun Shi
Xiaoya Lu
Lehan He
Sucheng Qian
...
Zhen-fei Yin
Jing Shao
Cewu Lu
285
13
0
28 Mar 2024
Never-Ending Behavior-Cloning Agent for Robotic Manipulation
Wenqi Liang
Gan Sun
Qian He
Yu Ren
Jiahua Dong
Yang Cong
LM&Ro
278
5
0
01 Mar 2024
Scaffolding Coordinates to Promote Vision-Language Coordination in Large Multi-Modal Models
Xuanyu Lei
Zonghan Yang
Xinrui Chen
Peng Li
Yang Liu
MLLM LRM
299
54
0
19 Feb 2024
Grounding LLMs For Robot Task Planning Using Closed-loop State Feedback
V. Bhat
Ali Umut Kaypak
Prashanth Krishnamurthy
Ramesh Karri
Farshad Khorrami
LM&Ro
386
32
0
13 Feb 2024
UFO: A UI-Focused Agent for Windows OS Interaction
Chaoyun Zhang
Liqun Li
Shilin He
Xu Zhang
Bo Qiao
...
Yu Kang
Qingwei Lin
Saravan Rajmohan
Dongmei Zhang
Qi Zhang
LLMAG
543
127
0
08 Feb 2024
Human Demonstrations are Generalizable Knowledge for Robots
Te Cui
Tianxing Zhou
Zicai Peng
Mengxiao Hu
Haoyang Lu
Haizhou Li
Guangyan Chen
Meiling Wang
Yi Yang
LM&Ro
406
10
0
05 Dec 2023
Interactive Task Planning with Language Models
Boyi Li
Philipp Wu
Pieter Abbeel
Jitendra Malik
LM&Ro
363
52
0
16 Oct 2023
Transferring Foundation Models for Generalizable Robotic Manipulation
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2023
Jiange Yang
Wenhui Tan
Chuhao Jin
Keling Yao
Bei Liu
Jianlong Fu
Ruihua Song
Gangshan Wu
Limin Wang
LM&Ro
397
20
0
09 Jun 2023