Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.07872
Cited By
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
12 February 2024
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
Ishita Dasgupta
Annie Xie
Danny Driess
Ayzaan Wahid
Zhuo Xu
Q. Vuong
Tingnan Zhang
T. Lee
Kuang-Huei Lee
Peng-Tao Xu
Sean Kirmani
Yuke Zhu
Andy Zeng
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs"
32 / 32 papers shown
Title
Mapping User Trust in Vision Language Models: Research Landscape, Challenges, and Prospects
Agnese Chiatti
Sara Bernardini
Lara Shibelski Godoy Piccolo
Viola Schiaffonati
Matteo Matteucci
52
0
0
08 May 2025
G-FOCUS: Towards a Robust Method for Assessing UI Design Persuasiveness
Jaehyun Jeon
Janghan Yoon
Minsoo Kim
Sumin Shim
Yejin Choi
Hanbin Kim
Youngjae Yu
AAML
33
0
0
08 May 2025
CrayonRobo: Object-Centric Prompt-Driven Vision-Language-Action Model for Robotic Manipulation
Xiaoqi Li
Lingyun Xu
M. Zhang
Jiaming Liu
Yan Shen
...
Jiahui Xu
Liang Heng
Siyuan Huang
S. Zhang
Hao Dong
LM&Ro
36
0
0
04 May 2025
Visual Test-time Scaling for GUI Agent Grounding
Tiange Luo
Lajanugen Logeswaran
Justin Johnson
Honglak Lee
48
0
0
01 May 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
31
0
0
17 Apr 2025
GraspCorrect: Robotic Grasp Correction via Vision-Language Model-Guided Feedback
Sungjae Lee
Yeonjoo Hong
Kwang In KIm
44
0
0
19 Mar 2025
AutoSpatial: Visual-Language Reasoning for Social Robot Navigation through Efficient Spatial Reasoning Learning
Yangzhe Kong
Daeun Song
Jing Liang
Dinesh Manocha
Ziyu Yao
Xuesu Xiao
LRM
56
1
0
10 Mar 2025
FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation
Chao Tang
Anxing Xiao
Yuhong Deng
Tianrun Hu
Wenlong Dong
Hanbo Zhang
David Hsu
Hong Zhang
71
2
0
24 Feb 2025
Magma: A Foundation Model for Multimodal AI Agents
Jianwei Yang
Reuben Tan
Qianhui Wu
Ruijie Zheng
Baolin Peng
...
Seonghyeon Ye
Joel Jang
Yuquan Deng
Lars Liden
Jianfeng Gao
VLM
AI4TS
98
8
0
18 Feb 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Y. Li
85
0
0
12 Feb 2025
HAMSTER: Hierarchical Action Models For Open-World Robot Manipulation
Yi Li
Yuquan Deng
J. Zhang
Joel Jang
Marius Memme
...
Fabio Ramos
Dieter Fox
Anqi Li
Abhishek Gupta
Ankit Goyal
LM&Ro
80
5
0
08 Feb 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
88
45
0
03 Jan 2025
RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics
Chan Hee Song
Valts Blukis
Jonathan Tremblay
Stephen Tyree
Yu-Chuan Su
Stan Birchfield
83
4
0
25 Nov 2024
TopV-Nav: Unlocking the Top-View Spatial Reasoning Potential of MLLM for Zero-shot Object Navigation
Linqing Zhong
Chen Gao
Zihan Ding
Yue Liao
Si Liu
Shifeng Zhang
Xu Zhou
Si Liu
LRM
82
3
0
25 Nov 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
32
2
0
13 Oct 2024
Unpacking Failure Modes of Generative Policies: Runtime Monitoring of Consistency and Progress
Christopher Agia
Rohan Sinha
Jingyun Yang
Zi-ang Cao
Rika Antonova
Marco Pavone
Jeannette Bohg
26
6
0
06 Oct 2024
Embodied-RAG: General Non-parametric Embodied Memory for Retrieval and Generation
Quanting Xie
So Yeon Min
Tianyi Zhang
Kedi Xu
Aarav Bajaj
Ruslan Salakhutdinov
Matthew Johnson-Roberson
Yonatan Bisk
Matthew Johnson-Roberson
Yonatan Bisk
LM&Ro
50
6
0
26 Sep 2024
MotIF: Motion Instruction Fine-tuning
Minyoung Hwang
Joey Hejna
Dorsa Sadigh
Yonatan Bisk
42
1
0
16 Sep 2024
Lexicon3D: Probing Visual Foundation Models for Complex 3D Scene Understanding
Yunze Man
Shuhong Zheng
Zhipeng Bao
M. Hebert
Liang-Yan Gui
Yu-xiong Wang
70
15
0
05 Sep 2024
VL-TGS: Trajectory Generation and Selection using Vision Language Models in Mapless Outdoor Environments
Daeun Song
Jing Liang
Xuesu Xiao
Dinesh Manocha
44
4
0
05 Aug 2024
Affordance-Guided Reinforcement Learning via Visual Prompting
Olivia Y. Lee
Annie Xie
Kuan Fang
Karl Pertsch
Chelsea Finn
OffRL
LM&Ro
62
7
0
14 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
55
23
0
28 Jun 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
62
38
0
23 May 2024
RH20T-P: A Primitive-Level Robotic Dataset Towards Composable Generalization Agents
Zeren Chen
Zhelun Shi
Xiaoya Lu
Lehan He
Sucheng Qian
...
Zhen-fei Yin
Jing Shao
Jing Shao
Cewu Lu
Cewu Lu
31
5
0
28 Mar 2024
Pixel Aligned Language Models
Jiarui Xu
Xingyi Zhou
Shen Yan
Xiuye Gu
Anurag Arnab
Chen Sun
Xiaolong Wang
Cordelia Schmid
MLLM
VLM
43
14
0
14 Dec 2023
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
145
337
0
11 Oct 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
D. Fox
Jesse Thomason
Animesh Garg
LM&Ro
LLMAG
112
616
0
22 Sep 2022
Open-vocabulary Queryable Scene Representations for Real World Planning
Boyuan Chen
F. Xia
Brian Ichter
Kanishka Rao
K. Gopalakrishnan
Michael S. Ryoo
Austin Stone
Daniel Kappler
LM&Ro
144
179
0
20 Sep 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
136
430
0
10 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
278
3,784
0
18 Apr 2021
1