Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.05741
Cited By
Real-World Robot Applications of Foundation Models: A Review
8 February 2024
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Real-World Robot Applications of Foundation Models: A Review"
34 / 34 papers shown
Title
System 0/1/2/3: Quad-process theory for multi-timescale embodied collective cognitive systems
Tadahiro Taniguchi
Yasushi Hirai
Masahiro Suzuki
Shingo Murata
Takato Horii
Kazutoshi Tanaka
AI4CE
49
0
0
08 Mar 2025
Large Language Models for Multi-Robot Systems: A Survey
Peihan Li
Zijian An
Shams Abrar
Lifeng Zhou
LM&Ro
LRM
44
4
0
06 Feb 2025
Mobile Manipulation Instruction Generation from Multiple Images with Automatic Metric Enhancement
Kei Katsumata
Motonari Kambara
Daichi Yashima
Ryosuke Korekata
Komei Sugiura
56
0
0
28 Jan 2025
Do large language vision models understand 3D shapes?
Sagi Eppel
3DV
81
1
0
14 Dec 2024
Object Segmentation from Open-Vocabulary Manipulation Instructions Based on Optimal Transport Polygon Matching with Multimodal Foundation Models
Takayuki Nishimura
Katsuyuki Kuyo
Motonari Kambara
Komei Sugiura
DiffM
19
0
0
01 Jul 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
58
38
0
23 May 2024
OK-Robot: What Really Matters in Integrating Open-Knowledge Models for Robotics
Peiqi Liu
Yaswanth Orru
Jay Vakil
Chris Paxton
Nur Muhammad (Mahi) Shafiullah
Lerrel Pinto
LM&Ro
VLM
86
27
0
22 Jan 2024
Mobile ALOHA: Learning Bimanual Mobile Manipulation with Low-Cost Whole-Body Teleoperation
Zipeng Fu
Tony Zhao
Chelsea Finn
100
281
0
04 Jan 2024
Large Language Models for Robotics: A Survey
Fanlong Zeng
Wensheng Gan
Yongheng Wang
Ning Liu
Philip S. Yu
LM&Ro
109
121
0
13 Nov 2023
Plan, Eliminate, and Track -- Language Models are Good Teachers for Embodied Agents
Yu-Chih Chen
So Yeon Min
Chase Davis
Ruslan Salakhutdinov
A. Azaria
Yuan-Fang Li
Tom Michael Mitchell
A. Bovik
LM&Ro
LLMAG
70
26
0
03 May 2023
GestureDiffuCLIP: Gesture Diffusion Model with CLIP Latents
Tenglong Ao
Zeyi Zhang
Libin Liu
DiffM
VGen
65
88
0
26 Mar 2023
Open-World Object Manipulation using Pre-trained Vision-Language Models
Austin Stone
Ted Xiao
Yao Lu
K. Gopalakrishnan
Kuang-Huei Lee
...
Sean Kirmani
Brianna Zitkovich
F. Xia
Chelsea Finn
Karol Hausman
LM&Ro
139
144
0
02 Mar 2023
BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models
Junnan Li
Dongxu Li
Silvio Savarese
Steven C. H. Hoi
VLM
MLLM
244
4,186
0
30 Jan 2023
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
140
337
0
11 Oct 2022
CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory
Nur Muhammad (Mahi) Shafiullah
Chris Paxton
Lerrel Pinto
Soumith Chintala
Arthur Szlam
VLM
LM&Ro
CLIP
87
155
0
11 Oct 2022
Real-World Robot Learning with Masked Visual Pre-training
Ilija Radosavovic
Tete Xiao
Stephen James
Pieter Abbeel
Jitendra Malik
Trevor Darrell
SSL
144
238
0
06 Oct 2022
DALL-E-Bot: Introducing Web-Scale Diffusion Models to Robotics
Ivan Kapelyukh
Vitalis Vosylius
Edward Johns
LM&Ro
DiffM
91
143
0
05 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
240
1,070
0
05 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data
Oier Mees
Jessica Borja-Diaz
Wolfram Burgard
LM&Ro
121
106
0
04 Oct 2022
Human Motion Diffusion Model
Guy Tevet
Sigal Raab
Brian Gordon
Yonatan Shafir
Daniel Cohen-Or
Amit H. Bermano
DiffM
VGen
177
713
0
29 Sep 2022
ProgPrompt: Generating Situated Robot Task Plans using Large Language Models
Ishika Singh
Valts Blukis
Arsalan Mousavian
Ankit Goyal
Danfei Xu
Jonathan Tremblay
D. Fox
Jesse Thomason
Animesh Garg
LM&Ro
LLMAG
109
616
0
22 Sep 2022
Open-vocabulary Queryable Scene Representations for Real World Planning
Boyuan Chen
F. Xia
Brian Ichter
Kanishka Rao
K. Gopalakrishnan
Michael S. Ryoo
Austin Stone
Daniel Kappler
LM&Ro
138
179
0
20 Sep 2022
Perceiver-Actor: A Multi-Task Transformer for Robotic Manipulation
Mohit Shridhar
Lucas Manuelli
D. Fox
LM&Ro
141
449
0
12 Sep 2022
Diffusion Models: A Comprehensive Survey of Methods and Applications
Ling Yang
Zhilong Zhang
Yingxia Shao
Shenda Hong
Runsheng Xu
Yue Zhao
Wentao Zhang
Bin Cui
Ming-Hsuan Yang
DiffM
MedIm
211
1,277
0
02 Sep 2022
LM-Nav: Robotic Navigation with Large Pre-Trained Models of Language, Vision, and Action
Dhruv Shah
B. Osinski
Brian Ichter
Sergey Levine
LM&Ro
136
430
0
10 Jul 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
315
8,261
0
28 Jan 2022
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
258
7,337
0
11 Nov 2021
Guiding Visual Question Generation
Nihir Vedd
Zixu Wang
Marek Rei
Yishu Miao
Lucia Specia
66
23
0
15 Oct 2021
Ego4D: Around the World in 3,000 Hours of Egocentric Video
Kristen Grauman
Andrew Westbury
Eugene Byrne
Zachary Chavis
Antonino Furnari
...
Mike Zheng Shou
Antonio Torralba
Lorenzo Torresani
Mingfei Yan
Jitendra Malik
EgoV
212
682
0
13 Oct 2021
Bridge Data: Boosting Generalization of Robotic Skills with Cross-Domain Datasets
F. Ebert
Yanlai Yang
Karl Schmeckpeper
Bernadette Bucher
G. Georgakis
Kostas Daniilidis
Chelsea Finn
Sergey Levine
147
212
0
27 Sep 2021
Open-vocabulary Object Detection via Vision and Language Knowledge Distillation
Xiuye Gu
Tsung-Yi Lin
Weicheng Kuo
Yin Cui
VLM
ObjD
218
698
0
28 Apr 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
You Only Look Once: Unified, Real-Time Object Detection
Joseph Redmon
S. Divvala
Ross B. Girshick
Ali Farhadi
ObjD
269
35,677
0
08 Jun 2015
1