2204.00598
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"
50 / 443 papers shown
Title
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering
David Romero
Thamar Solorio
109
4
0
16 Feb 2024
BBSEA: An Exploration of Brain-Body Synchronization for Embodied Agents
Sizhe Yang
Qian Luo
Anumpam Pani
Yanchao Yang
37
2
0
13 Feb 2024
PIVOT: Iterative Visual Prompting Elicits Actionable Knowledge for VLMs
Soroush Nasiriany
Fei Xia
Wenhao Yu
Ted Xiao
Jacky Liang
...
Karol Hausman
N. Heess
Chelsea Finn
Sergey Levine
Brian Ichter
LM&Ro
LRM
30
92
0
12 Feb 2024
An Empirical Study Into What Matters for Calibrating Vision-Language Models
Weijie Tu
Weijian Deng
Dylan Campbell
Stephen Gould
Tom Gedeon
VLM
35
7
0
12 Feb 2024
TIC: Translate-Infer-Compile for accurate "text to plan" using LLMs and Logical Representations
Sudhir Agarwal
A. Sreepathy
31
1
0
09 Feb 2024
Memory Consolidation Enables Long-Context Video Understanding
Ivana Balažević
Yuge Shi
Pinelopi Papalampidi
Rahma Chaabouni
Skanda Koppula
Olivier J. Hénaff
105
22
0
08 Feb 2024
Real-World Robot Applications of Foundation Models: A Review
Kento Kawaharazuka
T. Matsushima
Andrew Gambardella
Jiaxian Guo
Chris Paxton
Andy Zeng
OffRL
VLM
LM&Ro
51
45
0
08 Feb 2024
S-Agents: Self-organizing Agents in Open-ended Environments
Jia-Qing Chen
Yu-Gang Jiang
Jiachen Lu
Li Zhang
AIFin
LLMAG
LM&Ro
60
15
0
07 Feb 2024
"Task Success" is not Enough: Investigating the Use of Video-Language Models as Behavior Critics for Catching Undesirable Agent Behaviors
L. Guan
Yifan Zhou
Denis Liu
Yantian Zha
H. B. Amor
Subbarao Kambhampati
LM&Ro
39
16
0
06 Feb 2024
Preference-Conditioned Language-Guided Abstraction
Andi Peng
Andreea Bobu
Belinda Z. Li
T. Sumers
Ilia Sucholutsky
Nishanth Kumar
Thomas L. Griffiths
Julie A. Shah
32
12
0
05 Feb 2024
LLM Agents in Interaction: Measuring Personality Consistency and Linguistic Alignment in Interacting Populations of Large Language Models
Ivar Frisch
Mario Giulianelli
35
9
0
05 Feb 2024
Weaver: Foundation Models for Creative Writing
Tiannan Wang
Jiamin Chen
Qingrui Jia
Shuai Wang
Ruoyu Fang
...
Xiaohua Xu
Ningyu Zhang
Huajun Chen
Yuchen Eleanor Jiang
Wangchunshu Zhou
33
19
0
30 Jan 2024
Image-Text Out-Of-Context Detection Using Synthetic Multimodal Misinformation
Fatma Shalabi
H. Nguyen
Hichem Felouat
Ching-Chun Chang
Isao Echizen
40
5
0
29 Jan 2024
True Knowledge Comes from Practice: Aligning LLMs with Embodied Environments via Reinforcement Learning
Weihao Tan
Wentao Zhang
Shanqi Liu
Longtao Zheng
Xinrun Wang
Bo An
OffRL
44
17
0
25 Jan 2024
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities
Boyuan Chen
Zhuo Xu
Sean Kirmani
Brian Ichter
Danny Driess
Pete Florence
Dorsa Sadigh
Leonidas J. Guibas
Fei Xia
LRM
ReLM
52
206
0
22 Jan 2024
SocraSynth: Multi-LLM Reasoning with Conditional Statistics
Edward Y. Chang
LLMAG
LRM
33
7
0
19 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
47
69
0
10 Jan 2024
Large Language Models for Robotics: Opportunities, Challenges, and Perspectives
Jiaqi Wang
Zihao Wu
Yiwei Li
Hanqi Jiang
Peng Shu
...
Lin Zhao
Bao Ge
Xiang Li
Tianming Liu
Shu Zhang
LM&Ro
40
61
0
09 Jan 2024
GPT-4V(ision) is a Human-Aligned Evaluator for Text-to-3D Generation
Tong Wu
Guandao Yang
Zhibing Li
Kai Zhang
Ziwei Liu
Leonidas J. Guibas
Dahua Lin
Gordon Wetzstein
EGVM
VGen
35
89
0
08 Jan 2024
A Philosophical Introduction to Language Models -- Part I: Continuity With Classic Debates
Raphael Milliere
Cameron Buckner
LRM
ELM
41
20
0
08 Jan 2024
LLM Augmented LLMs: Expanding Capabilities through Composition
Rachit Bansal
Bidisha Samanta
Siddharth Dalmia
Nitish Gupta
Shikhar Vashishth
Sriram Ganapathy
Abhishek Bapna
Prateek Jain
Partha P. Talukdar
CLL
21
34
0
04 Jan 2024
Mining Fine-Grained Image-Text Alignment for Zero-Shot Captioning via Text-Only Training
Longtian Qiu
Shan Ning
Xuming He
VLM
38
3
0
04 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM
VLM
27
9
0
03 Jan 2024
Video Understanding with Large Language Models: A Survey
Yunlong Tang
Jing Bi
Siting Xu
Luchuan Song
Susan Liang
...
Feng Zheng
Jianguo Zhang
Ping Luo
Jiebo Luo
Chenliang Xu
VLM
54
84
0
29 Dec 2023
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
108
80
0
28 Dec 2023
InfoVisDial: An Informative Visual Dialogue Dataset by Bridging Large Multimodal and Language Models
Bingbing Wen
Zhengyuan Yang
Jianfeng Wang
Zhe Gan
Bill Howe
Lijuan Wang
MLLM
44
1
0
21 Dec 2023
Social Learning: Towards Collaborative Learning with Large Language Models
Amirkeivan Mohtashami
Florian Hartmann
Sian Gooding
Lukás Zilka
Matt Sharifi
Blaise Agüera y Arcas
8
10
0
18 Dec 2023
A Survey on Robotic Manipulation of Deformable Objects: Recent Advances, Open Challenges and New Frontiers
Feida Gu
Yanmin Zhou
Zhipeng Wang
Shuo Jiang
Bin He
AI4CE
16
8
0
16 Dec 2023
SMILE: Multimodal Dataset for Understanding Laughter in Video with Language Models
Lee Hyun
Kim Sung-Bin
Seungju Han
Youngjae Yu
Tae-Hyun Oh
39
13
0
15 Dec 2023
Foundation Models in Robotics: Applications, Challenges, and the Future
Roya Firoozi
Johnathan Tucker
Stephen Tian
Anirudha Majumdar
Jiankai Sun
...
Brian Ichter
Danny Driess
Jiajun Wu
Cewu Lu
Mac Schwager
LM&Ro
AI4CE
LRM
VLM
37
140
0
13 Dec 2023
From Text to Motion: Grounding GPT-4 in a Humanoid Robot "Alter3"
Takahide Yoshida
A. Masumori
Takashi Ikegami
24
18
0
11 Dec 2023
Localized Symbolic Knowledge Distillation for Visual Commonsense Models
J. Park
Jack Hessel
Khyathi Raghavi Chandu
Paul Pu Liang
Ximing Lu
...
Youngjae Yu
Qiuyuan Huang
Jianfeng Gao
Ali Farhadi
Yejin Choi
VLM
29
11
0
08 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
43
15
0
07 Dec 2023
Chain of Code: Reasoning with a Language Model-Augmented Code Emulator
Chengshu Li
Jacky Liang
Andy Zeng
Xinyun Chen
Karol Hausman
Dorsa Sadigh
Sergey Levine
Fei-Fei Li
Fei Xia
Brian Ichter
LLMAG
LRM
36
71
0
07 Dec 2023
LaMPilot: An Open Benchmark Dataset for Autonomous Driving with Language Model Programs
Yunsheng Ma
Can Cui
Xu Cao
Wenqian Ye
Peiran Liu
...
Rohit Gupta
Kyungtae Han
Aniket Bera
James M. Rehg
Ziran Wang
32
42
0
07 Dec 2023
FoMo Rewards: Can we cast foundation models as reward functions?
Ekdeep Singh Lubana
Johann Brehmer
P. D. Haan
Taco S. Cohen
OffRL
LRM
48
2
0
06 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM
LRM
MLLM
47
29
0
05 Dec 2023
Let's Think Outside the Box: Exploring Leap-of-Thought in Large Language Models with Creative Humor Generation
Shan Zhong
Zhongzhan Huang
Shanghua Gao
Wushao Wen
Liang Lin
Marinka Zitnik
Pan Zhou
LLMAG
LRM
19
35
0
05 Dec 2023
SARA-RT: Scaling up Robotics Transformers with Self-Adaptive Robust Attention
Isabel Leal
Krzysztof Choromanski
Deepali Jain
Kumar Avinava Dubey
Jake Varley
...
Q. Vuong
Tamás Sarlós
Kenneth Oslund
Karol Hausman
Kanishka Rao
44
8
0
04 Dec 2023
LVDiffusor: Distilling Functional Rearrangement Priors from Large Models into Diffusor
Yiming Zeng
Mingdong Wu
Long Yang
Jiyao Zhang
Hao Ding
Hui Cheng
Hao Dong
DiffM
21
8
0
03 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kris M. Kitani
László A. Jeni
21
21
0
01 Dec 2023
Look Before You Leap: Unveiling the Power of GPT-4V in Robotic Vision-Language Planning
Yingdong Hu
Fanqi Lin
Tong Zhang
Li Yi
Yang Gao
LM&Ro
91
101
0
29 Nov 2023
PALM: Predicting Actions through Language Models
Sanghwan Kim
Daoji Huang
Yongqin Xian
Otmar Hilliges
Luc Van Gool
Xi Wang
VLM
22
10
0
29 Nov 2023
ROSO: Improving Robotic Policy Inference via Synthetic Observations
Yusuke Miyashita
Dimitris Gahtidis
Colin La
Jeremy Rabinowicz
Juxi Leitner
37
1
0
28 Nov 2023
RoboGPT: an intelligent agent of making embodied long-term decisions for daily instruction tasks
Yaran Chen
Wenbo Cui
Yuanwen Chen
Mining Tan
Xinyao Zhang
Dong Zhao
He Wang
LM&Ro
LLMAG
36
0
0
27 Nov 2023
Vamos: Versatile Action Models for Video Understanding
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
27
19
0
22 Nov 2023
GAIA: a benchmark for General AI Assistants
Grégoire Mialon
Clémentine Fourrier
Craig Swift
Thomas Wolf
Yann LeCun
Thomas Scialom
AI4MH
ALM
ELM
RALM
17
141
0
21 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
34
8
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
41
251
0
21 Nov 2023
GPT-4V(ision) for Robotics: Multimodal Task Planning from Human Demonstration
Naoki Wake
Atsushi Kanehira
Kazuhiro Sasabuchi
Jun Takamatsu
Katsushi Ikeuchi
LM&Ro
21
61
0
20 Nov 2023