ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2204.00598
  4. Cited By
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
    ReLM
    LRM
ArXivPDFHTML

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 443 papers shown
Title
SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on
  Scene Graphs
SG-Bot: Object Rearrangement via Coarse-to-Fine Robotic Imagination on Scene Graphs
Guangyao Zhai
Xiaoni Cai
Dianye Huang
Yan Di
Fabian Manhardt
Federico Tombari
Nassir Navab
Benjamin Busam
LM&Ro
24
27
0
21 Sep 2023
BELT:Bootstrapping Electroencephalography-to-Language Decoding and Zero-Shot Sentiment Classification by Natural Language Supervision
Jinzhao Zhou
Yiqun Duan
Yu-Cheng Chang
Yu-Kai Wang
Chin-Teng Lin
39
6
0
21 Sep 2023
Text2Reward: Reward Shaping with Language Models for Reinforcement
  Learning
Text2Reward: Reward Shaping with Language Models for Reinforcement Learning
Tianbao Xie
Siheng Zhao
Chen Henry Wu
Yitao Liu
Qian Luo
Victor Zhong
Yanchao Yang
Tao Yu
LM&Ro
47
48
0
20 Sep 2023
Guide Your Agent with Adaptive Multimodal Rewards
Guide Your Agent with Adaptive Multimodal Rewards
Changyeon Kim
Younggyo Seo
Hao Liu
Lisa Lee
Jinwoo Shin
Honglak Lee
Kimin Lee
23
9
0
19 Sep 2023
Conformal Temporal Logic Planning using Large Language Models
Conformal Temporal Logic Planning using Large Language Models
Jun Wang
J. Tong
Kai Liang Tan
Yevgeniy Vorobeychik
Y. Kantaros
LM&Ro
47
20
0
18 Sep 2023
From Cooking Recipes to Robot Task Trees -- Improving Planning
  Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network
From Cooking Recipes to Robot Task Trees -- Improving Planning Correctness and Task Efficiency by Leveraging LLMs with a Knowledge Network
Md. Sadman Sakib
Yu Sun
30
10
0
17 Sep 2023
Language Models as Black-Box Optimizers for Vision-Language Models
Language Models as Black-Box Optimizers for Vision-Language Models
Shihong Liu
Zhiqiu Lin
Samuel Yu
Ryan Lee
Tiffany Ling
Deepak Pathak
Deva Ramanan
VLM
32
28
0
12 Sep 2023
Incremental Learning of Humanoid Robot Behavior from Natural Interaction
  and Large Language Models
Incremental Learning of Humanoid Robot Behavior from Natural Interaction and Large Language Models
Leonard Barmann
Rainer Kartmann
Fabian Peller-Konrad
Jan Niehues
Alexander H. Waibel
Tamim Asfour
LM&Ro
20
24
0
08 Sep 2023
Physically Grounded Vision-Language Models for Robotic Manipulation
Physically Grounded Vision-Language Models for Robotic Manipulation
Jensen Gao
Bidipta Sarkar
F. Xia
Ted Xiao
Jiajun Wu
Brian Ichter
Anirudha Majumdar
Dorsa Sadigh
LM&Ro
20
113
0
05 Sep 2023
Cognitive Architectures for Language Agents
Cognitive Architectures for Language Agents
T. Sumers
Shunyu Yao
Karthik Narasimhan
Thomas L. Griffiths
LLMAG
LM&Ro
54
153
0
05 Sep 2023
ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon
  Sequential Task Planning
ISR-LLM: Iterative Self-Refined Large Language Model for Long-Horizon Sequential Task Planning
Zhehua Zhou
Jiayang Song
Kunpeng Yao
Zhan Shu
Lei Ma
22
57
0
26 Aug 2023
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual
  Captioning
MultiCapCLIP: Auto-Encoding Prompts for Zero-Shot Multilingual Visual Captioning
Bang-ju Yang
Fenglin Liu
X. Wu
Yaowei Wang
Xu Sun
Yuexian Zou
VLM
CLIP
44
13
0
25 Aug 2023
VisIT-Bench: A Benchmark for Vision-Language Instruction Following
  Inspired by Real-World Use
VisIT-Bench: A Benchmark for Vision-Language Instruction Following Inspired by Real-World Use
Yonatan Bitton
Hritik Bansal
Jack Hessel
Rulin Shao
Wanrong Zhu
Anas Awadalla
Josh Gardner
Rohan Taori
L. Schimdt
VLM
31
77
0
12 Aug 2023
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
MM-Vet: Evaluating Large Multimodal Models for Integrated Capabilities
Weihao Yu
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
Zicheng Liu
Xinchao Wang
Lijuan Wang
MLLM
48
607
0
04 Aug 2023
LEMMA: Learning Language-Conditioned Multi-Robot Manipulation
LEMMA: Learning Language-Conditioned Multi-Robot Manipulation
Ran Gong
Xiaofeng Gao
Qiaozi Gao
Suhaila Shakiah
Govind Thattai
Gaurav Sukhatme
LM&Ro
21
8
0
02 Aug 2023
Transferable Decoding with Visual Entities for Zero-Shot Image
  Captioning
Transferable Decoding with Visual Entities for Zero-Shot Image Captioning
Junjie Fei
Teng Wang
Jinrui Zhang
Zhenyu He
Chengjie Wang
Feng Zheng
VLM
28
34
0
31 Jul 2023
AntGPT: Can Large Language Models Help Long-term Action Anticipation
  from Videos?
AntGPT: Can Large Language Models Help Long-term Action Anticipation from Videos?
Qi Zhao
Shijie Wang
Ce Zhang
Changcheng Fu
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
LM&Ro
51
49
0
31 Jul 2023
GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for
  Task-Oriented Grasping
GraspGPT: Leveraging Semantic Knowledge from a Large Language Model for Task-Oriented Grasping
Chao Tang
Dehao Huang
Wenqiang Ge
Weiyu Liu
Hong Zhang
24
67
0
25 Jul 2023
A Real-World WebAgent with Planning, Long Context Understanding, and
  Program Synthesis
A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis
Izzeddin Gur
Hiroki Furuta
Austin Huang
Mustafa Safdari
Yutaka Matsuo
Douglas Eck
Aleksandra Faust
LM&Ro
LLMAG
39
198
0
24 Jul 2023
OBJECT 3DIT: Language-guided 3D-aware Image Editing
OBJECT 3DIT: Language-guided 3D-aware Image Editing
Oscar Michel
Anand Bhattad
Eli VanderBilt
Ranjay Krishna
Aniruddha Kembhavi
Tanmay Gupta
DiffM
30
39
0
20 Jul 2023
Towards A Unified Agent with Foundation Models
Towards A Unified Agent with Foundation Models
Norman Di Palo
Arunkumar Byravan
Leonard Hasenclever
Markus Wulfmeier
N. Heess
Martin Riedmiller
LM&Ro
LLMAG
OffRL
35
58
0
18 Jul 2023
Coupling Large Language Models with Logic Programming for Robust and
  General Reasoning from Text
Coupling Large Language Models with Logic Programming for Robust and General Reasoning from Text
Zhun Yang
Adam Ishay
Joohyung Lee
LRM
ELM
33
51
0
15 Jul 2023
SayPlan: Grounding Large Language Models using 3D Scene Graphs for
  Scalable Robot Task Planning
SayPlan: Grounding Large Language Models using 3D Scene Graphs for Scalable Robot Task Planning
Krishan Rana
Jesse Haviland
Sourav Garg
Jad Abou-Chakra
Ian Reid
Niko Sünderhauf
LM&Ro
37
218
0
12 Jul 2023
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with
  Language Models
VoxPoser: Composable 3D Value Maps for Robotic Manipulation with Language Models
Wenlong Huang
Chen Wang
Ruohan Zhang
Yunzhu Li
Jiajun Wu
Li Fei-Fei
LM&Ro
33
480
0
12 Jul 2023
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
RoCo: Dialectic Multi-Robot Collaboration with Large Language Models
Zhao Mandi
Shreeya Jain
Shuran Song
LM&Ro
LLMAG
31
125
0
10 Jul 2023
Large Language Models as General Pattern Machines
Large Language Models as General Pattern Machines
Suvir Mirchandani
F. Xia
Peter R. Florence
Brian Ichter
Danny Driess
Montse Gonzalez Arenas
Kanishka Rao
Dorsa Sadigh
Andy Zeng
LLMAG
57
184
0
10 Jul 2023
Robots That Ask For Help: Uncertainty Alignment for Large Language Model
  Planners
Robots That Ask For Help: Uncertainty Alignment for Large Language Model Planners
Allen Z. Ren
Anushri Dixit
Alexandra Bodrova
Sumeet Singh
Stephen Tu
...
Jacob Varley
Zhenjia Xu
Dorsa Sadigh
Andy Zeng
Anirudha Majumdar
LM&Ro
64
219
0
04 Jul 2023
Visual Instruction Tuning with Polite Flamingo
Visual Instruction Tuning with Polite Flamingo
Delong Chen
Jianfeng Liu
Wenliang Dai
Baoyuan Wang
MLLM
34
42
0
03 Jul 2023
Conformer LLMs -- Convolution Augmented Large Language Models
Conformer LLMs -- Convolution Augmented Large Language Models
Prateek Verma
23
1
0
02 Jul 2023
DoReMi: Grounding Language Model by Detecting and Recovering from
  Plan-Execution Misalignment
DoReMi: Grounding Language Model by Detecting and Recovering from Plan-Execution Misalignment
Yanjiang Guo
Yen-Jen Wang
Lihan Zha
Zheyuan Jiang
Jianyu Chen
LM&Ro
24
39
0
01 Jul 2023
Statler: State-Maintaining Language Models for Embodied Reasoning
Statler: State-Maintaining Language Models for Embodied Reasoning
Takuma Yoneda
Jiading Fang
Peng Li
Huanyu Zhang
Tianchong Jiang
Shengjie Lin
Ben Picker
David Yunis
Hongyuan Mei
Matthew R. Walter
LM&Ro
26
32
0
30 Jun 2023
Towards Language Models That Can See: Computer Vision Through the LENS
  of Natural Language
Towards Language Models That Can See: Computer Vision Through the LENS of Natural Language
William Berrios
Gautam Mittal
Tristan Thrush
Douwe Kiela
Amanpreet Singh
MLLM
VLM
13
61
0
28 Jun 2023
Next Steps for Human-Centered Generative AI: A Technical Perspective
Next Steps for Human-Centered Generative AI: A Technical Perspective
Xiang Ánthony' Chen
Jeff Burke
Andrea Colaço
Matthew K. Hong
Jennifer Jacobs
...
Dingzeyu Li
Nanyun Peng
Karl D. D. Willis
Chien-Sheng Wu
Bolei Zhou
LLMAG
27
32
0
27 Jun 2023
REFLECT: Summarizing Robot Experiences for Failure Explanation and
  Correction
REFLECT: Summarizing Robot Experiences for Failure Explanation and Correction
Zeyi Liu
Arpit Bahety
Shuran Song
LRM
29
116
0
27 Jun 2023
A Survey on Multimodal Large Language Models
A Survey on Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Ke Li
Xing Sun
Tong Xu
Enhong Chen
MLLM
LRM
54
556
0
23 Jun 2023
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen
  Large Language Models
Retrieving-to-Answer: Zero-Shot Video Question Answering with Frozen Large Language Models
Junting Pan
Ziyi Lin
Yuying Ge
Xiatian Zhu
Renrui Zhang
Yi Wang
Yu Qiao
Hongsheng Li
MLLM
24
26
0
15 Jun 2023
Language-Guided Music Recommendation for Video via Prompt Analogies
Language-Guided Music Recommendation for Video via Prompt Analogies
Daniel McKee
Justin Salamon
Josef Sivic
Bryan C. Russell
VGen
30
26
0
15 Jun 2023
Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Semantic HELM: A Human-Readable Memory for Reinforcement Learning
Fabian Paischer
Thomas Adler
M. Hofmarcher
Sepp Hochreiter
23
9
0
15 Jun 2023
Language to Rewards for Robotic Skill Synthesis
Language to Rewards for Robotic Skill Synthesis
Wenhao Yu
Nimrod Gileadi
Chuyuan Fu
Sean Kirmani
Kuang-Huei Lee
...
N. Heess
Dorsa Sadigh
Jie Tan
Yuval Tassa
F. Xia
LM&Ro
39
269
0
14 Jun 2023
AssistGPT: A General Multi-modal Assistant that can Plan, Execute,
  Inspect, and Learn
AssistGPT: A General Multi-modal Assistant that can Plan, Execute, Inspect, and Learn
Difei Gao
Lei Ji
Luowei Zhou
Kevin Lin
Joya Chen
Zihan Fan
Mike Zheng Shou
MLLM
27
72
0
14 Jun 2023
LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically
  Constructed from Live Streaming
LiveChat: A Large-Scale Personalized Dialogue Dataset Automatically Constructed from Live Streaming
Jingsheng Gao
Yixin Lian
Ziyi Zhou
Yuzhuo Fu
Baoyuan Wang
26
17
0
14 Jun 2023
SayTap: Language to Quadrupedal Locomotion
SayTap: Language to Quadrupedal Locomotion
Yujin Tang
Wenhao Yu
Jie Tan
Heiga Zen
Aleksandra Faust
Tatsuya Harada
36
40
0
13 Jun 2023
Embodied Executable Policy Learning with Language-based Scene
  Summarization
Embodied Executable Policy Learning with Language-based Scene Summarization
Jielin Qiu
Mengdi Xu
William Jongwon Han
Seungwhan Moon
Ding Zhao
LM&Ro
24
7
0
09 Jun 2023
Modular Visual Question Answering via Code Generation
Modular Visual Question Answering via Code Generation
Sanjay Subramanian
Medhini Narasimhan
Kushal Khangaonkar
Kevin Kaichuang Yang
Arsha Nagrani
Cordelia Schmid
Andy Zeng
Trevor Darrell
Dan Klein
29
46
0
08 Jun 2023
Deductive Verification of Chain-of-Thought Reasoning
Deductive Verification of Chain-of-Thought Reasoning
Z. Ling
Yunhao Fang
Xuanlin Li
Zhiao Huang
Mingu Lee
Roland Memisevic
Hao Su
ReLM
LRM
32
125
0
06 Jun 2023
Human-like Few-Shot Learning via Bayesian Reasoning over Natural
  Language
Human-like Few-Shot Learning via Bayesian Reasoning over Natural Language
Kevin Ellis
BDL
LRM
21
16
0
05 Jun 2023
MetaVL: Transferring In-Context Learning Ability From Language Models to
  Vision-Language Models
MetaVL: Transferring In-Context Learning Ability From Language Models to Vision-Language Models
Masoud Monajatipoor
Liunian Harold Li
Mozhdeh Rouhsedaghat
Lin F. Yang
Kai-Wei Chang
MLLM
LRM
19
12
0
02 Jun 2023
Reimagining Retrieval Augmented Language Models for Answering Queries
Reimagining Retrieval Augmented Language Models for Answering Queries
W. Tan
Yuliang Li
Pedro Rodriguez
Rich James
Xi Lin
A. Halevy
Scott Yih
KELM
LRM
37
9
0
01 Jun 2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented
  Language Model Prompting
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
R. Ramos
Bruno Martins
Desmond Elliott
VLM
13
16
0
31 May 2023
Enhanced Chart Understanding in Vision and Language Task via Cross-modal
  Pre-training on Plot Table Pairs
Enhanced Chart Understanding in Vision and Language Task via Cross-modal Pre-training on Plot Table Pairs
Mingyang Zhou
Yi Ren Fung
Long Chen
Christopher Thomas
Heng Ji
Shih-Fu Chang
23
11
0
29 May 2023
Previous
123456789
Next