ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
    ReLM
    LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 443 papers shown
Visual AI and Linguistic Intelligence Through Steerability and Composability
David A. Noever
S. M. Noever
42
0
0
18 Nov 2023
Challenges in data-based geospatial modeling for environmental research and practice
Diana Koldasbayeva
P. Tregubova
M. Gasanov
Alexey Zaytsev
Anna Petrovskaia
E. Burnaev
AI4CE
32
1
0
18 Nov 2023
Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections
Lihan Zha
Yuchen Cui
Li-Heng Lin
Minae Kwon
Montse Gonzalez Arenas
Andy Zeng
Fei Xia
Dorsa Sadigh
35
36
0
17 Nov 2023
VideoCon: Robust Video-Language Alignment via Contrast Captions
Hritik Bansal
Yonatan Bitton
Idan Szpektor
Kai-Wei Chang
Aditya Grover
40
14
0
15 Nov 2023
I Was Blind but Now I See: Implementing Vision-Enabled Dialogue in Social Robots
Giulio Antonio Abbo
Tony Belpaeme
21
1
0
15 Nov 2023
Zero-shot audio captioning with audio-language model guidance and audio context keywords
Leonard Salewski
Stefan Fauth
A. Sophia Koepke
Zeynep Akata
26
10
0
14 Nov 2023
Human-Centric Autonomous Systems With LLMs for User Command Reasoning
Yi Yang
Qingwen Zhang
Ci Li
Daniel Simoes Marta
Nazre Batool
John Folkesson
LRM
67
29
0
14 Nov 2023
JARVIS-1: Open-World Multi-task Agents with Memory-Augmented Multimodal Language Models
Zihao Wang
Shaofei Cai
Guy Van den Broeck
Yonggang Jin
Jinbing Hou
...
Zhaofeng He
Zilong Zheng
Yaodong Yang
Xiaojian Ma
Yitao Liang
LLMAG
LM&Ro
36
96
0
10 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
71
4
0
10 Nov 2023
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
Reza Esfandiarpoor
Stephen H. Bach
VLM
32
13
0
10 Nov 2023
Language-guided Robot Grasping: CLIP-based Referring Grasp Synthesis in Clutter
Georgios Tziafas
Yucheng Xu
Arushi Goel
M. Kasaei
Zhibin Li
H. Kasaei
32
23
0
09 Nov 2023
Zero-shot Translation of Attention Patterns in VQA Models to Natural Language
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
34
2
0
08 Nov 2023
Multitask Multimodal Prompted Training for Interactive Embodied Task Completion
Georgios Pantazopoulos
Malvina Nikandrou
Amit Parekh
Bhathiya Hemanthage
Arash Eshghi
Ioannis Konstas
Verena Rieser
Oliver Lemon
Alessandro Suglia
LM&Ro
36
7
0
07 Nov 2023
OmniVec: Learning robust representations with cross modal sharing
Siddharth Srivastava
Gaurav Sharma
SSL
29
64
0
07 Nov 2023
Get the Ball Rolling: Alerting Autonomous Robots When to Help to Close the Healthcare Loop
Jiaxin Shen
Yanyao Liu
Ziming Wang
Ziyuan Jiao
Yufeng Chen
Wenjuan Han
20
0
0
05 Nov 2023
Make a Donut: Hierarchical EMD-Space Planning for Zero-Shot Deformable Manipulation with Tools
Yang You
Bokui Shen
Congyue Deng
Haoran Geng
Songlin Wei
He-Nan Wang
Leonidas J. Guibas
26
1
0
05 Nov 2023
Sentiment Analysis through LLM Negotiations
Xiaofei Sun
Xiaoya Li
Shengyu Zhang
Shuhe Wang
Fei Wu
Jiwei Li
Tianwei Zhang
Guoyin Wang
32
16
0
03 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
Nicholas Walker
Stefan Ultes
Pierre Lison
LM&Ro
56
1
0
03 Nov 2023
Long Story Short: a Summarize-then-Search Method for Long Video Question Answering
Jiwan Chung
Youngjae Yu
100
5
0
02 Nov 2023
Is GPT Powerful Enough to Analyze the Emotions of Memes?
Jingjing Wang
Joshua Luo
Grace Yang
Allen Hong
Feng Luo
ELM
AI4MH
32
1
0
01 Nov 2023
Large Language Models as Generalizable Policies for Embodied Tasks
Andrew Szot
Max Schwarzer
Harsh Agrawal
Bogdan Mazoure
Walter A. Talbott
Katherine Metcalf
Natalie Mackraz
Devon Hjelm
Alexander Toshev
LM&Ro
34
58
0
26 Oct 2023
Apollo: Zero-shot MultiModal Reasoning with Multiple Experts
Daniela Ben-David
Tzuf Paz-Argaman
Reut Tsarfaty
MoE
26
0
0
25 Oct 2023
Woodpecker: Hallucination Correction for Multimodal Large Language Models
Shukang Yin
Chaoyou Fu
Sirui Zhao
Tong Xu
Hao Wang
Dianbo Sui
Yunhang Shen
Ke Li
Xingguo Sun
Enhong Chen
VLM
MLLM
38
114
0
24 Oct 2023
Unnatural language processing: How do language models handle machine-generated prompts?
Corentin Kervadec
Francesca Franzon
Marco Baroni
23
5
0
24 Oct 2023
Large Language Models are Visual Reasoning Coordinators
Liangyu Chen
Bo Li
Sheng Shen
Jingkang Yang
Chunyuan Li
Kurt Keutzer
Trevor Darrell
Ziwei Liu
VLM
LRM
41
48
0
23 Oct 2023
Open-Ended Instructable Embodied Agents with Memory-Augmented Large Language Models
Gabriel H. Sarch
Yue Wu
Michael J. Tarr
Katerina Fragkiadaki
LM&Ro
LLMAG
27
19
0
23 Oct 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM
MLLM
36
155
0
23 Oct 2023
Can Language Models Laugh at YouTube Short-form Videos?
Dayoon Ko
Sangho Lee
Gunhee Kim
36
6
0
22 Oct 2023
3D-GPT: Procedural 3D Modeling with Large Language Models
Chunyi Sun
Junlin Han
Weijian Deng
Xinlong Wang
Zishan Qin
Stephen Gould
39
39
0
19 Oct 2023
Language Models as Zero-Shot Trajectory Generators
Teyun Kwon
Norman Di Palo
Edward Johns
LM&Ro
27
45
0
17 Oct 2023
Video Language Planning
Yilun Du
Mengjiao Yang
Peter R. Florence
Fei Xia
Ayzaan Wahid
...
Pieter Abbeel
Josh Tenenbaum
L. Kaelbling
Andy Zeng
Jonathan Tompson
PINN
LM&Ro
96
85
0
16 Oct 2023
Interpreting and Controlling Vision Foundation Models via Text Explanations
Haozhe Chen
Junfeng Yang
Carl Vondrick
Chengzhi Mao
24
2
0
16 Oct 2023
VidCoM: Fast Video Comprehension through Large Language Models with Multimodal Tools
Huihui Gong
Minjing Dong
Siqi Ma
S. Çamtepe
Chang Xu
Lei Hou
Surya Nepal
VLM
MLLM
55
0
0
16 Oct 2023
Reading Books is Great, But Not if You Are Driving! Visually Grounded Reasoning about Defeasible Commonsense Norms
Seungju Han
Junhyeok Kim
Jack Hessel
Liwei Jiang
Jiwan Chung
Yejin Son
Yejin Choi
Youngjae Yu
18
2
0
16 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
117
0
16 Oct 2023
Interactive Task Planning with Language Models
Boyi Li
Philipp Wu
Pieter Abbeel
Jitendra Malik
LM&Ro
36
33
0
16 Oct 2023
VLIS: Unimodal Language Models Guide Multimodal Language Generation
Jiwan Chung
Youngjae Yu
VLM
30
1
0
15 Oct 2023
Vision-by-Language for Training-Free Compositional Image Retrieval
Shyamgopal Karthik
Karsten Roth
Massimiliano Mancini
Zeynep Akata
CoGe
28
52
0
13 Oct 2023
Tree-Planner: Efficient Close-loop Task Planning with Large Language Models
Mengkang Hu
Yao Mu
Xinmiao Yu
Mingyu Ding
Shiguang Wu
Wenqi Shao
Qiguang Chen
Bin Wang
Yu Qiao
Ping Luo
LLMAG
42
33
0
12 Oct 2023
Jigsaw: Supporting Designers to Prototype Multimodal Applications by Chaining AI Foundation Models
David Chuan-En Lin
Nikolas Martelaro
24
18
0
12 Oct 2023
Open-Set Knowledge-Based Visual Question Answering with Inference Paths
Jingru Gan
Xinzhe Han
Shuhui Wang
Qingming Huang
36
0
0
12 Oct 2023
A Closer Look into Automatic Evaluation Using Large Language Models
Cheng-Han Chiang
Hunghuei Lee
ELM
ALM
LM&MA
35
13
0
09 Oct 2023
Compositional Semantics for Open Vocabulary Spatio-semantic Representations
Robin Karlsson
Francisco Lepe-Salazar
K. Takeda
VLM
53
1
0
08 Oct 2023
GRID: A Platform for General Robot Intelligence Development
Sai H. Vemprala
Shuhang Chen
Abhinav Shukla
Dinesh Narayanan
Ashish Kapoor
25
10
0
02 Oct 2023
Cook2LTL: Translating Cooking Recipes to LTL Formulae using Large Language Models
A. Mavrogiannis
Christoforos Mavrogiannis
Yiannis Aloimonos
LM&Ro
15
10
0
29 Sep 2023
OceanChat: Piloting Autonomous Underwater Vehicles in Natural Language
Jia Huang
Mengxue Hou
Junkai Wang
Fumin Zhang
40
5
0
27 Sep 2023
Lifelong Robot Learning with Human Assisted Language Planners
Meenal Parakh
Alisha Fong
Anthony Simeonov
Tao Chen
Abhishek Gupta
Pulkit Agrawal
LM&Ro
39
14
0
25 Sep 2023
ReConcile: Round-Table Conference Improves Reasoning via Consensus among Diverse LLMs
Justin Chih-Yao Chen
Swarnadeep Saha
Joey Tianyi Zhou
LLMAG
LRM
40
120
0
22 Sep 2023
LMC: Large Model Collaboration with Cross-assessment for Training-Free Open-Set Object Recognition
Haoxuan Qu
Xiaofei Hui
Yujun Cai
Jun Liu
49
10
0
22 Sep 2023
LLM-Grounder: Open-Vocabulary 3D Visual Grounding with Large Language Model as an Agent
Jianing Yang
Xuweiyi Chen
Shengyi Qian
Nikhil Madaan
Madhavan Iyengar
David Fouhey
Joyce Chai
LM&Ro
LLMAG
43
84
0
21 Sep 2023