Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2204.00598
Cited By
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"
50 / 443 papers shown
Title
The Contribution of Knowledge in Visiolinguistic Learning: A Survey on Tasks and Challenges
Maria Lymperaiou
Giorgos Stamou
VLM
32
4
0
04 Mar 2023
Grounded Decoding: Guiding Text Generation with Grounded Models for Embodied Agents
Wenlong Huang
Fei Xia
Dhruv Shah
Danny Driess
Andy Zeng
...
Pete Florence
Igor Mordatch
Sergey Levine
Karol Hausman
Brian Ichter
LM&Ro
27
42
0
01 Mar 2023
Semantic Mechanical Search with Large Vision and Language Models
Satvik Sharma
Huang Huang
K. Shivakumar
A. Imran
Ryan Hoque
Brian Ichter
Ken Goldberg
LM&Ro
VLM
29
5
0
24 Feb 2023
ChatGPT for Robotics: Design Principles and Model Abilities
Sai H. Vemprala
Rogerio Bonatti
A. Bucker
Ashish Kapoor
LM&Ro
33
459
0
20 Feb 2023
Prompting Large Language Models With the Socratic Method
Edward Y. Chang
LRM
ELM
56
48
0
17 Feb 2023
Complex QA and language models hybrid architectures, Survey
Xavier Daull
P. Bellot
Emmanuel Bruno
Vincent Martin
Elisabeth Murisasco
ELM
28
15
0
17 Feb 2023
Augmented Language Models: a Survey
Grégoire Mialon
Roberto Dessì
Maria Lomeli
Christoforos Nalmpantis
Ramakanth Pasunuru
...
Jane Dwivedi-Yu
Asli Celikyilmaz
Edouard Grave
Yann LeCun
Thomas Scialom
LRM
KELM
47
367
0
15 Feb 2023
Differentiable Outlier Detection Enable Robust Deep Multimodal Analysis
Zhu Wang
Sourav Medya
Sathya Ravi
VLM
28
0
0
11 Feb 2023
SOCRATES: Text-based Human Search and Approach using a Robot Dog
Jeongeun Park
Jefferson Silveria
Matthew K. X. J. Pan
Sungjoon Choi
16
0
0
10 Feb 2023
Prompting for Multimodal Hateful Meme Classification
Rui Cao
Roy Ka-Wei Lee
Wen-Haw Chong
Jing Jiang
VLM
22
75
0
08 Feb 2023
CHiLS: Zero-Shot Image Classification with Hierarchical Label Sets
Zachary Novack
Julian McAuley
Zachary Chase Lipton
Saurabh Garg
VLM
35
79
0
06 Feb 2023
LaMPP: Language Models as Probabilistic Priors for Perception and Action
Belinda Z. Li
William Chen
Pratyusha Sharma
Jacob Andreas
24
15
0
03 Feb 2023
Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
Zihao Wang
Shaofei Cai
Guanzhou Chen
Guy Van den Broeck
Xiaojian Ma
Yitao Liang
LM&Ro
LLMAG
60
315
0
03 Feb 2023
IC3: Image Captioning by Committee Consensus
David M. Chan
Austin Myers
Sudheendra Vijayanarasimhan
David A. Ross
John F. Canny
32
17
0
02 Feb 2023
Distilling Internet-Scale Vision-Language Models into Embodied Agents
T. Sumers
Kenneth Marino
Arun Ahuja
Rob Fergus
Ishita Dasgupta
LM&Ro
35
24
0
29 Jan 2023
Affective Faces for Goal-Driven Dyadic Communication
Scott Geng
Revant Teotia
Purva Tendulkar
Sachit Menon
Carl Vondrick
VGen
26
18
0
26 Jan 2023
Transfer Knowledge from Natural Language to Electrocardiography: Can We Detect Cardiovascular Disease Through Language Models?
Jielin Qiu
William Jongwon Han
Jiacheng Zhu
Mengdi Xu
Michael A. Rosenberg
Emerson Liu
Douglas Weber
Ding Zhao
32
21
0
21 Jan 2023
See, Think, Confirm: Interactive Prompting Between Vision and Language Models for Knowledge-based Visual Reasoning
Zhenfang Chen
Qinhong Zhou
Yikang Shen
Yining Hong
Hao Zhang
Chuang Gan
LRM
VLM
33
35
0
12 Jan 2023
Test of Time: Instilling Video-Language Models with a Sense of Time
Piyush Bagad
Makarand Tapaswi
Cees G. M. Snoek
83
36
0
05 Jan 2023
Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning
Woohyun Kang
Jonghwan Mun
Sungjun Lee
Byungseok Roh
VLM
11
18
0
27 Dec 2022
Contrastive Distillation Is a Sample-Efficient Self-Supervised Loss Policy for Transfer Learning
Christopher T. Lengerich
Gabriel Synnaeve
Amy Zhang
Hugh Leather
Kurt Shuster
Franccois Charton
Charysse Redwood
SSL
OffRL
27
1
0
21 Dec 2022
DePlot: One-shot visual language reasoning by plot-to-table translation
Fangyu Liu
Julian Martin Eisenschlos
Francesco Piccinno
Syrine Krichene
Chenxi Pang
Kenton Lee
Mandar Joshi
Wenhu Chen
Nigel Collier
Yasemin Altun
VLM
ReLM
LRM
27
89
0
20 Dec 2022
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
24
9
0
15 Dec 2022
Doubly Right Object Recognition: A Why Prompt for Visual Rationales
Chengzhi Mao
Revant Teotia
Amrutha Sundar
Sachit Menon
Junfeng Yang
Xin Eric Wang
Carl Vondrick
18
29
0
12 Dec 2022
VideoCoCa: Video-Text Modeling with Zero-Shot Transfer from Contrastive Captioners
Shen Yan
Tao Zhu
Zirui Wang
Yuan Cao
Mi Zhang
Soham Ghosh
Yonghui Wu
Jiahui Yu
VLM
VGen
32
46
0
09 Dec 2022
Learning Video Representations from Large Language Models
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
28
165
0
08 Dec 2022
I2MVFormer: Large Language Model Generated Multi-View Document Supervision for Zero-Shot Image Classification
Muhammad Ferjad Naeem
Muhammad Gul Zain Ali Khan
Yongqin Xian
Muhammad Zeshan Afzal
D. Stricker
Luc Van Gool
F. Tombari
VLM
35
51
0
05 Dec 2022
Improving Commonsense in Vision-Language Models via Knowledge Graph Riddles
Shuquan Ye
Yujia Xie
Dongdong Chen
Yichong Xu
Lu Yuan
Chenguang Zhu
Jing Liao
VLM
24
11
0
29 Nov 2022
Peekaboo: Text to Image Diffusion Models are Zero-Shot Segmentors
R. Burgert
Kanchana Ranasinghe
Xiang Li
Michael S. Ryoo
DiffM
VLM
34
37
0
23 Nov 2022
Visual Programming: Compositional visual reasoning without training
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
88
402
0
18 Nov 2022
InstructPix2Pix: Learning to Follow Image Editing Instructions
Tim Brooks
Aleksander Holynski
Alexei A. Efros
DiffM
72
1,699
0
17 Nov 2022
I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision
Sophia Gu
Christopher Clark
Aniruddha Kembhavi
VLM
19
24
0
17 Nov 2022
PromptCap: Prompt-Guided Task-Aware Image Captioning
Yushi Hu
Hang Hua
Zhengyuan Yang
Weijia Shi
Noah A. Smith
Jiebo Luo
51
101
0
15 Nov 2022
PARTNR: Pick and place Ambiguity Resolving by Trustworthy iNteractive leaRning
Jelle Luijkx
Zlatan Ajanović
L. Ferranti
Jens Kober
15
3
0
15 Nov 2022
Zero-shot Image Captioning by Anchor-augmented Vision-Language Space Alignment
Junyan Wang
Yi Zhang
Ming Yan
Ji Zhang
Jitao Sang
VLM
31
9
0
14 Nov 2022
What is Wrong with Language Models that Can Not Tell a Story?
Ivan P. Yamshchikov
Alexey Tikhonov
22
6
0
09 Nov 2022
Learning to Solve Voxel Building Embodied Tasks from Pixels and Natural Language Instructions
Alexey Skrynnik
Zoya Volovikova
Marc-Alexandre Côté
Anton Voronov
Artem Zholus
...
Milagro Teruel
Ahmed Hassan Awadallah
Aleksandr I. Panov
Andrey Kravchenko
Julia Kiseleva
LM&Ro
51
11
0
01 Nov 2022
Open-vocabulary Semantic Segmentation with Frozen Vision-Language Models
Chaofan Ma
Yu-Hao Yang
Yanfeng Wang
Ya-Qin Zhang
Weidi Xie
VLM
26
48
0
27 Oct 2022
A Case for Business Process-Specific Foundation Models
Yara Rizk
Praveen Venkateswaran
Vatche Isahagian
Vinod Muthusamy
AI4CE
31
9
0
26 Oct 2022
IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text
Seungwhan Moon
Andrea Madotto
Zhaojiang Lin
Alireza Dirafzoon
Aparajita Saraf
Amy Bearman
Babak Damavandi
VLM
20
36
0
26 Oct 2022
Instruction-Following Agents with Multimodal Transformer
Hao Liu
Lisa Lee
Kimin Lee
Pieter Abbeel
LM&Ro
35
10
0
24 Oct 2022
Composing Ensembles of Pre-trained Models via Iterative Consensus
Shuang Li
Yilun Du
J. Tenenbaum
Antonio Torralba
Igor Mordatch
MoMe
19
23
0
20 Oct 2022
Communication breakdown: On the low mutual intelligibility between human and neural captioning
Roberto Dessì
Eleonora Gualdoni
Francesca Franzon
Gemma Boleda
Marco Baroni
VLM
32
6
0
20 Oct 2022
Plug-and-Play VQA: Zero-shot VQA by Conjoining Large Pretrained Models with Zero Training
A. M. H. Tiong
Junnan Li
Boyang Albert Li
Silvio Savarese
S. Hoi
MLLM
27
101
0
17 Oct 2022
Visual Classification via Description from Large Language Models
Sachit Menon
Carl Vondrick
VLM
35
287
0
13 Oct 2022
Retrospectives on the Embodied AI Workshop
Matt Deitke
Dhruv Batra
Yonatan Bisk
Tommaso Campari
Angel X. Chang
...
Jesse Thomason
Alexander Toshev
Joanne Truong
Luca Weihs
Jiajun Wu
LM&Ro
37
51
0
13 Oct 2022
Visual Language Maps for Robot Navigation
Chen Huang
Oier Mees
Andy Zeng
Wolfram Burgard
LM&Ro
162
344
0
11 Oct 2022
Using Both Demonstrations and Language Instructions to Efficiently Learn Robotic Tasks
Albert Yu
Raymond J. Mooney
LM&Ro
32
19
0
10 Oct 2022
VIMA: General Robot Manipulation with Multimodal Prompts
Yunfan Jiang
Agrim Gupta
Zichen Zhang
Guanzhi Wang
Yongqiang Dou
Yanjun Chen
Li Fei-Fei
Anima Anandkumar
Yuke Zhu
Linxi Fan
LM&Ro
28
335
0
06 Oct 2022
Grounding Language with Visual Affordances over Unstructured Data
Oier Mees
Jessica Borja-Diaz
Wolfram Burgard
LM&Ro
121
108
0
04 Oct 2022
Previous
1
2
3
4
5
6
7
8
9
Next