2204.00598
Cited By
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
ReLM
LRM
Papers citing
"Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"
50 / 443 papers shown
Title
Embodied AI in Machine Learning -- is it Really Embodied?
Matej Hoffmann
Shubhan Patni
LM&Ro
AI4CE
19
0
0
15 May 2025
ReLI: A Language-Agnostic Approach to Human-Robot Interaction
Linus Nwankwo
Bjoern Ellensohn
Ozan Özdenizci
Elmar Rueckert
LM&Ro
58
0
0
03 May 2025
Chain-of-Modality: Learning Manipulation Programs from Multimodal Human Videos with Vision-Language-Models
Chen Wang
Fei Xia
Wenhao Yu
Tingnan Zhang
Ruohan Zhang
Ce Liu
Li Fei-Fei
Jie Tan
Jacky Liang
36
0
0
17 Apr 2025
How Can Objects Help Video-Language Understanding?
Zitian Tang
Shijie Wang
Junho Cho
Jaewook Yoo
Chen Sun
45
0
0
10 Apr 2025
Vision-Language Model Predictive Control for Manipulation Planning and Trajectory Generation
Jiaming Chen
Wentao Zhao
Ziyu Meng
Donghui Mao
Ran Song
Wei Pan
Wei Zhang
33
0
0
07 Apr 2025
Debate-Feedback: A Multi-Agent Framework for Efficient Legal Judgment Prediction
Xi Chen
Mao Mao
Shuo Li
Haotian Shangguan
LLMAG
AILaw
ELM
81
0
0
07 Apr 2025
Chapter-Llama: Efficient Chaptering in Hour-Long Videos with LLMs
Lucas Ventura
Antoine Yang
Cordelia Schmid
Gül Varol
39
0
0
31 Mar 2025
Cooking Task Planning using LLM and Verified by Graph Network
Ryunosuke Takebayashi
V. H. Isume
Takuya Kiyokawa
Weiwei Wan
Kensuke Harada
66
0
0
27 Mar 2025
FALCONEye: Finding Answers and Localizing Content in ONE-hour-long videos with multi-modal LLMs
Carlos Plou
Cesar Borja
Ruben Martinez-Cantin
Ana C. Murillo
61
0
0
25 Mar 2025
Unbiasing through Textual Descriptions: Mitigating Representation Bias in Video Benchmarks
Nina Shvetsova
Arsha Nagrani
Bernt Schiele
Hilde Kuehne
Christian Rupprecht
53
0
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
45
0
0
22 Mar 2025
EmbodiedVSR: Dynamic Scene Graph-Guided Chain-of-Thought Reasoning for Visual Spatial Tasks
Yi Zhang
Qiang Zhang
Xiaozhu Ju
Ziqiang Liu
Jilei Mao
...
Jiaxu Wang
Yiqun Duan
Jiahang Cao
Renjing Xu
Jian Tang
LM&Ro
LRM
62
0
0
14 Mar 2025
Towards Fast, Memory-based and Data-Efficient Vision-Language Policy
Haoxuan Li
Sixu Yan
Yicong Li
Xinggang Wang
LM&Ro
64
0
0
13 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
58
0
0
12 Mar 2025
Generating Robot Constitutions & Benchmarks for Semantic Safety
P. Sermanet
Anirudha Majumdar
A. Irpan
Dmitry Kalashnikov
Vikas Sindhwani
LM&Ro
60
1
0
11 Mar 2025
Investigating the Effectiveness of a Socratic Chain-of-Thoughts Reasoning Method for Task Planning in Robotics, A Case Study
Veronica Bot
Zheyuan Xu
LRM
LLMAG
LM&Ro
67
0
0
11 Mar 2025
LTLCodeGen: Code Generation of Syntactically Correct Temporal Logic for Robot Task Planning
Behrad Rabiei
Mahesh Kumar A.R.
Zhirui Dai
Surya L.S.R. Pilla
Qiyue Dong
Nikolay Atanasov
LM&Ro
61
0
0
10 Mar 2025
Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu
Zihan Wang
Zichen Zhu
Lei Pan
Xingyu Chen
Lu Chen
Kai Yu
49
0
0
09 Mar 2025
CLEA: Closed-Loop Embodied Agent for Enhancing Task Execution in Dynamic Environments
Mingcong Lei
Ge Wang
Yiming Zhao
Zhixin Mai
Qing Zhao
Yao Guo
Zhen Li
Shuguang Cui
Yatong Han
J. Ren
LLMAG
43
0
0
02 Mar 2025
Multi-Agent Verification: Scaling Test-Time Compute with Multiple Verifiers
Shalev Lifshitz
Sheila A. McIlraith
Yilun Du
LRM
55
5
0
27 Feb 2025
Less or More: Towards Glanceable Explanations for LLM Recommendations Using Ultra-Small Devices
Xinru Wang
Mengjie Yu
Hannah Nguyen
Michael Iuzzolino
Tianyi Wang
...
Ting Zhang
Naveen Sendhilnathan
Hrvoje Benko
Haijun Xia
Tanya R. Jonker
53
0
0
26 Feb 2025
Beyond Pattern Recognition: Probing Mental Representations of LMs
Moritz Miller
Kumar Shridhar
ReLM
LRM
55
0
0
23 Feb 2025
Demystifying Hateful Content: Leveraging Large Multimodal Models for Hateful Meme Detection with Explainable Decisions
Ming Shan Hee
Roy Ka-Wei Lee
VLM
83
0
0
16 Feb 2025
A Real-to-Sim-to-Real Approach to Robotic Manipulation with VLM-Generated Iterative Keypoint Rewards
Shivansh Patel
Xinchen Yin
Wenlong Huang
Shubham Garg
H. Nayyeri
Li Fei-Fei
Svetlana Lazebnik
Yicong Li
92
0
0
12 Feb 2025
Robust Mobile Robot Path Planning via LLM-Based Dynamic Waypoint Generation
Muhammad Taha Tariq
Congqing Wang
Yasir Hussain
89
0
0
28 Jan 2025
LAMS: LLM-Driven Automatic Mode Switching for Assistive Teleoperation
Yiran Tao
Jehan Yang
Dan Ding
Zackory Erickson
36
0
0
15 Jan 2025
RoboHorizon: An LLM-Assisted Multi-View World Model for Long-Horizon Robotic Manipulation
Zixuan Chen
Jing Huo
Yangtao Chen
Yang Gao
43
2
0
11 Jan 2025
Using Pre-trained LLMs for Multivariate Time Series Forecasting
Malcolm Wolff
Shenghao Yang
Kari Torkkola
Michael W. Mahoney
AI4TS
AIFin
46
1
0
10 Jan 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
79
12
0
03 Jan 2025
LLM+AL: Bridging Large Language Models and Action Languages for Complex Reasoning about Actions
Adam Ishay
Joohyung Lee
LRM
37
1
0
01 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzadeh
VLM
63
24
0
31 Dec 2024
Multi-Modal Grounded Planning and Efficient Replanning For Learning Embodied Agents with A Few Examples
Taewoong Kim
Byeonghwi Kim
Jonghyun Choi
LLMAG
LM&Ro
49
1
0
23 Dec 2024
A Review of Multimodal Explainable Artificial Intelligence: Past, Present and Future
Shilin Sun
Wenbin An
Feng Tian
Fang Nan
Qidong Liu
Xiaozhong Liu
N. Shah
Ping Chen
96
2
0
18 Dec 2024
CAD-Assistant: Tool-Augmented VLLMs as Generic CAD Task Solvers
Dimitrios Mallis
Ahmet Serdar Karadeniz
Sebastian Cavada
Danila Rukhovich
Niki Maria Foteinopoulou
K. Cherenkova
Anis Kacem
Djamila Aouada
79
2
0
18 Dec 2024
DEFAME: Dynamic Evidence-based FAct-checking with Multimodal Experts
Tobias Braun
Mark Rothermel
Marcus Rohrbach
Anna Rohrbach
87
1
0
13 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
103
4
0
12 Dec 2024
ShowUI: One Vision-Language-Action Model for GUI Visual Agent
Kevin Qinghong Lin
Linjie Li
Difei Gao
Zhengyuan Yang
Shiwei Wu
Zechen Bai
Weixian Lei
Lijuan Wang
Mike Zheng Shou
LLMAG
74
13
0
26 Nov 2024
I Can Tell What I am Doing: Toward Real-World Natural Language Grounding of Robot Experiences
Zihan Wang
Brian Liang
Varad Dhat
Zander Brumbaugh
Nick Walker
Ranjay Krishna
Maya Cakmak
61
4
0
20 Nov 2024
HourVideo: 1-Hour Video-Language Understanding
Keshigeyan Chandrasegaran
Agrim Gupta
Lea M. Hadzic
Taran Kota
Jimming He
Cristobal Eyzaguirre
Zane Durante
Manling Li
Jiajun Wu
L. Fei-Fei
VLM
48
31
0
07 Nov 2024
Personalized Video Summarization by Multimodal Video Understanding
Brian Chen
Xiangyuan Zhao
Yingnan Zhu
41
1
0
05 Nov 2024
TI-PREGO: Chain of Thought and In-Context Learning for Online Mistake Detection in PRocedural EGOcentric Videos
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Guido Maria D'Amely di Melendugno
Alessandro Flaborea
Andrea Sanchietti
G. Farinella
Fabio Galasso
Antonino Furnari
EgoV
LRM
48
1
0
04 Nov 2024
Thinking Forward and Backward: Effective Backward Planning with Large Language Models
Allen Z. Ren
Brian Ichter
Anirudha Majumdar
LLMAG
LRM
33
0
0
04 Nov 2024
Multilingual Vision-Language Pre-training for the Remote Sensing Domain
João Daniel Silva
João Magalhães
D. Tuia
Bruno Martins
CLIP
VLM
42
1
0
30 Oct 2024
ProMQA: Question Answering Dataset for Multimodal Procedural Activity Understanding
Kimihiro Hasegawa
Wiradee Imrattanatrai
Zhi-Qi Cheng
Masaki Asada
Susan Holm
Yuran Wang
Ken Fukuda
Teruko Mitamura
28
0
0
29 Oct 2024
SegLLM: Multi-round Reasoning Segmentation
XuDong Wang
Shaolun Zhang
Shufan Li
Konstantinos Kallidromitis
Kehan Li
Yusuke Kato
Kazuki Kozuka
Trevor Darrell
VLM
LRM
50
1
0
24 Oct 2024
Foundation Models for Rapid Autonomy Validation
Alec Farid
Peter Schleede
Aaron Huang
Christoffer Heckman
43
0
0
22 Oct 2024
In-Context Learning Enables Robot Action Prediction in LLMs
Yida Yin
Zekai Wang
Yuvan Sharma
Dantong Niu
Trevor Darrell
Roei Herzig
LM&Ro
117
1
0
16 Oct 2024
Innovative Thinking, Infinite Humor: Humor Research of Large Language Models through Structured Thought Leaps
Han Wang
Yilin Zhao
Dian Li
Xiaohan Wang
Gang Liu
Xuguang Lan
Haoran Wang
LRM
45
1
0
14 Oct 2024
ImagineNav: Prompting Vision-Language Models as Embodied Navigator through Scene Imagination
Xinxin Zhao
Wenzhe Cai
Likun Tang
Teng Wang
LM&Ro
40
3
0
13 Oct 2024
Language-Model-Assisted Bi-Level Programming for Reward Learning from Internet Videos
Harsh Mahesheka
Zhixian Xie
Zhilin Wang
Wanxin Jin
29
0
0
11 Oct 2024