ResearchTrend.AI
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language

1 April 2022
Andy Zeng
Maria Attarian
Brian Ichter
K. Choromanski
Adrian S. Wong
Stefan Welker
F. Tombari
Aveek Purohit
Michael S. Ryoo
Vikas Sindhwani
Johnny Lee
Vincent Vanhoucke
Peter R. Florence
    ReLM
    LRM

Papers citing "Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language"

50 / 443 papers shown
Title
Aerial Vision-and-Language Navigation via Semantic-Topo-Metric Representation Guided LLM Reasoning
Yunpeng Gao
Zhigang Wang
Linglin Jing
Dong Wang
Xuelong Li
Bin Zhao
38
14
0
11 Oct 2024
ConceptAgent: LLM-Driven Precondition Grounding and Tree Search for Robust Task Planning and Execution
Corban Rivera
Grayson Byrd
William Paul
Tyler Feldman
Meghan Booker
...
Krishna Murthy Jatavallabhula
Celso M. De Melo
Lalithkumar Seenivasan
Mathias Unberath
Rama Chellappa
LLMAG
LM&Ro
31
0
0
08 Oct 2024
LADEV: A Language-Driven Testing and Evaluation Platform for Vision-Language-Action Models in Robotic Manipulation
Zhijie Wang
Zhehua Zhou
Jiayang Song
Yuheng Huang
Zhan Shu
Lei Ma
26
0
0
07 Oct 2024
Can visual language models resolve textual ambiguity with visual cues? Let visual puns tell you!
Jiwan Chung
Seungwon Lim
Jaehyun Jeon
Seungbeen Lee
Youngjae Yu
25
0
0
01 Oct 2024
Propose, Assess, Search: Harnessing LLMs for Goal-Oriented Planning in Instructional Videos
Md. Mohaiminul Islam
Tushar Nagarajan
Huiyu Wang
Fu-Jen Chu
Kris M. Kitani
Gedas Bertasius
Xitong Yang
38
2
0
30 Sep 2024
Episodic Memory Verbalization using Hierarchical Representations of Life-Long Robot Experience
Leonard Barmann
Chad DeChant
Joana Plewnia
Fabian Peller-Konrad
Daniel Bauer
Tamim Asfour
Alex Waibel
LM&Ro
34
1
0
26 Sep 2024
Attention Prompting on Image for Large Vision-Language Models
Runpeng Yu
Weihao Yu
Xinchao Wang
VLM
40
6
0
25 Sep 2024
MHRC: Closed-loop Decentralized Multi-Heterogeneous Robot Collaboration with Large Language Models
Wenhao Yu
Jie Peng
Yueliang Ying
Sai Li
Jianmin Ji
Yanyong Zhang
53
4
0
24 Sep 2024
SYNERGAI: Perception Alignment for Human-Robot Collaboration
Yixin Chen
Guoxi Zhang
Yaowei Zhang
Hongming Xu
Peiyuan Zhi
Qing Li
Siyuan Huang
37
0
0
24 Sep 2024
Discovering Object Attributes by Prompting Large Language Models with Perception-Action APIs
A. Mavrogiannis
Dehao Yuan
Yiannis Aloimonos
LM&Ro
43
0
0
23 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
39
1
0
19 Sep 2024
Multimodal Fusion with LLMs for Engagement Prediction in Natural Conversation
Cheng Charles Ma
Kevin Hyekang Joo
Alexandria K. Vail
Sunreeta Bhattacharya
Álvaro Fernández García
Kailana Baker-Matsuoka
Sheryl Mathew
Lori L. Holt
Fernando De la Torre
49
3
0
13 Sep 2024
HiRT: Enhancing Robotic Control with Hierarchical Robot Transformers
Jianke Zhang
Yanjiang Guo
Xiaoyu Chen
Yen-Jen Wang
Yucheng Hu
Chengming Shi
Jianyu Chen
31
5
0
12 Sep 2024
Robot Utility Models: General Policies for Zero-Shot Deployment in New Environments
Haritheja Etukuru
Norihito Naka
Zijin Hu
Seungjae Lee
Julian Mehu
Aaron Edsinger
Chris Paxton
Soumith Chintala
Lerrel Pinto
Nur Muhammad (Mahi) Shafiullah
LM&Ro
31
23
0
09 Sep 2024
Bridging the gap between natural user expression with complex automation programming in smart homes
Yingtian Shi
Xiaoyi Liu
Chun Yu
Tianao Yang
Cheng Gao
Chen Liang
Yuanchun Shi
24
0
0
22 Aug 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
37
1
0
21 Aug 2024
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yalin Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
36
3
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
Toward Automatic Relevance Judgment using Vision–Language Models for Image–Text Retrieval Evaluation
Jheng-Hong Yang
Jimmy Lin
VLM
47
3
0
02 Aug 2024
CityX: Controllable Procedural Content Generation for Unbounded 3D Cities
Shougao Zhang
Mengqi Zhou
Yuxi Wang
Chuanchen Luo
Rongyu Wang
Yiwei Li
Xucheng Yin
Zhaoxiang Zhang
Junran Peng
43
7
0
24 Jul 2024
Can VLMs be used on videos for action recognition? LLMs are Visual Reasoning Coordinators
Harsh Lunia
40
0
0
20 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
53
7
0
16 Jul 2024
Affordance-Guided Reinforcement Learning via Visual Prompting
Olivia Y. Lee
Annie Xie
Kuan Fang
Karl Pertsch
Chelsea Finn
OffRL
LM&Ro
74
7
0
14 Jul 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
43
8
0
13 Jul 2024
Instruction Following with Goal-Conditioned Reinforcement Learning in Virtual Environments
Zoya Volovikova
A. Skrynnik
Petr Kuderov
Aleksandr I. Panov
LLMAG
LM&Ro
46
0
0
12 Jul 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
43
1
0
12 Jul 2024
Aligning Cyber Space with Physical World: A Comprehensive Survey on Embodied AI
Yang Liu
Weixing Chen
Yongjie Bai
Xiaodan Liang
Guanbin Li
Wen Gao
Liang Lin
LM&Ro
SyDa
AI4CE
51
50
0
09 Jul 2024
Visualizing Dialogues: Enhancing Image Selection through Dialogue Understanding with Large Language Models
Chang-Sheng Kao
Yun-Nung Chen
23
0
0
04 Jul 2024
Commonsense Reasoning for Legged Robot Adaptation with Vision-Language Models
Annie S. Chen
Alec M. Lessing
Andy Tang
Govind Chada
Laura Smith
Sergey Levine
Chelsea Finn
LM&Ro
LRM
39
9
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Raghavi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
50
2
0
02 Jul 2024
Efficient Expert Pruning for Sparse Mixture-of-Experts Language Models: Enhancing Performance and Reducing Inference Costs
Enshu Liu
Junyi Zhu
Zinan Lin
Xuefei Ning
Matthew B. Blaschko
Shengen Yan
Guohao Dai
Huazhong Yang
Yu Wang
MoE
62
5
0
01 Jul 2024
LLaRA: Supercharging Robot Learning Data for Vision-Language Policy
Xiang Li
Cristina Mata
J. Park
Kumara Kahatapitiya
Yoo Sung Jang
...
Kanchana Ranasinghe
R. Burgert
Mu Cai
Yong Jae Lee
Michael S. Ryoo
LM&Ro
72
25
0
28 Jun 2024
ROS-LLM: A ROS framework for embodied AI with task feedback and structured reasoning
Christopher E. Mower
Yuhui Wan
Hongzhan Yu
Antoine Grosnit
Jonas Gonzalez-Billandon
...
Kun Shao
Xingyue Quan
Jianye Hao
Jun Wang
Haitham Bou-Ammar
LM&Ro
LLMAG
34
8
0
28 Jun 2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun
So Yeon Min
Yingshan Chang
Yonatan Bisk
32
4
0
27 Jun 2024
Lifelong Robot Library Learning: Bootstrapping Composable and Generalizable Skills for Embodied Control with Language Models
Georgios Tziafas
H. Kasaei
KELM
LM&Ro
47
8
0
26 Jun 2024
Towards Open-World Grasping with Large Vision-Language Models
Georgios Tziafas
H. Kasaei
LM&Ro
LRM
37
12
0
26 Jun 2024
Open-vocabulary Mobile Manipulation in Unseen Dynamic Environments with 3D Semantic Maps
Dicong Qiu
Wenzong Ma
Zhenfu Pan
Hui Xiong
Junwei Liang
LM&Ro
39
7
0
26 Jun 2024
Retrieval-Augmented Code Generation for Situated Action Generation: A Case Study on Minecraft
Chalamalasetti Kranti
Sherzod Hakimov
David Schlangen
34
1
0
25 Jun 2024
Adversaries Can Misuse Combinations of Safe Models
Erik Jones
Anca Dragan
Jacob Steinhardt
45
7
0
20 Jun 2024
Using Multimodal Large Language Models for Automated Detection of Traffic Safety Critical Events
M. Tami
Huthaifa I. Ashqar
Mohammed Elhenawy
42
3
0
19 Jun 2024
DrVideo: Document Retrieval Based Long Video Understanding
Ziyu Ma
Chenhui Gou
Hengcan Shi
Bin Sun
Shutao Li
Hamid Rezatofighi
Jianfei Cai
VLM
36
13
0
18 Jun 2024
Minimal Self in Humanoid Robot "Alter3" Driven by Large Language Model
Takahide Yoshida
Suzune Baba
A. Masumori
Takashi Ikegami
LM&Ro
40
1
0
17 Jun 2024
From Text to Life: On the Reciprocal Relationship between Artificial Life and Large Language Models
Eleni Nisioti
Claire Glanois
Elias Najarro
Andrew Dai
Elliot Meyerson
J. Pedersen
Laetitia Teodorescu
Conor F. Hayes
Shyam Sudhakaran
Sebastian Risi
AI4CE
LM&Ro
51
3
0
14 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
53
38
0
13 Jun 2024
Real2Code: Reconstruct Articulated Objects via Code Generation
Zhao Mandi
Yijia Weng
Dominik Bauer
Shuran Song
45
17
0
12 Jun 2024
Grounding Multimodal Large Language Models in Actions
Andrew Szot
Bogdan Mazoure
Harsh Agrawal
Devon Hjelm
Z. Kira
Alexander Toshev
LM&Ro
35
10
0
12 Jun 2024
Language Guided Skill Discovery
Seungeun Rho
Laura Smith
Tianyu Li
Sergey Levine
Xue Bin Peng
Sehoon Ha
LM&Ro
42
4
0
07 Jun 2024
POEM: Interactive Prompt Optimization for Enhancing Multimodal Reasoning of Large Language Models
Jianben He
Xingbo Wang
Shiyi Liu
Guande Wu
Claudio Silva
Huamin Qu
LRM
37
2
0
06 Jun 2024
Tool-Planner: Task Planning with Clusters across Multiple Tools
Yanming Liu
Xinyue Peng
Jiannan Cao
Xuhong Zhang
Sheng Cheng
Xun Wang
Jianwei Yin
Tianyu Du
LLMAG
37
3
0
06 Jun 2024
A Survey of Language-Based Communication in Robotics
William Hunt
Sarvapali D. Ramchurn
Mohammad D. Soorati
LM&Ro
65
12
0
06 Jun 2024