Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2305.01795
Cited By
Multimodal Procedural Planning via Dual Text-Image Prompting
2 May 2023
Yujie Lu
Pan Lu
Zhiyu Zoey Chen
Wanrong Zhu
X. Wang
William Yang Wang
LM&Ro
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Multimodal Procedural Planning via Dual Text-Image Prompting"
15 / 15 papers shown
Title
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
43
0
0
12 Mar 2025
Paint Outside the Box: Synthesizing and Selecting Training Data for Visual Grounding
Zilin Du
Haoxin Li
Jianfei Yu
Boyang Li
77
0
0
01 Dec 2024
VLMPC: Vision-Language Model Predictive Control for Robotic Manipulation
Wentao Zhao
Jiaming Chen
Ziyu Meng
Donghui Mao
Ran Song
Wei Zhang
22
8
0
13 Jul 2024
RoboScript: Code Generation for Free-Form Manipulation Tasks across Real and Simulation
Junting Chen
Yao Mu
Qiaojun Yu
Tianming Wei
Silang Wu
...
Wenqi Shao
Yu Qiao
Huazhe Xu
Mingyu Ding
Ping Luo
LM&Ro
25
11
0
22 Feb 2024
Training on Synthetic Data Beats Real Data in Multimodal Relation Extraction
Zilin Du
Haoxin Li
Xu Guo
Boyang Li
12
1
0
05 Dec 2023
Tool Documentation Enables Zero-Shot Tool-Usage with Large Language Models
Cheng-Yu Hsieh
Sibei Chen
Chun-Liang Li
Yasuhisa Fujii
Alexander Ratner
Chen-Yu Lee
Ranjay Krishna
Tomas Pfister
LLMAG
SyDa
16
40
0
01 Aug 2023
Z-LaVI: Zero-Shot Language Solver Fueled by Visual Imagination
Yue Yang
Wenlin Yao
Hongming Zhang
Xiaoyang Wang
Dong Yu
Jianshu Chen
VLM
33
14
0
21 Oct 2022
Visualize Before You Write: Imagination-Guided Open-Ended Text Generation
Wanrong Zhu
An Yan
Yujie Lu
Wenda Xu
X. Wang
Miguel P. Eckstein
William Yang Wang
67
37
0
07 Oct 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
291
2,712
0
24 May 2022
Language Models with Image Descriptors are Strong Few-Shot Video-Language Learners
Zhenhailong Wang
Manling Li
Ruochen Xu
Luowei Zhou
Jie Lei
...
Chenguang Zhu
Derek Hoiem
Shih-Fu Chang
Mohit Bansal
Heng Ji
MLLM
VLM
153
134
0
22 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
Junnan Li
Dongxu Li
Caiming Xiong
S. Hoi
MLLM
BDL
VLM
CLIP
380
4,010
0
28 Jan 2022
An Empirical Study of GPT-3 for Few-Shot Knowledge-Based VQA
Zhengyuan Yang
Zhe Gan
Jianfeng Wang
Xiaowei Hu
Yumao Lu
Zicheng Liu
Lijuan Wang
161
401
0
10 Sep 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
253
4,735
0
24 Feb 2021
Unifying Vision-and-Language Tasks via Text Generation
Jaemin Cho
Jie Lei
Hao Tan
Mohit Bansal
MLLM
249
518
0
04 Feb 2021
1