2211.11559
Cited By
Visual Programming: Compositional visual reasoning without training
Computer Vision and Pattern Recognition (CVPR), 2023
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
Papers citing
"Visual Programming: Compositional visual reasoning without training"
50 / 375 papers shown
Title
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yanjie Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
284
6
0
05 Aug 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
145
2
0
30 Jul 2024
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
European Conference on Computer Vision (ECCV), 2024
Mingyu Zhang
Jiting Cai
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
LRM
194
9
0
29 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
ACM Multimedia (MM), 2024
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
164
3
0
28 Jul 2024
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition
Jinfu Liu
Chong Chen
Mengyuan Liu
372
26
0
22 Jul 2024
MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
Pei Zhou
Yanchao Yang
226
2
0
21 Jul 2024
Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
211
8
0
20 Jul 2024
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Yaoyao Liu
VGen
166
2
0
18 Jul 2024
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon
Biniyam Aschalew Tolera
Taesik Gong
Kimin Lee
Sung-Ju Lee
166
17
0
15 Jul 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
188
5
0
12 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
283
11
0
11 Jul 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
171
4
0
08 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM
DiffM
281
75
0
08 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM
ALM
321
12
0
08 Jul 2024
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
Pritish Sahu
Karan Sikka
Ajay Divakaran
MLLM
LRM
189
13
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
289
6
0
02 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
LRM
162
15
0
28 Jun 2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun
So Yeon Min
Yingshan Chang
Yonatan Bisk
259
14
0
27 Jun 2024
CogExplore: Contextual Exploration with Language-Encoded Environment Representations
Harel Biggie
Patrick Cooper
Doncey Albin
Kristen Such
Christoffer Heckman
LM&Ro
150
0
0
24 Jun 2024
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Joanna Matthiesen
Kevin A. Smith
J. Tenenbaum
ELM
LRM
143
14
0
22 Jun 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Brandon Huang
Chancharik Mitra
Assaf Arbelle
Leonid Karlinsky
Trevor Darrell
Roei Herzig
188
34
0
21 Jun 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
183
8
0
20 Jun 2024
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xueqing Wu
Zongyu Lin
Songyan Zhao
Te-Lin Wu
Pan Lu
Nanyun Peng
Kai-Wei Chang
LRM
238
3
0
19 Jun 2024
Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti
Enrico Fini
Paolo Rota
Yiming Wang
Goran Frehse
Elisa Ricci
198
1
0
18 Jun 2024
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta
Luca Weihs
Aniruddha Kembhavi
LLMAG
ELM
155
4
0
18 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
279
47
0
18 Jun 2024
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Hung-Ting Su
Chun-Tong Chao
Ya-Ching Hsu
Xudong Lin
Yulei Niu
Hung-Yi Lee
Winston H. Hsu
LRM
183
1
0
16 Jun 2024
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao
Yifan Shen
Bolin Lai
Wenqian Ye
Yunsheng Ma
...
Jintai Chen
Meihuan Huang
Jianguo Cao
Aidong Zhang
James M. Rehg
283
19
0
14 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
245
176
0
13 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM
LRM
156
35
0
11 Jun 2024
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Computer Vision and Pattern Recognition (CVPR), 2024
Sajid Javed
Arif Mahmood
I. I. Ganapathi
Fayaz Ali Dharejo
Naoufel Werghi
Mohammed Bennamoun
VLM
LM&MA
189
32
0
07 Jun 2024
LogiCode: an LLM-Driven Framework for Logical Anomaly Detection
IEEE Transactions on Automation Science and Engineering (T-ASE), 2024
Yiheng Zhang
Yunkang Cao
Xiaohao Xu
Nong Sang
183
30
0
07 Jun 2024
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang
Dizhan Xue
Shengjie Zhang
Shengsheng Qian
AAML
LLMAG
194
70
0
05 Jun 2024
Towards Rationality in Language and Multimodal Agents: A Survey
Bowen Jiang
Yangxinyu Xie
Xiaomeng Wang
Yuan Yuan
Weijie J. Su
Camillo J. Taylor
Tanwi Mallick
LLMAG
254
12
0
01 Jun 2024
ParSEL: Parameterized Shape Editing with Language
Aditya Ganeshan
Ryan Y. Huang
Xianghao Xu
R. K. Jones
Daniel E. Ritchie
KELM
195
8
0
30 May 2024
VQA Training Sets are Self-play Environments for Generating Few-shot Pools
Tautvydas Misiunas
Hassan Mansoor
Jasper Uijlings
Oriana Riva
Victor Carbune
LRM
VLM
122
1
0
30 May 2024
Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu
Xiaohang Zhan
Shaoli Huang
Tai-Jiang Mu
Ying Shan
188
13
0
29 May 2024
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge
Neural Information Processing Systems (NeurIPS), 2024
Chuanhao Li
Zhen Li
Chenchen Jing
Shuo Liu
Wenqi Shao
Yuwei Wu
Ping Luo
Yu Qiao
Kaipeng Zhang
ELM
179
0
0
23 May 2024
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Jingwei Xu
Junyu Lai
Yunpeng Huang
MoE
MoMe
212
12
0
19 May 2024
Libra: Building Decoupled Vision System on Large Language Models
International Conference on Machine Learning (ICML), 2024
Yifan Xu
Xiaoshan Yang
Y. Song
Changsheng Xu
MLLM
VLM
162
10
0
16 May 2024
Large Language Models Synergize with Automated Machine Learning
Jinglue Xu
Jialong Li
Zhen Liu
Nagar Anthel Venkatesh Suryanarayanan
Guoyuan Zhou
Jia Guo
Hitoshi Iba
Kenji Tei
162
7
0
06 May 2024
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Tianze Xu
Jiajun Li
Xuesong Chen
Xinrui Yao
Shuchang Liu
132
8
0
05 May 2024
Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Jiading Fang
Xiangshan Tan
Shengjie Lin
Igor Vasiljevic
Vitor Campagnolo Guizilini
Hongyuan Mei
Rares Andrei Ambrus
Gregory Shakhnarovich
Matthew R. Walter
LM&Ro
155
7
0
30 Apr 2024
Position: Do Not Explain Vision Models Without Context
Paulina Tomaszewska
Przemysław Biecek
195
1
0
28 Apr 2024
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Shangzhan Zhang
Sida Peng
Tao Xu
Yuanbo Yang
Tianrun Chen
Nan Xue
Yujun Shen
Hujun Bao
Ruizhen Hu
Xiaowei Zhou
DiffM
294
22
0
26 Apr 2024
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
199
14
0
24 Apr 2024
Think-Program-reCtify: 3D Situated Reasoning with Large Language Models
Qingrong He
Kejun Lin
Shizhe Chen
Anwen Hu
Qin Jin
LRM
172
4
0
23 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
429
41
0
22 Apr 2024
Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales
Minghe Gao
Shuang Chen
Liang Pang
Xingtai Lv
Jisheng Dang
Wenqiao Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
Tat-Seng Chua
LRM
135
10
0
17 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
240
18
0
12 Apr 2024