2211.11559
Visual Programming: Compositional visual reasoning without training
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM
VLM
LRM
Papers citing
"Visual Programming: Compositional visual reasoning without training"
50 / 309 papers shown
Title
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Joanna Matthiesen
Kevin A. Smith
J. Tenenbaum
ELM
LRM
39
6
0
22 Jun 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Brandon Huang
Chancharik Mitra
Assaf Arbelle
Leonid Karlinsky
Trevor Darrell
Roei Herzig
37
12
0
21 Jun 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
28
1
0
20 Jun 2024
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Xueqing Wu
Zongyu Lin
Songyan Zhao
Te-Lin Wu
Pan Lu
Nanyun Peng
Kai-Wei Chang
LRM
45
2
0
19 Jun 2024
Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti
Enrico Fini
Paolo Rota
Yiming Wang
Massimiliano Mancini
Elisa Ricci
30
0
0
18 Jun 2024
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta
Luca Weihs
Aniruddha Kembhavi
LLMAG
ELM
56
1
0
18 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
27
22
0
18 Jun 2024
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Hung-Ting Su
Chun-Tong Chao
Ya-Ching Hsu
Xudong Lin
Yulei Niu
Hung-Yi Lee
Winston H. Hsu
LRM
31
1
0
16 Jun 2024
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao
Bolin Lai
Wenqian Ye
Yunsheng Ma
Joerg Heintz
Jintai Chen
Jianguo Cao
James M. Rehg
37
8
0
14 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
32
34
0
13 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM
LRM
26
15
0
11 Jun 2024
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Sajid Javed
Arif Mahmood
I. I. Ganapathi
Fayaz Ali Dharejo
N. Werghi
Mohammed Bennamoun
VLM
LM&MA
30
10
0
07 Jun 2024
LogiCode: an LLM-Driven Framework for Logical Anomaly Detection
Yiheng Zhang
Yunkang Cao
Xiaohao Xu
Weiming Shen
29
14
0
07 Jun 2024
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang
Dizhan Xue
Shengjie Zhang
Shengsheng Qian
AAML
LLMAG
32
19
0
05 Jun 2024
ParSEL: Parameterized Shape Editing with Language
Aditya Ganeshan
Ryan Y. Huang
Xianghao Xu
R. K. Jones
Daniel E. Ritchie
KELM
37
1
0
30 May 2024
VQA Training Sets are Self-play Environments for Generating Few-shot Pools
Tautvydas Misiunas
Hassan Mansoor
Jasper Uijlings
Oriana Riva
Victor Carbune
LRM
VLM
33
0
0
30 May 2024
Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu
Xiaohang Zhan
Shaoli Huang
Tai-Jiang Mu
Ying Shan
30
5
0
29 May 2024
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge
Chuanhao Li
Zhen Li
Chenchen Jing
Shuo Liu
Wenqi Shao
Yuwei Wu
Ping Luo
Yu Qiao
Kaipeng Zhang
ELM
23
3
0
23 May 2024
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Jingwei Xu
Junyu Lai
Yunpeng Huang
MoE
MoMe
31
8
0
19 May 2024
Libra: Building Decoupled Vision System on Large Language Models
Yifan Xu
Xiaoshan Yang
Y. Song
Changsheng Xu
MLLM
VLM
31
6
0
16 May 2024
Large Language Models Synergize with Automated Machine Learning
Jinglue Xu
Jialong Li
Zhen Liu
Nagar Anthel Venkatesh Suryanarayanan
Guoyuan Zhou
Jia Guo
Hitoshi Iba
Kenji Tei
33
4
0
06 May 2024
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Tianze Xu
Jiajun Li
Xuesong Chen
Xinrui Yao
Shuchang Liu
24
4
0
05 May 2024
Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Jiading Fang
Xiangshan Tan
Shengjie Lin
Igor Vasiljevic
Vitor Campagnolo Guizilini
Hongyuan Mei
Rares Ambrus
Gregory Shakhnarovich
Matthew R. Walter
LM&Ro
33
4
0
30 Apr 2024
Position: Do Not Explain Vision Models Without Context
Paulina Tomaszewska
Przemysław Biecek
24
1
0
28 Apr 2024
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Shangzhan Zhang
Sida Peng
Tao Xu
Yuanbo Yang
Tianrun Chen
Nan Xue
Yujun Shen
Hujun Bao
Ruizhen Hu
Xiaowei Zhou
DiffM
19
9
0
26 Apr 2024
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
23
8
0
24 Apr 2024
Think-Program-reCtify: 3D Situated Reasoning with Large Language Models
Qingrong He
Kejun Lin
Shizhe Chen
Anwen Hu
Qin Jin
LRM
37
1
0
23 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
18
17
0
22 Apr 2024
Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales
Minghe Gao
Shuang Chen
Liang Pang
Yuan Yao
Jisheng Dang
Wenqiao Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
Tat-Seng Chua
LRM
32
5
0
17 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
31
9
0
12 Apr 2024
OpenBias: Open-set Bias Detection in Text-to-Image Generative Models
Moreno D'Incà
E. Peruzzo
Massimiliano Mancini
Dejia Xu
Vidit Goel
Xingqian Xu
Zhangyang Wang
Humphrey Shi
N. Sebe
53
31
0
11 Apr 2024
Learning to Localize Objects Improves Spatial Reasoning in Visual-LLMs
Kanchana Ranasinghe
Satya Narayan Shukla
Omid Poursaeed
Michael S. Ryoo
Tsung-Yu Lin
LRM
38
21
0
11 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
34
20
0
09 Apr 2024
Self-Training Large Language Models for Improved Visual Program Synthesis With Visual Reinforcement
Zaid Khan
B. Vijaykumar
S. Schulter
Yun Fu
Manmohan Chandraker
LRM
ReLM
20
6
0
06 Apr 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen
Xiang Li
Xiaojun Ye
Chao Li
Zhaoxin Fan
Hao Zhao
VGen
3DV
197
4
0
05 Apr 2024
Visual Knowledge in the Big Model Era: Retrospect and Prospect
Wenguan Wang
Yi Yang
Yunhe Pan
VLM
31
16
0
05 Apr 2024
PREGO: online mistake detection in PRocedural EGOcentric videos
Alessandro Flaborea
Guido Maria D'Amely di Melendugno
Leonardo Plini
Luca Scofano
Edoardo De Matteis
Antonino Furnari
G. Farinella
Fabio Galasso
EgoV
48
11
0
02 Apr 2024
Evaluating Text-to-Visual Generation with Image-to-Text Generation
Zhiqiu Lin
Deepak Pathak
Baiqi Li
Jiayao Li
Xide Xia
Graham Neubig
Pengchuan Zhang
Deva Ramanan
EGVM
37
125
0
01 Apr 2024
Chat Modeling: Natural Language-based Procedural Modeling of Biological Structures without Training
Donggang Jia
Yunhai Wang
Ivan Viola
29
1
0
01 Apr 2024
LLMs are Good Sign Language Translators
Jia Gong
Lin Geng Foo
Yixuan He
Hossein Rahmani
Jun Liu
SLR
68
24
0
01 Apr 2024
Planning and Editing What You Retrieve for Enhanced Tool Learning
Tenghao Huang
Dongwon Jung
Muhao Chen
KELM
16
7
0
30 Mar 2024
Change-Agent: Towards Interactive Comprehensive Remote Sensing Change Interpretation and Analysis
Chenyang Liu
Keyan Chen
Haotian Zhang
Zipeng Qi
Zhengxia Zou
Z. Shi
33
27
0
28 Mar 2024
Residual-based Language Models are Free Boosters for Biomedical Imaging
Zhixin Lai
Jing Wu
Suiyao Chen
Yucheng Zhou
N. Hovakimyan
MedIm
25
26
0
26 Mar 2024
PropTest: Automatic Property Testing for Improved Visual Programming
Jaywon Koo
Ziyan Yang
Paola Cascante-Bonilla
Baishakhi Ray
Vicente Ordonez
LRM
24
2
0
25 Mar 2024
Synthesize Step-by-Step: Tools, Templates and LLMs as Data Generators for Reasoning-Based Chart VQA
Zhuowan Li
Bhavan A. Jasani
Peng Tang
Shabnam Ghadar
LRM
22
8
0
25 Mar 2024
VURF: A General-purpose Reasoning and Self-refinement Framework for Video Understanding
Ahmad A Mahmood
Ashmal Vayani
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
LRM
49
7
0
21 Mar 2024
HYDRA: A Hyper Agent for Dynamic Compositional Visual Reasoning
Fucai Ke
Zhixi Cai
Simindokht Jahangard
Weiqing Wang
P. D. Haghighi
Hamid Rezatofighi
LRM
38
8
0
19 Mar 2024
What Are Tools Anyway? A Survey from the Language Model Perspective
Zhiruo Wang
Zhoujun Cheng
Hao Zhu
Daniel Fried
Graham Neubig
60
26
0
18 Mar 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
46
55
0
18 Mar 2024
m&m's: A Benchmark to Evaluate Tool-Use for multi-step multi-modal Tasks
Zixian Ma
Weikai Huang
Jieyu Zhang
Tanmay Gupta
Ranjay Krishna
55
18
0
17 Mar 2024