Visual Programming: Compositional visual reasoning without training

Computer Vision and Pattern Recognition (CVPR), 2023
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM, VLM, LRM
arXiv: 2211.11559 (abs, PDF, HTML) · HuggingFace (1 upvote)

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 375 papers shown
ExoViP: Step-by-step Verification and Exploration with Exoskeleton Modules for Compositional Visual Reasoning
Yanjie Wang
Alan Yuille
Zhuowan Li
Zilong Zheng
LRM
284
6
0
05 Aug 2024
Pyramid Coder: Hierarchical Code Generator for Compositional Visual Question Answering
Ruoyue Shen
Nakamasa Inoue
Koichi Shinoda
145
2
0
30 Jul 2024
Take A Step Back: Rethinking the Two Stages in Visual Reasoning
European Conference on Computer Vision (ECCV), 2024
Mingyu Zhang
Jiting Cai
Mingyu Liu
Yue Xu
Cewu Lu
Yong-Lu Li
LRM
194
9
0
29 Jul 2024
AdaCoder: Adaptive Prompt Compression for Programmatic Visual Question Answering
ACM Multimedia (MM), 2024
Mahiro Ukai
Shuhei Kurita
Atsushi Hashimoto
Yoshitaka Ushiku
Nakamasa Inoue
164
3
0
28 Jul 2024
Multi-Modality Co-Learning for Efficient Skeleton-based Action Recognition
Jinfu Liu
Chong Chen
Mengyuan Liu
372
26
0
22 Jul 2024
MaxMI: A Maximal Mutual Information Criterion for Manipulation Concept Discovery
Pei Zhou
Yanchao Yang
226
2
0
21 Jul 2024
Designing Algorithms Empowered by Language Models: An Analytical Framework, Case Studies, and Insights
Yanxi Chen
Yaliang Li
Bolin Ding
Jingren Zhou
211
8
0
20 Jul 2024
Rethinking Video-Text Understanding: Retrieval from Counterfactually Augmented Data
Wufei Ma
Kai Li
Zhongshi Jiang
Moustafa Meshry
Qihao Liu
Huiyu Wang
Christian Hane
Yaoyao Liu
VGen
166
2
0
18 Jul 2024
By My Eyes: Grounding Multimodal Large Language Models with Sensor Data via Visual Prompting
Hyungjun Yoon
Biniyam Aschalew Tolera
Taesik Gong
Kimin Lee
Sung-Ju Lee
166
17
0
15 Jul 2024
Constructing Concept-based Models to Mitigate Spurious Correlations with Minimal Human Effort
Jeeyung Kim
Ze Wang
Qiang Qiu
188
5
0
12 Jul 2024
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective
Zhen Qin
Daoyuan Chen
Wenhao Zhang
Liuyi Yao
Yilun Huang
Bolin Ding
Yaliang Li
Shuiguang Deng
283
11
0
11 Jul 2024
InverseCoder: Unleashing the Power of Instruction-Tuned Code LLMs with Inverse-Instruct
Yutong Wu
Di Huang
Wenxuan Shi
Wei Wang
Lingzhe Gao
...
Qi Guo
Yewen Pu
Dawei Yin
Xing Hu
Yunji Chen
SyDa
171
4
0
08 Jul 2024
GenArtist: Multimodal LLM as an Agent for Unified Image Generation and Editing
Zhenyu Wang
Aoxue Li
Zhenguo Li
Xihui Liu
MLLM, DiffM
281
75
0
08 Jul 2024
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates
Zeyu Leo Liu
Shrey Pandit
Xi Ye
Eunsol Choi
Greg Durrett
KELM, ALM
321
12
0
08 Jul 2024
Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification
Pritish Sahu
Karan Sikka
Ajay Divakaran
MLLM, LRM
189
13
0
02 Jul 2024
Certainly Uncertain: A Benchmark and Metric for Multimodal Epistemic and Aleatoric Awareness
Khyathi Chandu
Linjie Li
Anas Awadalla
Ximing Lu
Jae Sung Park
Jack Hessel
Lijuan Wang
Yejin Choi
289
6
0
02 Jul 2024
From the Least to the Most: Building a Plug-and-Play Visual Reasoner via Data Synthesis
Chuanqi Cheng
Jian Guan
Wei Wu
Rui Yan
LRM
162
15
0
28 Jun 2024
Tools Fail: Detecting Silent Errors in Faulty Tools
Jimin Sun
So Yeon Min
Yingshan Chang
Yonatan Bisk
259
14
0
27 Jun 2024
CogExplore: Contextual Exploration with Language-Encoded Environment Representations
Harel Biggie
Patrick Cooper
Doncey Albin
Kristen Such
Christoffer Heckman
LM&Ro
150
0
0
24 Jun 2024
Evaluating Large Vision-and-Language Models on Children's Mathematical Olympiads
A. Cherian
Kuan-Chuan Peng
Suhas Lohit
Joanna Matthiesen
Kevin A. Smith
J. Tenenbaum
ELM, LRM
143
14
0
22 Jun 2024
Multimodal Task Vectors Enable Many-Shot Multimodal In-Context Learning
Brandon Huang
Chancharik Mitra
Assaf Arbelle
Leonid Karlinsky
Trevor Darrell
Roei Herzig
188
34
0
21 Jun 2024
Whiteboard-of-Thought: Thinking Step-by-Step Across Modalities
Sachit Menon
Richard Zemel
Carl Vondrick
LRM
183
8
0
20 Jun 2024
VDebugger: Harnessing Execution Feedback for Debugging Visual Programs
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Xueqing Wu
Zongyu Lin
Songyan Zhao
Te-Lin Wu
Pan Lu
Nanyun Peng
Kai-Wei Chang
LRM
238
3
0
19 Jun 2024
Automatic benchmarking of large multimodal models via iterative experiment programming
Alessandro Conti
Enrico Fini
Paolo Rota
Yiming Wang
Goran Frehse
Elisa Ricci
198
1
0
18 Jun 2024
CodeNav: Beyond tool-use to using real-world codebases with LLM agents
Tanmay Gupta
Luca Weihs
Aniruddha Kembhavi
LLMAG, ELM
155
4
0
18 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM, VLM
279
47
0
18 Jun 2024
Investigating Video Reasoning Capability of Large Language Models with Tropes in Movies
Hung-Ting Su
Chun-Tong Chao
Ya-Ching Hsu
Xudong Lin
Yulei Niu
Hung-Yi Lee
Winston H. Hsu
LRM
183
1
0
16 Jun 2024
What is the Visual Cognition Gap between Humans and Multimodal LLMs?
Xu Cao
Yifan Shen
Bolin Lai
Wenqian Ye
Yunsheng Ma
...
Jintai Chen
Meihuan Huang
Jianguo Cao
Aidong Zhang
James M. Rehg
283
19
0
14 Jun 2024
Visual Sketchpad: Sketching as a Visual Chain of Thought for Multimodal Language Models
Yushi Hu
Weijia Shi
Xingyu Fu
Dan Roth
Mari Ostendorf
Luke Zettlemoyer
Noah A. Smith
Ranjay Krishna
LRM
245
176
0
13 Jun 2024
Commonsense-T2I Challenge: Can Text-to-Image Generation Models Understand Commonsense?
Xingyu Fu
Muyu He
Yujie Lu
William Yang Wang
Dan Roth
EGVM, LRM
156
35
0
11 Jun 2024
CPLIP: Zero-Shot Learning for Histopathology with Comprehensive Vision-Language Alignment
Computer Vision and Pattern Recognition (CVPR), 2024
Sajid Javed
Arif Mahmood
I. I. Ganapathi
Fayaz Ali Dharejo
Naoufel Werghi
Mohammed Bennamoun
VLM, LM&MA
189
32
0
07 Jun 2024
LogiCode: an LLM-Driven Framework for Logical Anomaly Detection
IEEE Transactions on Automation Science and Engineering (T-ASE), 2024
Yiheng Zhang
Yunkang Cao
Xiaohao Xu
Nong Sang
183
30
0
07 Jun 2024
BadAgent: Inserting and Activating Backdoor Attacks in LLM Agents
Yifei Wang
Dizhan Xue
Shengjie Zhang
Shengsheng Qian
AAML, LLMAG
194
70
0
05 Jun 2024
Towards Rationality in Language and Multimodal Agents: A Survey
Bowen Jiang
Yangxinyu Xie
Xiaomeng Wang
Yuan Yuan
Camillo J. Taylor
Tanwi Mallick
Weijie J. Su
LLMAG
254
12
0
01 Jun 2024
ParSEL: Parameterized Shape Editing with Language
Aditya Ganeshan
Ryan Y. Huang
Xianghao Xu
R. K. Jones
Daniel E. Ritchie
KELM
195
8
0
30 May 2024
VQA Training Sets are Self-play Environments for Generating Few-shot Pools
Tautvydas Misiunas
Hassan Mansoor
Jasper Uijlings
Oriana Riva
Victor Carbune
LRM, VLM
122
1
0
30 May 2024
Programmable Motion Generation for Open-Set Motion Control Tasks
Hanchao Liu
Xiaohang Zhan
Shaoli Huang
Tai-Jiang Mu
Ying Shan
188
13
0
29 May 2024
UDKAG: Augmenting Large Vision-Language Models with Up-to-Date Knowledge
Neural Information Processing Systems (NeurIPS), 2024
Chuanhao Li
Zhen Li
Chenchen Jing
Shuo Liu
Wenqi Shao
Yuwei Wu
Ping Luo
Yu Qiao
Kaipeng Zhang
ELM
179
0
0
23 May 2024
MeteoRA: Multiple-tasks Embedded LoRA for Large Language Models
Jingwei Xu
Junyu Lai
Yunpeng Huang
MoE, MoMe
212
12
0
19 May 2024
Libra: Building Decoupled Vision System on Large Language Models
International Conference on Machine Learning (ICML), 2024
Yifan Xu
Xiaoshan Yang
Y. Song
Changsheng Xu
MLLM, VLM
162
10
0
16 May 2024
Large Language Models Synergize with Automated Machine Learning
Jinglue Xu
Jialong Li
Zhen Liu
Nagar Anthel Venkatesh Suryanarayanan
Guoyuan Zhou
Jia Guo
Hitoshi Iba
Kenji Tei
162
7
0
06 May 2024
Mozart's Touch: A Lightweight Multi-modal Music Generation Framework Based on Pre-Trained Large Models
Tianze Xu
Jiajun Li
Xuesong Chen
Xinrui Yao
Shuchang Liu
132
8
0
05 May 2024
Transcrib3D: 3D Referring Expression Resolution through Large Language Models
Jiading Fang
Xiangshan Tan
Shengjie Lin
Igor Vasiljevic
Vitor Campagnolo Guizilini
Hongyuan Mei
Rares Andrei Ambrus
Gregory Shakhnarovich
Matthew R. Walter
LM&Ro
155
7
0
30 Apr 2024
Position: Do Not Explain Vision Models Without Context
Paulina Tomaszewska
Przemysław Biecek
195
1
0
28 Apr 2024
MaPa: Text-driven Photorealistic Material Painting for 3D Shapes
Shangzhan Zhang
Sida Peng
Tao Xu
Yuanbo Yang
Tianrun Chen
Nan Xue
Yujun Shen
Hujun Bao
Ruizhen Hu
Xiaowei Zhou
DiffM
294
22
0
26 Apr 2024
Leveraging Large Language Models for Multimodal Search
Oriol Barbany
Michael Huang
Xinliang Zhu
Arnab Dhua
199
14
0
24 Apr 2024
Think-Program-reCtify: 3D Situated Reasoning with Large Language Models
Qingrong He
Kejun Lin
Shizhe Chen
Anwen Hu
Qin Jin
LRM
172
4
0
23 Apr 2024
A Multimodal Automated Interpretability Agent
Tamar Rott Shaham
Sarah Schwettmann
Franklin Wang
Achyuta Rajaram
Evan Hernandez
Jacob Andreas
Antonio Torralba
429
41
0
22 Apr 2024
Fact: Teaching MLLMs with Faithful, Concise and Transferable Rationales
Minghe Gao
Shuang Chen
Liang Pang
Xingtai Lv
Jisheng Dang
Wenqiao Zhang
Juncheng Li
Siliang Tang
Yueting Zhuang
Tat-Seng Chua
LRM
135
10
0
17 Apr 2024
Enhancing Visual Question Answering through Question-Driven Image Captions as Prompts
Övgü Özdemir
Erdem Akagündüz
240
18
0
12 Apr 2024