Visual Programming: Compositional visual reasoning without training

Computer Vision and Pattern Recognition (CVPR), 2022
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
    ReLM VLM LRM

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 380 papers shown
GraphiMind: LLM-centric Interface for Information Graphics Design
Qirui Huang
Min Lu
J. Lanir
Dani Lischinski
Daniel Cohen-Or
Hui Huang
MLLM
172
12
0
24 Jan 2024
TroVE: Inducing Verifiable and Efficient Toolboxes for Solving Programmatic Tasks
International Conference on Machine Learning (ICML), 2024
Zhiruo Wang
Daniel Fried
Graham Neubig
211
41
0
23 Jan 2024
Zero Shot Open-ended Video Inference
Ee Yeo Keat
Zhang Hao
Alexander Matyasko
Basura Fernando
VLM
120
0
0
23 Jan 2024
CCA: Collaborative Competitive Agents for Image Editing
Tiankai Hang
Shuyang Gu
Dong Chen
Xin Geng
Baining Guo
334
8
0
23 Jan 2024
Mastering Text-to-Image Diffusion: Recaptioning, Planning, and Generating with Multimodal LLMs
International Conference on Machine Learning (ICML), 2024
Ling Yang
Zhaochen Yu
Chenlin Meng
Minkai Xu
Stefano Ermon
Tengjiao Wang
CoGe DiffM
481
188
0
22 Jan 2024
Prompting Large Vision-Language Models for Compositional Reasoning
Timothy Ossowski
Ming Jiang
Junjie Hu
CoGe VLM LRM
202
7
0
20 Jan 2024
PhotoScout: Synthesis-Powered Multi-Modal Image Search
Celeste Barnaby
Qiaochu Chen
Chenglong Wang
Işıl Dillig
187
6
0
19 Jan 2024
LangProp: A code optimization framework using Large Language Models applied to driving
Shu Ishida
Gianluca Corrado
George Fedoseev
Hudson Yeo
Lloyd Russell
Jamie Shotton
João F. Henriques
Anthony Hu
236
18
0
18 Jan 2024
DiffusionGPT: LLM-Driven Text-to-Image Generation System
Jie Qin
Jie Wu
Weifeng Chen
Yuxi Ren
Huixian Li
Hefeng Wu
Xuefeng Xiao
Rui Wang
S. Wen
DiffM
172
49
0
18 Jan 2024
Advancing Large Multi-modal Models with Explicit Chain-of-Reasoning and Visual Question Generation
Kohei Uehara
Nabarun Goswami
Hanqin Wang
Toshiaki Baba
Kohtaro Tanaka
...
Takagi Naoya
Ryo Umagami
Yingyi Wen
Tanachai Anakewat
Tatsuya Harada
LRM
224
3
0
18 Jan 2024
Image Translation as Diffusion Visual Programmers
Cheng Han
James Liang
Qifan Wang
Majid Rabbani
S. Dianat
Raghuveer M. Rao
Ying Nian Wu
Dongfang Liu
204
19
0
18 Jan 2024
Vlogger: Make Your Dream A Vlog
Computer Vision and Pattern Recognition (CVPR), 2024
Shaobin Zhuang
Kunchang Li
Xinyuan Chen
Yaohui Wang
Ziwei Liu
Yu Qiao
Yali Wang
VGen DiffM
135
62
0
17 Jan 2024
DoraemonGPT: Toward Understanding Dynamic Scenes with Large Language Models (Exemplified as A Video Agent)
International Conference on Machine Learning (ICML), 2024
Zongxin Yang
Guikun Chen
Xiaodi Li
Wenguan Wang
Yi Yang
LM&Ro LLMAG
421
59
0
16 Jan 2024
Exploring the Reasoning Abilities of Multimodal Large Language Models (MLLMs): A Comprehensive Survey on Emerging Trends in Multimodal Reasoning
Yiqi Wang
Wentao Chen
Xiaotian Han
Xudong Lin
Haiteng Zhao
Yongfei Liu
Bohan Zhai
Jianbo Yuan
Quanzeng You
Hongxia Yang
LRM
265
144
0
10 Jan 2024
STAIR: Spatial-Temporal Reasoning with Auditable Intermediate Results for Video Question Answering
AAAI Conference on Artificial Intelligence (AAAI), 2024
Yueqian Wang
Yuxuan Wang
Kai Chen
Dongyan Zhao
192
2
0
08 Jan 2024
Towards Truly Zero-shot Compositional Visual Reasoning with LLMs as Programmers
Aleksandar Stanić
Sergi Caelles
Michael Tschannen
LRM VLM
274
12
0
03 Jan 2024
If LLM Is the Wizard, Then Code Is the Wand: A Survey on How Code Empowers Large Language Models to Serve as Intelligent Agents
Ke Yang
Jiateng Liu
John Wu
Chaoqi Yang
Yi R. Fung
...
Xu Cao
Xingyao Wang
Yiquan Wang
Chenhui Xu
Chengxiang Zhai
LLMAG ELM
420
111
0
01 Jan 2024
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM MLLM
259
264
0
28 Dec 2023
Enhancing Open-Domain Task-Solving Capability of LLMs via Autonomous Tool Integration from GitHub
Bohan Lyu
Xin Cong
Heyang Yu
Pan Yang
Yujia Qin
...
Zhong Zhang
Shi Yu
Y. Lin
Zhiyuan Liu
Maosong Sun
LLMAG
234
7
0
28 Dec 2023
A Survey on Open-Set Image Recognition
Jiayin Sun
Qiulei Dong
BDL ObjD
211
10
0
25 Dec 2023
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
363
309
0
21 Dec 2023
Designing LLM Chains by Adapting Techniques from Crowdsourcing Workflows
Madeleine Grunde-McLaughlin
Michelle S. Lam
Ranjay Krishna
Daniel S. Weld
Jeffrey Heer
AI4CE
280
25
0
18 Dec 2023
CLOVA: A Closed-Loop Visual Assistant with Tool Usage and Update
Zhi Gao
Yuntao Du
Xintong Zhang
Xiaojian Ma
Wenjuan Han
Song-Chun Zhu
Qing Li
LLMAG VLM
323
45
0
18 Dec 2023
InstructPipe: Generating Visual Blocks Pipelines with Human Instructions and LLMs
International Conference on Human Factors in Computing Systems (CHI), 2023
Zhongyi Zhou
Jing Jin
Vrushank Phadnis
Xiuxiu Yuan
Jun Jiang
...
A. Olwal
David Kim
Ram Iyengar
Na Li
Andrea Colaço
213
6
0
15 Dec 2023
Can LLM find the green circle? Investigation and Human-guided tool manipulation for compositional generalization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023
Min Zhang
Jianfeng He
Shuo Lei
Murong Yue
Linhan Wang
Chang-Tien Lu
186
6
0
12 Dec 2023
Vista-LLaMA: Reducing Hallucination in Video Language Models via Equal Distance to Visual Tokens
Fan Ma
Xiaojie Jin
Heng Wang
Yuchen Xian
Jiashi Feng
Yi Yang
204
70
0
12 Dec 2023
Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models
Computer Vision and Pattern Recognition (CVPR), 2023
Yushi Hu
Otilia Stretcu
Chun-Ta Lu
Krishnamurthy Viswanathan
Kenji Hata
Enming Luo
Ranjay Krishna
Ariel Fuxman
VLM LRM MLLM
293
72
0
05 Dec 2023
Recursive Visual Programming
European Conference on Computer Vision (ECCV), 2023
Jiaxin Ge
Sanjay Subramanian
Baifeng Shi
Roei Herzig
Trevor Darrell
178
9
0
04 Dec 2023
Zero-Shot Video Question Answering with Procedural Programs
Rohan Choudhury
Koichiro Niinuma
Kishore Venkateshan
László A. Jeni
159
37
0
01 Dec 2023
VIoTGPT: Learning to Schedule Vision Tools towards Intelligent Video Internet of Things
Yaoyao Zhong
Mengshi Qi
Rui Wang
Yuhan Qiu
Yang Zhang
Huadong Ma
236
2
0
01 Dec 2023
GELDA: A generative language annotation framework to reveal visual biases in datasets
Krish Kabra
Kathleen M. Lewis
Guha Balakrishnan
VLM
148
1
0
29 Nov 2023
Leveraging VLM-Based Pipelines to Annotate 3D Objects
International Conference on Machine Learning (ICML), 2023
Rishabh Kabra
Loic Matthey
Alexander Lerchner
Niloy J. Mitra
258
9
0
29 Nov 2023
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Computer Vision and Pattern Recognition (CVPR), 2023
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
386
19
0
28 Nov 2023
Compositional Chain-of-Thought Prompting for Large Multimodal Models
Computer Vision and Pattern Recognition (CVPR), 2023
Chancharik Mitra
Brandon Huang
Trevor Darrell
Roei Herzig
MLLM LRM
308
162
0
27 Nov 2023
Visual Programming for Zero-shot Open-Vocabulary 3D Visual Grounding
Computer Vision and Pattern Recognition (CVPR), 2023
Zhihao Yuan
Jinke Ren
Chun-Mei Feng
Hengshuang Zhao
Shuguang Cui
Zhen Li
298
65
0
26 Nov 2023
Large Language Models as Automated Aligners for benchmarking Vision-Language Models
Yuanfeng Ji
Chongjian Ge
Weikai Kong
Enze Xie
Zhengying Liu
Zhengguo Li
Ping Luo
MLLM ELM
172
10
0
24 Nov 2023
Vamos: Versatile Action Models for Video Understanding
European Conference on Computer Vision (ECCV), 2023
Shijie Wang
Qi Zhao
Minh Quan Do
Nakul Agarwal
Kwonjoon Lee
Chen Sun
334
35
0
22 Nov 2023
ViStruct: Visual Structural Knowledge Extraction via Curriculum Guided Code-Vision Representation
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Yangyi Chen
Xingyao Wang
Pengfei Yu
Derek Hoiem
Heng Ji
220
14
0
22 Nov 2023
Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer
Computer Vision and Pattern Recognition (CVPR), 2023
Zhen Zhao
Jingqun Tang
Chunhui Lin
Binghong Wu
Can Huang
Hao Liu
Xin Tan
Zhizhong Zhang
Yuan Xie
372
39
0
22 Nov 2023
De-fine: Decomposing and Refining Visual Programs with Auto-Feedback
ACM Multimedia (ACM MM), 2023
Minghe Gao
Juncheng Li
Hao Fei
Liang Pang
Wei Ji
Guoming Wang
Wenqiao Zhang
Siliang Tang
Yueting Zhuang
154
12
0
21 Nov 2023
A Survey on Multimodal Large Language Models for Autonomous Driving
Can Cui
Yunsheng Ma
Xu Cao
Wenqian Ye
Yang Zhou
...
Xinrui Yan
Shuqi Mei
Jianguo Cao
Ziran Wang
Chao Zheng
308
423
0
21 Nov 2023
TPTU-v2: Boosting Task Planning and Tool Usage of Large Language Model-based Agents in Real-world Systems
Yilun Kong
Jingqing Ruan
Yihong Chen
Bin Zhang
Tianpeng Bao
...
Xiaoru Hu
Hangyu Mao
Ziyue Li
Xingyu Zeng
Rui Zhao
LLMAG
272
50
0
19 Nov 2023
SelfEval: Leveraging the discriminative nature of generative models for evaluation
Sai Saketh Rambhatla
Ishan Misra
EGVM
169
6
0
17 Nov 2023
GPT-4V in Wonderland: Large Multimodal Models for Zero-Shot Smartphone GUI Navigation
An Yan
Zhengyuan Yang
Wanrong Zhu
Kevin Qinghong Lin
Linjie Li
...
Yiwu Zhong
Julian McAuley
Jianfeng Gao
Zicheng Liu
Lijuan Wang
LLMAG LM&Ro
325
140
0
13 Nov 2023
Analyzing Modular Approaches for Visual Question Decomposition
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
Apoorv Khandelwal
Ellie Pavlick
Chen Sun
238
5
0
10 Nov 2023
How to Bridge the Gap between Modalities: Survey on Multimodal Large Language Model
Shezheng Song
Xiaopeng Li
Shasha Li
Shan Zhao
Jie Yu
Jun Ma
Xiaoguang Mao
Weimin Zhang
229
0
0
10 Nov 2023
Follow-Up Differential Descriptions: Language Models Resolve Ambiguities for Image Classification
International Conference on Learning Representations (ICLR), 2023
Reza Esfandiarpoor
Stephen H. Bach
VLM
294
18
0
10 Nov 2023
LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents
Shilong Liu
Hao Cheng
Haotian Liu
Hao Zhang
Feng Li
...
Hang Su
Jun Zhu
Lei Zhang
Jianfeng Gao
Chunyuan Li
MLLM VLM
231
189
0
09 Nov 2023
GENOME: GenerativE Neuro-symbOlic visual reasoning by growing and reusing ModulEs
Zhenfang Chen
Rui Sun
Wenjun Liu
Yining Hong
Chuang Gan
LRM
282
22
0
08 Nov 2023
A Graph-to-Text Approach to Knowledge-Grounded Response Generation in Human-Robot Interaction
Nicholas Walker
Stefan Ultes
Pierre Lison
LM&Ro
472
1
0
03 Nov 2023