Visual Programming: Compositional visual reasoning without training


Computer Vision and Pattern Recognition (CVPR), 2022
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM, VLM, LRM

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 381 papers shown
Title
PyVision: Agentic Vision with Dynamic Tooling
Shitian Zhao
H. Zhang
Shaoheng Lin
Ming Li
Qilong Wu
Kaipeng Zhang
Chen Wei
LRM
261
19
0
10 Jul 2025
GraspMAS: Zero-Shot Language-driven Grasp Detection with Multi-Agent System
Quang H. Nguyen
T. H. Le
Huy Le Nguyen
T. Vo
Tung D. Ta
Baoru Huang
Minh Nhat Vu
Anh-Tien Nguyen
215
0
0
23 Jun 2025
MUCAR: Benchmarking Multilingual Cross-Modal Ambiguity Resolution for Multimodal Large Language Models
Xiaolong Wang
Zhaolu Kang
Wangyuxuan Zhai
Xinyue Lou
Yunghwei Lai
...
Yawen Wang
Kaiyu Huang
Yile Wang
Peng Li
Wenshu Fan
174
0
0
20 Jun 2025
Reinforcing VLMs to Use Tools for Detailed Visual Reasoning Under Resource Constraints
Sunil Kumar
Bowen Zhao
Leo Parker Dirac
Paulina Varshavskaya
LRM
294
1
0
10 Jun 2025
A Neurosymbolic Agent System for Compositional Visual Reasoning
Yichang Xu
Gaowen Liu
Ramana Rao Kompella
Sihao Hu
Tiansheng Huang
Fatih Ilhan
Selim Furkan Tekin
Zachary Yahn
LRM, VLM
217
0
0
09 Jun 2025
HAIBU-ReMUD: Reasoning Multimodal Ultrasound Dataset and Model Bridging to General Specific Domains
Shijie Wang
Yilun Zhang
Zeyu Lai
Dexing Kong
213
0
0
09 Jun 2025
Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification
Tianyi Bai
Zengjie Hu
Fupeng Sun
Jiantao Qiu
Yizhen Jiang
Guangxin He
Bohan Zeng
Conghui He
Binhang Yuan
Wentao Zhang
OffRL, LRM
179
11
0
08 Jun 2025
Vision-EKIPL: External Knowledge-Infused Policy Learning for Visual Reasoning
Chaoyang Wang
Zeyu Zhang
Meng Meng
Xu Zhou
Haiyun Jiang
OffRL, LRM
210
1
0
07 Jun 2025
Visual Graph Arena: Evaluating Visual Conceptualization of Vision and Multimodal Large Language Models
Z. Babaiee
Peyman M. Kiasari
Daniela Rus
Radu Grosu
150
1
0
06 Jun 2025
Gen-n-Val: Agentic Image Data Generation and Validation
Jing-En Huang
I-Sheng Fang
Tzuhsuan Huang
Chih-Yu Wang
Jun-Cheng Chen
VLM
306
0
0
05 Jun 2025
Follow the Flow: Fine-grained Flowchart Attribution with Neurosymbolic Agents
Manan Suri
Puneet Mathur
Nedim Lipka
Franck Dernoncourt
Ryan Rossi
Vivek Gupta
Dinesh Manocha
155
1
0
02 Jun 2025
PromptVFX: Text-Driven Fields for Open-World 3D Gaussian Animation
Mert Kiray
Paul Uhlenbruck
Nassir Navab
Benjamin Busam
VGen, 3DGS, AI4CE
190
2
0
01 Jun 2025
Thinking with Generated Images
Ethan Chern
Zhulin Hu
Steffi Chern
Siqi Kou
Jiadi Su
Yan Ma
Zhijie Deng
Pengfei Liu
LRM
237
29
0
28 May 2025
Efficiently Enhancing General Agents With Hierarchical-categorical Memory
Changze Qiao
Mingming Lu
LLMAG
206
0
0
28 May 2025
Understand, Think, and Answer: Advancing Visual Reasoning with Large Multimodal Models
Yufei Zhan
Hongyin Zhao
Yousong Zhu
Shurong Zheng
Fan Yang
Ming Tang
Jinqiao Wang
VLM, LRM
255
1
0
27 May 2025
RefAV: Towards Planning-Centric Scenario Mining
Cainan Davidson
Deva Ramanan
Neehar Peri
363
6
0
27 May 2025
VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection
Zeyi Huang
Anirudh Sundara Rajan
Zefan Cai
Wen Xiao
Junjie Hu
Yong Jae Lee
223
15
0
26 May 2025
VTool-R1: VLMs Learn to Think with Images via Reinforcement Learning on Multimodal Tool Use
Mingyuan Wu
Jingcheng Yang
Jize Jiang
Meitang Li
Kaizhuo Yan
Hanchao Yu
Minjia Zhang
Chengxiang Zhai
Klara Nahrstedt
LRM
461
21
0
25 May 2025
v1: Learning to Point Visual Tokens for Multimodal Grounded Reasoning
Jiwan Chung
Junhyeok Kim
Siyeol Kim
Jaeyoung Lee
Min Soo Kim
Youngjae Yu
LRM
297
3
0
24 May 2025
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
Xintong Zhang
Zhi Gao
Bofei Zhang
Pengxiang Li
Xiaowen Zhang
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LRM
330
39
0
21 May 2025
Understanding Complexity in VideoQA via Visual Program Generation
Cristobal Eyzaguirre
Igor Vasiljevic
Achal Dave
Jiajun Wu
Rares Andrei Ambrus
Thomas Kollar
Juan Carlos Niebles
P. Tokmakov
238
0
0
19 May 2025
Neuro-Symbolic Query Compiler
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yuyao Zhang
Zhicheng Dou
Xiaoxi Li
Jiajie Jin
Yongkang Wu
Zhonghua Li
Qi Ye
Ji-Rong Wen
NAI
283
1
0
17 May 2025
GuardReasoner-VL: Safeguarding VLMs via Reinforced Reasoning
Wenshu Fan
Shengfang Zhai
Mingzhe Du
Yulin Chen
Tri Cao
...
Xuzhao Li
Kun Wang
Junfeng Fang
Jiaheng Zhang
Bryan Hooi
OffRL, LRM
247
17
0
16 May 2025
TUMS: Enhancing Tool-use Abilities of LLMs with Multi-structure Handlers
Aiyao He
Sijia Cui
Shuai Xu
Yanna Wang
Bo Xu
305
0
0
13 May 2025
Visually Interpretable Subtask Reasoning for Visual Question Answering
Yu Cheng
A. Goel
Hakan Bilen
LRM
231
2
0
12 May 2025
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li
Zhi Gao
Bofei Zhang
Yapeng Mi
Xiaojian Ma
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LLMAG
559
0
0
30 Apr 2025
Perspective-Aware Reasoning in Vision-Language Models via Mental Imagery Simulation
Phillip Y. Lee
Jihyeon Je
Chanho Park
Mikaela Angelina Uy
Leonidas Guibas
Minhyuk Sung
LRM
299
14
0
24 Apr 2025
Symbolic Representation for Any-to-Any Generative Tasks
Computer Vision and Pattern Recognition (CVPR), 2025
Jianfei Chen
Xiaoye Zhu
Yanjie Wang
Tianyang Liu
Xinhui Chen
...
Yifei Ke
Qingbin Liu
Yiwen Yuan
Julian McAuley
Li Li
DiffM
218
0
0
24 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-Xiong Wang
VLM
249
6
0
22 Apr 2025
DyFo: A Training-Free Dynamic Focus Visual Search for Enhancing LMMs in Fine-Grained Visual Understanding
Computer Vision and Pattern Recognition (CVPR), 2025
Geng Li
Jinglin Xu
Yunzhen Zhao
Yuxin Peng
ObjD
243
26
0
21 Apr 2025
Manipulating Multimodal Agents via Cross-Modal Prompt Injection
Le Wang
Zonghao Ying
Tianyuan Zhang
Yaning Tan
Shengshan Hu
Mingchuan Zhang
A. Liu
Xianglong Liu
AAML
747
19
0
19 Apr 2025
Exploring Multimodal Prompt for Visualization Authoring with Large Language Models
Zhen Wen
Luoxuan Weng
Yinghao Tang
Runjin Zhang
Yunxing Liu
Bo Pan
Minfeng Zhu
Wei Chen
LRM
300
5
0
18 Apr 2025
Reimagining Urban Science: Scaling Causal Inference with Large Language Models
Yutong Xia
Ao Qu
Yunhan Zheng
Yihong Tang
Dingyi Zhuang
...
Cathy Wu
Roger Zimmermann
Lijun Sun
Jinhua Zhao
AI4CE
930
2
0
15 Apr 2025
Socratic Chart: Cooperating Multiple Agents for Robust SVG Chart Understanding
Zeyi Huang
Haohan Wang
LRM
141
2
0
14 Apr 2025
Resource-efficient Inference with Foundation Model Programs
Lunyiu Nie
Zhimin Ding
Kevin Yu
Marco Cheung
C. Jermaine
S. Chaudhuri
245
1
0
09 Apr 2025
Benchmarking Multimodal CoT Reward Model Stepwise by Visual Program
Minghe Gao
Xuqi Liu
Zhongqi Yue
Y. Wu
Shuang Chen
Juncheng Billy Li
Siliang Tang
Leilei Gan
Tat-Seng Chua
Yueting Zhuang
OffRL, LRM
248
9
0
09 Apr 2025
Human-like compositional learning of visually-grounded concepts using synthetic environments
Zijun Lin
M Ganesh Kumar
Cheston Tan
OCL, CoGe
372
0
0
09 Apr 2025
DeepDubber-V1: Towards High Quality and Dialogue, Narration, Monologue Adaptive Movie Dubbing Via Multi-Modal Chain-of-Thoughts Reasoning Guidance
Junjie Zheng
Zihao Chen
Chaofan Ding
Xinhan Di
VGen
283
2
0
31 Mar 2025
When 'YES' Meets 'BUT': Can Large Models Comprehend Contradictory Humor Through Comparative Reasoning?
Tuo Liang
Zhe Hu
Jing Li
Hao Zhang
Yiren Lu
...
Yiran Qiao
Disheng Liu
Jeirui Peng
Jing Ma
Yu Yin
321
1
0
29 Mar 2025
Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning of Vision Language Models
Huajie Tan
Yuheng Ji
Xiaoshuai Hao
Minglan Lin
Pengwei Wang
Zhongyuan Wang
Shanghang Zhang
ReLM, OffRL, LRM
517
0
0
26 Mar 2025
DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning
Fucai Ke
Vijay Kumar B G
Xingjian Leng
Zhixi Cai
Zaid Khan
Weiqing Wang
P. D. Haghighi
H. Rezatofighi
Manmohan Chandraker
487
5
0
25 Mar 2025
Exploring Hallucination of Large Multimodal Models in Video Understanding: Benchmark, Analysis and Mitigation
Hongcheng Gao
Jiashu Qu
Jingyi Tang
Baolong Bi
Yi Liu
Hongyu Chen
Li Liang
Li Su
Qingming Huang
MLLM, VLM, LRM
411
13
0
25 Mar 2025
A Study on Neuro-Symbolic Artificial Intelligence: Healthcare Perspectives
Delower Hossain
Jake Y Chen
NAI
473
9
0
23 Mar 2025
ChatStitch: Visualizing Through Structures via Surround-View Unsupervised Deep Image Stitching with Collaborative LLM-Agents
Hao Liang
Zhipeng Dong
Kaixin Chen
M. Fu
Yufeng Yue
Yi Yang
Mengyin Fu
277
0
0
19 Mar 2025
Benchmarking Failures in Tool-Augmented Language Models
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Eduardo Treviño
Hugo Contant
James Ngai
Graham Neubig
Zora Z. Wang
194
4
0
18 Mar 2025
PLAY2PROMPT: Zero-shot Tool Instruction Optimization for LLM Agents via Tool Play
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Wei Fang
Yang Zhang
Kaizhi Qian
James R. Glass
Yada Zhu
LLMAG
243
0
0
18 Mar 2025
CoSTA*: Cost-Sensitive Toolpath Agent for Multi-turn Image Editing
Advait Gupta
NandaKiran Velaga
Dang Nguyen
Wanrong Zhu
DiffM
258
3
0
13 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
International Conference on Learning Representations (ICLR), 2025
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
344
4
0
12 Mar 2025
Alignment for Efficient Tool Calling of Large Language Models
Hongshen Xu
Zihan Wang
Zichen Zhu
Lei Pan
Xingyu Chen
Xiaoou Liu
Kai Yu
247
6
0
09 Mar 2025
ArcPro: Architectural Programs for Structured 3D Abstraction of Sparse Points
Computer Vision and Pattern Recognition (CVPR), 2025
Qingming Huang
Runze Zhang
Kangjun Liu
Minglun Gong
Hao Zhang
Hui Huang
3DPC, AI4CE
230
3
0
04 Mar 2025