Visual Programming: Compositional visual reasoning without training

Computer Vision and Pattern Recognition (CVPR), 2023
18 November 2022
Tanmay Gupta
Aniruddha Kembhavi
ReLM, VLM, LRM

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 381 papers shown
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
Siyi Chen
Mikaela Angelina Uy
Chan Hee Song
Faisal Ladhak
Adithyavairavan Murali
Qing Qu
Stan Birchfield
Valts Blukis
Jonathan Tremblay
OffRL, LRM
143
0
0
03 Dec 2025
DepthScape: Authoring 2.5D Designs via Depth Estimation, Semantic Understanding, and Geometry Extraction
Xia Su
Cuong Nguyen
Matheus A. Gadelha
Jon E. Froehlich
68
0
0
01 Dec 2025
PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models
Zeqing Wang
Keze Wang
Lei Zhang
VGen
127
0
0
01 Dec 2025
Video-CoM: Interactive Video Reasoning via Chain of Manipulations
H. Rasheed
Mohammed Zumri
Muhammad Maaz
Ming-Hsuan Yang
Fahad Shahbaz Khan
Salman Khan
AI4TS, LRM
164
0
0
28 Nov 2025
Prune4Web: DOM Tree Pruning Programming for Web Agent
J. Zhang
Kaiquan Chen
Zhihao Lu
Enshen Zhou
Qian Yu
Jing Zhang
359
0
0
26 Nov 2025
The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment
Ziheng Ouyang
Yiren Song
Y. Liu
Shihao Zhu
Qibin Hou
Ming-Ming Cheng
Mike Zheng Shou
128
0
0
25 Nov 2025
Synthesizing Visual Concepts as Vision-Language Programs
Antonia Wüst
Wolfgang Stammer
Hikaru Shindo
Lukas Helff
Devendra Singh Dhami
Kristian Kersting
LRM
99
0
0
24 Nov 2025
LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models
Shuai Wang
D. Zhang
Tianyi Bai
Shitong Shao
Jiebo Luo
Jiaheng Wei
VLM
142
1
0
24 Nov 2025
Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens
Yiming Qin
Bomin Wei
Jiaxin Ge
Konstantinos Kallidromitis
Stephanie Fu
Trevor Darrell
Xudong Wang
LRM, VLM
246
1
0
24 Nov 2025
DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition
Raja Kumar
Arka Sadhu
Ram Nevatia
VLM
189
0
0
23 Nov 2025
Learning with Preserving for Continual Multitask Learning
H. Wang
Siwoo Bae
Zirong Chen
Meiyi Ma
CLL
191
0
0
11 Nov 2025
Tracking and Understanding Object Transformations
Yihong Sun
Xinyu Yang
Jennifer J. Sun
Bharath Hariharan
172
0
0
06 Nov 2025
Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models
Tianfan Peng
Yuntao Du
Pengzhou Ji
Shijie Dong
Kailin Jiang
...
Jinhe Bi
Qian Li
Wei Du
Feng Xiao
Lizhen Cui
VLM
273
0
0
04 Nov 2025
LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation
Gyeom Hwangbo
Hyungjoo Chae
Minseok Kang
Hyeonjong Ju
Soohyun Oh
Jinyoung Yeo
ELM
139
0
0
04 Nov 2025
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought
Yiyang Zhou
Haoqin Tu
Z. Wang
Zeyu Wang
Niklas Muennighoff
...
Shen Yan
Haoqi Fan
Cihang Xie
Huaxiu Yao
Qinghao Ye
LRM
256
2
0
04 Nov 2025
$\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles
Trishanu Das
Abhilash Nandy
Khush Bajaj
D. Sen
LRM
142
0
0
03 Nov 2025
Test-time Scaling of LLMs: A Survey from A Subproblem Structure Perspective
Zhuoyi Yang
Xu Guo
Tong Zhang
Huijuan Xu
Boyang Albert Li
LRM
153
0
0
01 Nov 2025
TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents
Annual Meeting of the Association for Computational Linguistics (ACL), 2025
Yizhu Jiao
S. Li
Sizhe Zhou
Heng Ji
Jiawei Han
137
9
0
28 Oct 2025
Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning
Shijian Wang
Jiarui Jin
Xingjian Wang
L. Song
Runhao Fu
H. Wang
Zongyuan Ge
Yuan Lu
Xuelian Cheng
ReLM, LRM
132
5
0
27 Oct 2025
MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection
Anisha Saha
Varsha Suresh
Timothy Hospedales
Vera Demberg
LRM
81
0
0
27 Oct 2025
PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments
Weijie Zhou
Xuantang Xiong
Yi Peng
Manli Tao
Chaoyang Zhao
Honghui Dong
Ming Tang
Jinqiao Wang
LRM
141
1
0
24 Oct 2025
See, Think, Act: Online Shopper Behavior Simulation with VLM Agents
Yimeng Zhang
Jiri Gesi
Ran Xue
Tian Wang
Ziyi Wang
...
Qingjun Cui
Yufan Guo
Jing Huang
Mubarak Shah
Dakuo Wang
OffRL
165
0
0
22 Oct 2025
Pursuing Minimal Sufficiency in Spatial Reasoning
Yejie Guo
Yunzhong Hou
Wufei Ma
Meng Tang
Ming-Hsuan Yang
LRM
98
0
0
19 Oct 2025
AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory
Jitesh Jain
Shubham Maheshwari
Ning Yu
Wen-mei W. Hwu
Humphrey Shi
RALM
145
1
0
17 Oct 2025
RECODE: Reasoning Through Code Generation for Visual Question Answering
Junhong Shen
Mu Cai
Bo Hu
Ameet Talwalkar
David A. Ross
Cordelia Schmid
Alireza Fathi
ReLM, LRM
173
0
0
15 Oct 2025
CapGeo: A Caption-Assisted Approach to Geometric Reasoning
Y. Li
Siyi Qian
Hao Liang
Leqi Zheng
Ruichuan An
Yongzhen Guo
Wentao Zhang
ReLM, LRM
116
0
0
10 Oct 2025
MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning
Tajamul Ashraf
Umair Nawaz
Abdelrahman M. Shaker
Rao Muhammad Anwer
Philip Torr
Fahad Shahbaz Khan
Salman Khan
211
0
0
09 Oct 2025
RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models
Moon Ye-Bin
Roy Miles
Tae-Hyun Oh
Ismail Elezi
Jiankang Deng
OffRL, VLM
128
0
0
09 Oct 2025
RoboPilot: Generalizable Dynamic Robotic Manipulation with Dual-thinking Modes
Xinyi Liu
M. Sani
Zewei Zhou
Julius Wirbel
Bahram Zarrin
Roberto Galeazzi
LRM
112
0
0
30 Sep 2025
From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models
Chenyue Zhou
Mingxuan Wang
Yanbiao Ma
Chenxu Wu
Wanyi Chen
...
Guoli Jia
Lingling Li
Z. Lu
Y. Lu
Wenhan Luo
LRM
447
9
0
29 Sep 2025
IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?
Yang Chen
Minghao Liu
Yufan Shen
Y. Li
Tianyuan Huang
...
Zhi Yu
Yongliang Shen
Yu Qiao
Ding Wang
VGen, VLM
257
0
0
29 Sep 2025
Confidence-guided Refinement Reasoning for Zero-shot Question Answering
Youwon Jang
Woo Suk Choi
Minjoon Jung
Minsu Lee
Byoung-Tak Zhang
ReLM, LRM
98
0
0
25 Sep 2025
Adaptive Fast-and-Slow Visual Program Reasoning for Long-Form VideoQA
Chenglin Li
Feng Han
FengTao
Ruilin Li
Qianglong Chen
Jingqi Tong
Yin Zhang
Jiaqi Wang
LRM
178
0
0
22 Sep 2025
From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning
Hang Du
Jiayang Zhang
Guoshun Nan
Wendi Deng
Zhenyan Chen
...
Wang Xiao
Shan Huang
Yuqi Pan
Tao Qi
Sicong Leng
VLM
209
0
0
21 Sep 2025
Visual Programmability: A Guide for Code-as-Thought in Chart Understanding
Bohao Tang
Yan Ma
Fei Zhang
Jiadi Su
Ethan Chern
Zhulin Hu
Zhixin Wang
Pengfei Liu
Ya Zhang
LRM
133
0
0
11 Sep 2025
From Image Generation to Infrastructure Design: a Multi-agent Pipeline for Street Design Generation
Chenguang Wang
Xiang Yan
Yilong Dai
Ziyi Wang
Susu Xu
AI4CE
132
2
0
05 Sep 2025
Reinforced Visual Perception with Tools
Zetong Zhou
Dongping Chen
Zixian Ma
Zhihan Hu
Mingyang Fu
Sinan Wang
Yao Wan
Zhou Zhao
Ranjay Krishna
OffRL, VLM, LRM
155
11
0
01 Sep 2025
Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models
Rui Zhang
Z. Wang
Tianli Yang
Hongwei Li
Wenbo Jiang
Qingchuan Zhao
Wenshu Fan
Guowen Xu
AAML, VLM
83
1
0
26 Aug 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM, CoGe, LRM
355
8
0
24 Aug 2025
Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent
Yixin Gao
Xin Li
Xiaohan Pan
Runsen Feng
Bingchen Li
Y. Qi
Y. Lu
Zhengxue Cheng
Zhibo Chen
Jörn Ostermann
134
0
0
21 Aug 2025
Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models
International Joint Conference on Artificial Intelligence (IJCAI), 2024
Xiao-Wen Yang
Jie-Jing Shao
Lan-Zhe Guo
Bo-Wen Zhang
Zhi Zhou
Lin-Han Jia
Wang-Zhou Dai
Yu-Feng Li
LRM
183
4
0
19 Aug 2025
Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies
Ayushman Sarkar
Mohd Yamani Idna Idris
Zhenyu Yu
LRM
160
12
0
14 Aug 2025
Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving
Tianyun Yang
Yunwen Li
Ziniu Li
Zhihang Lin
Ruoyu Sun
Tian Ding
ReLM, LRM
122
1
0
12 Aug 2025
Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision
Luozheng Qin
Jia Gong
Yuqing Sun
Tianjiao Li
Mengping Yang
Xiaomeng Yang
Chao Qu
Zhiyu Tan
Hao Li
MLLM, LRM
220
0
0
07 Aug 2025
ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"
Zhongyi Zhou
Kohei Uehara
Haoyu Zhang
Jingtao Zhou
Lin Gu
Ruofei Du
Zheng Xu
Tatsuya Harada
AI4Ed
213
1
0
06 Aug 2025
SoilNet: A Multimodal Multitask Model for Hierarchical Classification of Soil Horizons
Teodor Chiaburu
Vipin Singh
Frank Haußer
Felix Bießmann
89
1
0
05 Aug 2025
Zero-shot Compositional Action Recognition with Neural Logic Constraints
Gefan Ye
Lin Li
Kexin Li
Jun Xiao
Long Chen
200
3
0
04 Aug 2025
Multimodal Video Emotion Recognition with Reliable Reasoning Priors
Zhepeng Wang
Yingjian Zhu
Guanghao Dong
Hongzhu Yi
F. Chen
Xinming Wang
Jun Xie
92
0
0
29 Jul 2025
MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning
Xiaoyuan Li
Moxin Li
Wenjie Wang
Rui Men
Yichang Zhang
Fuli Feng
Dayiheng Liu
LRM
187
2
0
24 Jul 2025
Augmented Vision-Language Models: A Systematic Review
Anthony C Davis
Burhan Sadiq
Tianmin Shu
Chien-Ming Huang
VLM, LRM
196
0
0
24 Jul 2025