Visual Programming: Compositional visual reasoning without training

Computer Vision and Pattern Recognition (CVPR), 2023

18 November 2022

Aniruddha Kembhavi

Papers citing "Visual Programming: Compositional visual reasoning without training"

50 / 381 papers shown

SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

Mikaela Angelina Uy

Adithyavairavan Murali

Stan Birchfield

Jonathan Tremblay

156

0

0

03 Dec 2025

DepthScape: Authoring 2.5D Designs via Depth Estimation, Semantic Understanding, and Geometry Extraction

Matheus A. Gadelha

Jon E. Froehlich

76

0

0

01 Dec 2025

PhyDetEx: Detecting and Explaining the Physical Plausibility of T2V Models

148

0

0

01 Dec 2025

Video-CoM: Interactive Video Reasoning via Chain of Manipulations

Ming-Hsuan Yang

Fahad Shahbaz Khan

166

0

0

28 Nov 2025

Prune4Web: DOM Tree Pruning Programming for Web Agent

363

1

0

26 Nov 2025

The Consistency Critic: Correcting Inconsistencies in Generated Images via Reference-Guided Attentive Alignment

Ming-Ming Cheng

Mike Zheng Shou

138

0

0

25 Nov 2025

Synthesizing Visual Concepts as Vision-Language Programs

Wolfgang Stammer

Devendra Singh Dhami

Kristian Kersting

101

0

0

24 Nov 2025

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models

148

1

0

24 Nov 2025

Chain-of-Visual-Thought: Teaching VLMs to See and Think Better with Continuous Visual Tokens

Konstantinos Kallidromitis

260

1

0

24 Nov 2025

DiVE-k: Differential Visual Reasoning for Fine-grained Image Recognition

205

0

0

23 Nov 2025

Learning with Preserving for Continual Multitask Learning

196

0

0

11 Nov 2025

Tracking and Understanding Object Transformations

Jennifer J. Sun

Bharath Hariharan

174

0

0

06 Nov 2025

Can Visual Input Be Compressed? A Visual Token Compression Benchmark for Large Multimodal Models

...

279

0

0

04 Nov 2025

LEGO-Eval: Towards Fine-Grained Evaluation on Synthesizing 3D Embodied Environments with Tool Augmentation

148

0

0

04 Nov 2025

When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought

Niklas Muennighoff

...

257

2

0

04 Nov 2025

$\left|\,\circlearrowright\,\boxed{\text{BUS}}\,\right|$: A Large and Diverse Multimodal Benchmark for evaluating the ability of Vision-Language Models to understand Rebus Puzzles

148

1

0

03 Nov 2025

Test-time Scaling of LLMs: A Survey from A Subproblem Structure Perspective

Boyang Albert Li

165

0

0

01 Nov 2025

TEXT2DB: Integration-Aware Information Extraction with Large Language Model Agents

Annual Meeting of the Association for Computational Linguistics (ACL), 2025

138

9

0

28 Oct 2025

Video-Thinker: Sparking "Thinking with Videos" via Reinforcement Learning

133

5

0

27 Oct 2025

MUStReason: A Benchmark for Diagnosing Pragmatic Reasoning in Video-LMs for Multimodal Sarcasm Detection

Timothy Hospedales

83

0

0

27 Oct 2025

PhysVLM-AVR: Active Visual Reasoning for Multimodal Large Language Models in Physical Environments

142

1

0

24 Oct 2025

See, Think, Act: Online Shopper Behavior Simulation with VLM Agents

...

165

0

0

22 Oct 2025

Pursuing Minimal Sufficiency in Spatial Reasoning

Ming-Hsuan Yang

100

0

0

19 Oct 2025

AUGUSTUS: An LLM-Driven Multimodal Agent System with Contextualized User Memory

Shubham Maheshwari

145

1

0

17 Oct 2025

RECODE: Reasoning Through Code Generation for Visual Question Answering

Ameet Talwalkar

Cordelia Schmid

173

0

0

15 Oct 2025

CapGeo: A Caption-Assisted Approach to Geometric Reasoning

116

1

0

10 Oct 2025

MATRIX: Multimodal Agent Tuning for Robust Tool-Use Reasoning

Abdelrahman M. Shaker

Rao Muhammad Anwer

Fahad Shahbaz Khan

227

0

0

09 Oct 2025

RetouchLLM: Training-free Code-based Image Retouching with Vision Language Models

140

0

0

09 Oct 2025

RoboPilot: Generalizable Dynamic Robotic Manipulation with Dual-thinking Modes

Roberto Galeazzi

117

0

0

30 Sep 2025

From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models

...

454

10

0

29 Sep 2025

IWR-Bench: Can LVLMs reconstruct interactive webpage from a user interaction video?

...

260

0

0

29 Sep 2025

Confidence-guided Refinement Reasoning for Zero-shot Question Answering

Byoung-Tak Zhang

103

0

0

25 Sep 2025

VideoPro: Adaptive Program Reasoning for Long Video Understanding

...

Feng Tao

Jingqi Tong

Yin Zhang

Jiaqi Wang

181

0

0

22 Sep 2025

From Easy to Hard: The MIR Benchmark for Progressive Interleaved Multi-Image Reasoning

...

213

0

0

21 Sep 2025

Visual Programmability: A Guide for Code-as-Thought in Chart Understanding

133

0

0

11 Sep 2025

From Image Generation to Infrastructure Design: a Multi-agent Pipeline for Street Design Generation

153

2

0

05 Sep 2025

Reinforced Visual Perception with Tools

155

12

0

01 Sep 2025

Hidden Tail: Adversarial Image Causing Stealthy Resource Consumption in Vision-Language Models

83

1

0

26 Aug 2025

Explain Before You Answer: A Survey on Compositional Visual Reasoning

...

Gholamreza Haffari

361

9

0

24 Aug 2025

Comp-X: On Defining an Interactive Learned Image Compression Paradigm With Expert-driven LLM Agent

Jörn Ostermann

138

0

0

21 Aug 2025

Neuro-Symbolic Artificial Intelligence: Towards Improving the Reasoning Abilities of Large Language Models

International Joint Conference on Artificial Intelligence (IJCAI), 2024

190

5

0

19 Aug 2025

Reasoning in Computer Vision: Taxonomy, Models, Tasks, and Methodologies

Ayushman Sarkar

Mohd Yamani Idna Idris

167

12

0

14 Aug 2025

Bridging Formal Language with Chain-of-Thought Reasoning to Geometry Problem Solving

127

1

0

12 Aug 2025

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

228

0

0

07 Aug 2025

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

220

2

0

06 Aug 2025

SoilNet: A Multimodal Multitask Model for Hierarchical Classification of Soil Horizons

Teodor Chiaburu

Felix Bießmann

Joey Prüssing

Frank Haußer

89

1

0

05 Aug 2025

Zero-shot Compositional Action Recognition with Neural Logic Constraints

206

3

0

04 Aug 2025

Multimodal Video Emotion Recognition with Reliable Reasoning Priors

93

0

0

29 Jul 2025

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning

192

2

0

24 Jul 2025

Augmented Vision-Language Models: A Systematic Review

Anthony C Davis

Chien-Ming Huang

196

0

0

24 Jul 2025
