Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models

Computer Vision and Pattern Recognition (CVPR), 2023

5 December 2023

Yushi Hu

Otilia Stretcu

Chun-Ta Lu

Krishnamurthy Viswanathan

Papers citing "Visual Program Distillation: Distilling Tools and Programmatic Reasoning into Vision-Language Models"

27 / 27 papers shown

Self-Improving VLM Judges Without Human Annotations

02 Dec 2025

LAST: LeArning to Think in Space and Time for Generalist Vision-Language Models

213

24 Nov 2025

DuoTeach: Dual Role Self-Teaching for Coarse-to-Fine Decision Coordination in Vision-Language Models

176

23 Nov 2025

Online In-Context Distillation for Low-Resource Vision Language Models

167

20 Oct 2025

Pursuing Minimal Sufficiency in Spatial Reasoning

145

19 Oct 2025

Video-STAR: Reinforcing Open-Vocabulary Action Recognition with Tools

...

204

09 Oct 2025

ExPO-HM: Learning to Explain-then-Detect for Hateful Meme Detection

200

08 Oct 2025

Look Less, Reason More: Rollout-Guided Adaptive Pixel-Space Reasoning

275

02 Oct 2025

From Perception to Cognition: A Survey of Vision-Language Interactive Reasoning in Multimodal Large Language Models

...

638

29 Sep 2025

Learning in an Echo Chamber: Online Learning with Replay Adversary

Daniil Dmitriev

Harald Eskelund Franck

Carolin Heinzler

Amartya Sanyal

123

29 Sep 2025

Reinforced Visual Perception with Tools

202

01 Sep 2025

Explain Before You Answer: A Survey on Compositional Visual Reasoning

...

419

24 Aug 2025

Uni-cot: Towards Unified Chain-of-Thought Reasoning Across Text and Vision

337

07 Aug 2025

Trade-offs in Image Generation: How Do Different Dimensions Interact?

256

29 Jul 2025

Augmented Vision-Language Models: A Systematic Review

228

24 Jul 2025

MathOPEval: A Fine-grained Evaluation Benchmark for Visual Operations of MLLMs in Mathematical Reasoning

310

24 Jul 2025

Multi-Step Visual Reasoning with Visual Tokens Scaling and Verification

230

08 Jun 2025

Pixel Reasoner: Incentivizing Pixel-Space Reasoning with Curiosity-Driven Reinforcement Learning

381

165

21 May 2025

MASSV: Multimodal Adaptation and Self-Data Distillation for Speculative Decoding of Vision-Language Models

501

15 May 2025

Visually Interpretable Subtask Reasoning for Visual Question Answering

286

12 May 2025

DWIM: Towards Tool-aware Visual Reasoning via Discrepancy-aware Workflow Generation & Instruct-Masking Tuning

627

25 Mar 2025

OWLViz: An Open-World Benchmark for Visual Question Answering

367

04 Mar 2025

HoVLE: Unleashing the Power of Monolithic Vision-Language Models with Holistic Vision-Language Embedding

Computer Vision and Pattern Recognition (CVPR), 2024

...

598

20 Dec 2024

VL-RewardBench: A Challenging Benchmark for Vision-Language Generative Reward Models

Computer Vision and Pattern Recognition (CVPR), 2024

...

657

26 Nov 2024

CATP-LLM: Empowering Large Language Models for Cost-Aware Tool Planning

1.3K

25 Nov 2024

Leopard: A Vision Language Model For Text-Rich Multi-Image Tasks

431

02 Oct 2024

Cantor: Inspiring Multimodal Chain-of-Thought of MLLM

...

Liujuan Cao

Rongrong Ji

MLLM LRM

403

24 Apr 2024