Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs

21 May 2025

Papers citing "Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs"

30 / 30 papers shown

Reinforcement Learning for Large Model: A Survey

322

24 Dec 2025

SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL

Adithyavairavan Murali

156

03 Dec 2025

Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights

176

01 Dec 2025

From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning

320

28 Nov 2025

Video Spatial Reasoning with Object-Centric 3D Rollout

133

17 Nov 2025

Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models

102

09 Nov 2025

DeepEyesV2: Toward Agentic Multimodal Model

IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025

132

07 Nov 2025

TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning

473

03 Nov 2025

ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model

...

216

28 Oct 2025

Visual Attention Reasoning via Hierarchical Search and Self-Verification

165

21 Oct 2025

Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning

17 Oct 2025

RECODE: Reasoning Through Code Generation for Visual Question Answering

173

15 Oct 2025

Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning

Ernesto Gabriel Hernández Montoya

...

326

14 Oct 2025

A Survey on Agentic Multimodal Large Language Models

...

250

13 Oct 2025

Latent Visual Reasoning

203

29 Sep 2025

LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning

167

29 Sep 2025

DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning

102

25 Sep 2025

GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents

...

166

19 Sep 2025

Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search

143

09 Sep 2025

Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding

148

04 Sep 2025

Explain Before You Answer: A Survey on Compositional Visual Reasoning

...

361

24 Aug 2025

edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer

152

18 Aug 2025

Simple o3: Towards Interleaved Vision-Language Reasoning

156

16 Aug 2025

Thyme: Think Beyond Images

...

225

15 Aug 2025

SIFThinker: Spatially-Aware Image Focus for Visual Reasoning

286

08 Aug 2025

PyVision: Agentic Vision with Dynamic Tooling

280

10 Jul 2025

OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning

510

20 Mar 2025

R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization

397

206

17 Mar 2025

R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model

404

127

07 Mar 2025

VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models

697

27 May 2024