Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs

21 May 2025
Xintong Zhang
Zhi Gao
Bofei Zhang
Pengxiang Li
Xiaowen Zhang
Zehua Wang
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
    LRM

Papers citing "Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs"

30 / 30 papers shown
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
296
2
0
24 Dec 2025
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL
Siyi Chen
Mikaela Angelina Uy
Chan Hee Song
Faisal Ladhak
Adithyavairavan Murali
Qing Qu
Stan Birchfield
Valts Blukis
Jonathan Tremblay
OffRL, LRM
110
0
0
03 Dec 2025
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
Juanxi Tian
Siyuan Li
Conghui He
Lijun Wu
Cheng Tan
EGVM, VGen
156
0
0
01 Dec 2025
From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning
C. Wang
Haozhe Wang
Xi Chen
J. Liu
Taofeng Xue
Chong Peng
Donglian Qi
Fangzhen Lin
Yunfeng Yan
OffRL, LRM
296
0
0
28 Nov 2025
Video Spatial Reasoning with Object-Centric 3D Rollout
Haoran Tang
Meng Cao
Ruyang Liu
Xiaoxi Liang
Linglong Li
Ge Li
Xiaodan Liang
LRM
123
0
0
17 Nov 2025
Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models
Yule Chen
Yufan Ren
Sabine Süsstrunk
VLM
84
0
0
09 Nov 2025
DeepEyesV2: Toward Agentic Multimodal Model
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Jack Hong
Chenxiao Zhao
ChengLin Zhu
Weiheng Lu
Guohai Xu
Xing Yu
122
5
0
07 Nov 2025
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Ming Li
Jike Zhong
Shitian Zhao
H. Zhang
Shaoheng Lin
Yuxiang Lai
Chen Wei
Konstantinos Psounis
Kaipeng Zhang
EGVM, LRM, VLM
436
3
0
03 Nov 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
208
3
0
28 Oct 2025
VAR: Visual Attention Reasoning via Structured Search and Backtracking
Wei Cai
Jian Zhao
Yuchen Yuan
T. Zhang
Ming Zhu
Haichuan Tang
Chi Zhang
Xuelong Li
OffRL, LRM
120
0
0
21 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
80
0
0
17 Oct 2025
RECODE: Reasoning Through Code Generation for Visual Question Answering
Junhong Shen
Mu Cai
Bo Hu
Ameet Talwalkar
David A. Ross
Cordelia Schmid
Alireza Fathi
ReLM, LRM
160
0
0
15 Oct 2025
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
Xingang Guo
Utkarsh Tyagi
Advait Gosai
Paula Vergara
Ernesto Gabriel Hernández Montoya
...
Bin Hu
Yunzhong He
Bing Liu
Rakshith S Srinivasa
VLM, LRM
313
2
0
14 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro, AIFin, AI4TS, LRM, AI4CE
245
4
0
13 Oct 2025
Latent Visual Reasoning
Bangzheng Li
Ximeng Sun
Jiang-Long Liu
Ze Wang
Jialian Wu
Xiaodong Yu
Hao Chen
Emad Barsoum
Muhao Chen
Zicheng Liu
LRM, VLM
192
5
0
29 Sep 2025
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu
Q. Yang
Yuan-Ming Li
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
LRM
156
6
0
29 Sep 2025
DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning
Tianrun Xu
Haoda Jing
Y. Li
Yuquan Wei
Jun Feng
Guanyu Chen
Haichuan Gao
Tianren Zhang
Feng Chen
OffRL
95
0
0
25 Sep 2025
GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents
Xianhang Ye
Yiqing Li
Wei Dai
Miancan Liu
Ziyuan Chen
...
Hongbo Min
Jinkui Ren
Xiantao Zhang
Wen Yang
Zhi Jin
152
3
0
19 Sep 2025
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
Xin Lai
Junyi Li
Wei Li
Tao Liu
Tianjian Li
Hengshuang Zhao
LRM, VLM
121
26
0
09 Sep 2025
Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding
Wanfu Wang
Qipeng Huang
Guangquan Xue
Xiaobo Liang
Juntao Li
VLM
124
1
0
04 Sep 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM, CoGe, LRM
344
8
0
24 Aug 2025
edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer
Chen Qian
Xinran Yu
Zewen Huang
Danyang Li
Qiang Ma
Fan Dang
X. Ding
Guangyong Shang
Zheng Yang
VLM
136
0
0
18 Aug 2025
Simple o3: Towards Interleaved Vision-Language Reasoning
Ye Wang
Qianglong Chen
Zejun Li
Siyuan Wang
Shijie Guo
Zhirui Zhang
Zhongyu Wei
MLLM, LRM, VLM
144
12
0
16 Aug 2025
Thyme: Think Beyond Images
Yi Zhang
Xingyu Lu
S. Yin
Chaoyou Fu
Wei Chen
...
Zhang Zhang
Liang Wang
Fan Yang
Tingting Gao
Guorui Zhou
LRM, VLM
192
34
0
15 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
227
4
0
08 Aug 2025
PyVision: Agentic Vision with Dynamic Tooling
Shitian Zhao
H. Zhang
Shaoheng Lin
Ming Li
Qilong Wu
Kaipeng Zhang
Chen Wei
LRM
261
19
0
10 Jul 2025
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu
Yuting Zhang
Feng Liu
Changwang Zhang
Ying Sun
Jun Wang
LRM
469
21
0
20 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
365
198
0
17 Mar 2025
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Hengguang Zhou
Xirui Li
Ruochen Wang
Minhao Cheng
Tianyi Zhou
Cho-Jui Hsieh
OffRL, LRM, ReLM
373
124
0
07 Mar 2025
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM, MLLM
649
34
0
27 May 2024