2505.15436
Cited By
v3 (latest)
Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs
21 May 2025
Xintong Zhang
Zhi Gao
Bofei Zhang
Pengxiang Li
Xiaowen Zhang
Zehua Wang
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LRM
Papers citing
"Adaptive Chain-of-Focus Reasoning via Dynamic Visual Search and Zooming for Efficient VLMs"
29 / 29 papers shown
Reinforcement Learning for Large Model: A Survey
Weijia Wu
Chen Gao
Joya Chen
Kevin Lin
Qingwei Meng
Yiming Zhang
Yuke Qiu
Hong Zhou
Mike Zheng Shou
269
2
0
24 Dec 2025
Envision: Benchmarking Unified Understanding & Generation for Causal World Process Insights
Juanxi Tian
Siyuan Li
Conghui He
Lijun Wu
Cheng Tan
EGVM
VGen
120
0
0
01 Dec 2025
From Illusion to Intention: Visual Rationale Learning for Vision-Language Reasoning
C. Wang
Haozhe Wang
Xi Chen
J. Liu
Taofeng Xue
Chong Peng
Donglian Qi
Fangzhen Lin
Yunfeng Yan
OffRL
LRM
216
0
0
28 Nov 2025
Video Spatial Reasoning with Object-Centric 3D Rollout
Haoran Tang
Meng Cao
Ruyang Liu
Xiaoxi Liang
Linglong Li
Ge Li
Xiaodan Liang
LRM
111
0
0
17 Nov 2025
Zooming into Comics: Region-Aware RL Improves Fine-Grained Comic Understanding in Vision-Language Models
Yule Chen
Yufan Ren
Sabine Süsstrunk
VLM
84
0
0
09 Nov 2025
DeepEyesV2: Toward Agentic Multimodal Model
IEEE Transactions on Audio, Speech, and Language Processing (TASLP), 2025
Jack Hong
Chenxiao Zhao
ChengLin Zhu
Weiheng Lu
Guohai Xu
Xing Yu
122
3
0
07 Nov 2025
TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning
Ming Li
Jike Zhong
Shitian Zhao
H. Zhang
Shaoheng Lin
Yuxiang Lai
Chen Wei
Konstantinos Psounis
Kaipeng Zhang
EGVM
LRM
VLM
420
2
0
03 Nov 2025
ViPER: Empowering the Self-Evolution of Visual Perception Abilities in Vision-Language Model
J. Zhang
Song Jin
Chuanqi Cheng
Yuhan Liu
Yankai Lin
...
Yufei Zhang
F. Jiang
G. Yin
Wei Lin
Rui Yan
VLM
200
3
0
28 Oct 2025
VAR: Visual Attention Reasoning via Structured Search and Backtracking
Wei Cai
Jian Zhao
Yuchen Yuan
T. Zhang
Ming Zhu
Haichuan Tang
Chi Zhang
Xuelong Li
OffRL
LRM
108
0
0
21 Oct 2025
Select Less, Reason More: Prioritizing Evidence Purity for Video Reasoning
Xuchen Li
Xuzhao Li
Shiyu Hu
Kaiqi Huang
76
0
0
17 Oct 2025
RECODE: Reasoning Through Code Generation for Visual Question Answering
Junhong Shen
Mu Cai
Bo Hu
Ameet Talwalkar
David A. Ross
Cordelia Schmid
Alireza Fathi
ReLM
LRM
124
0
0
15 Oct 2025
Beyond Seeing: Evaluating Multimodal LLMs on Tool-Enabled Image Perception, Transformation, and Reasoning
Xingang Guo
Utkarsh Tyagi
Advait Gosai
Paula Vergara
Ernesto Gabriel Hernández Montoya
...
Bin Hu
Yunzhong He
Bing Liu
Rakshith S Srinivasa
VLM
LRM
308
2
0
14 Oct 2025
A Survey on Agentic Multimodal Large Language Models
Huanjin Yao
Ruifei Zhang
Jiaxing Huang
Jingyi Zhang
Yibo Wang
...
Ruolin Zhu
Yongcheng Jing
Shunyu Liu
Guanbin Li
Dacheng Tao
LM&Ro
AIFin
AI4TS
LRM
AI4CE
221
4
0
13 Oct 2025
Latent Visual Reasoning
Bangzheng Li
Ximeng Sun
Jiang-Long Liu
Ze Wang
Jialian Wu
Xiaodong Yu
Hao Chen
Emad Barsoum
Muhao Chen
Zicheng Liu
LRM
VLM
192
5
0
29 Sep 2025
LOVE-R1: Advancing Long Video Understanding with an Adaptive Zoom-in Mechanism via Multi-Step Reasoning
Shenghao Fu
Q. Yang
Yuan-Ming Li
Xihan Wei
Xiaohua Xie
Wei-Shi Zheng
LRM
136
5
0
29 Sep 2025
DeFacto: Counterfactual Thinking with Images for Enforcing Evidence-Grounded and Faithful Reasoning
Tianrun Xu
Haoda Jing
Y. Li
Yuquan Wei
Jun Feng
Guanyu Chen
Haichuan Gao
Tianren Zhang
Feng Chen
OffRL
71
0
0
25 Sep 2025
GUI-ARP: Enhancing Grounding with Adaptive Region Perception for GUI Agents
Xianhang Ye
Yiqing Li
Wei Dai
Miancan Liu
Ziyuan Chen
...
Hongbo Min
Jinkui Ren
Xiantao Zhang
Wen Yang
Zhi Jin
132
3
0
19 Sep 2025
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search
Xin Lai
Junyi Li
Wei Li
Tao Liu
Tianjian Li
Hengshuang Zhao
LRM
VLM
97
25
0
09 Sep 2025
Learning Active Perception via Self-Evolving Preference Optimization for GUI Grounding
Wanfu Wang
Qipeng Huang
Guangquan Xue
Xiaobo Liang
Juntao Li
VLM
100
1
0
04 Sep 2025
Explain Before You Answer: A Survey on Compositional Visual Reasoning
Fucai Ke
Joy Hsu
Zhixi Cai
Zixian Ma
Xin Zheng
...
P. D. Haghighi
Gholamreza Haffari
Ranjay Krishna
Jiajun Wu
H. Rezatofighi
ReLM
CoGe
LRM
296
8
0
24 Aug 2025
edgeVLM: Cloud-edge Collaborative Real-time VLM based on Context Transfer
Chen Qian
Xinran Yu
Zewen Huang
Danyang Li
Qiang Ma
Fan Dang
X. Ding
Guangyong Shang
Zheng Yang
VLM
104
0
0
18 Aug 2025
Simple o3: Towards Interleaved Vision-Language Reasoning
Ye Wang
Qianglong Chen
Zejun Li
Siyuan Wang
Shijie Guo
Zhirui Zhang
Zhongyu Wei
MLLM
LRM
VLM
144
12
0
16 Aug 2025
Thyme: Think Beyond Images
Yi Zhang
Xingyu Lu
S. Yin
Chaoyou Fu
Wei Chen
...
Zhang Zhang
Liang Wang
Fan Yang
Tingting Gao
Guorui Zhou
LRM
VLM
152
33
0
15 Aug 2025
SIFThinker: Spatially-Aware Image Focus for Visual Reasoning
Zhangquan Chen
Ruihui Zhao
Chuwei Luo
Mingze Sun
Xinlei Yu
Yangyang Kang
Ruqi Huang
LRM
197
4
0
08 Aug 2025
PyVision: Agentic Vision with Dynamic Tooling
Shitian Zhao
H. Zhang
Shaoheng Lin
Ming Li
Qilong Wu
Kaipeng Zhang
Chen Wei
LRM
225
19
0
10 Jul 2025
OThink-MR1: Stimulating multimodal generalized reasoning capabilities via dynamic reinforcement learning
Zhiyuan Liu
Yuting Zhang
Feng Liu
Changwang Zhang
Ying Sun
Jun Wang
LRM
409
20
0
20 Mar 2025
R1-VL: Learning to Reason with Multimodal Large Language Models via Step-wise Group Relative Policy Optimization
Jingyi Zhang
Jiaxing Huang
Huanjin Yao
Shunyu Liu
Xikun Zhang
Shijian Lu
Dacheng Tao
LRM
345
193
0
17 Mar 2025
R1-Zero's "Aha Moment" in Visual Reasoning on a 2B Non-SFT Model
Hengguang Zhou
Xirui Li
Ruochen Wang
Minhao Cheng
Tianyi Zhou
Cho-Jui Hsieh
OffRL
LRM
ReLM
337
119
0
07 Mar 2025
VoCoT: Unleashing Visually Grounded Multi-Step Reasoning in Large Multi-Modal Models
Zejun Li
Ruipu Luo
Jiwen Zhang
Minghui Qiu
Zhongyu Wei
LRM
MLLM
609
31
0
27 May 2024