Recent advances in large vision-language models (LVLMs) have revealed an \textit{overthinking} phenomenon, where models generate verbose reasoning chains for every task regardless of question difficulty. To address this issue, we present \textbf{FAST}, a novel \textbf{Fa}st-\textbf{S}low \textbf{T}hinking framework that dynamically adapts reasoning depth to the characteristics of each question. Through empirical analysis, we establish the feasibility of fast-slow thinking in LVLMs by investigating how response length and data distribution affect performance. We then develop FAST-GRPO, which integrates three components: model-based metrics for question characterization, an adaptive thinking reward mechanism, and difficulty-aware KL regularization. Experiments on seven reasoning benchmarks demonstrate that FAST achieves state-of-the-art accuracy, a relative improvement of over 10\% compared to the base model, while reducing token usage by 32.7--67.3\% relative to previous slow-thinking approaches, effectively balancing reasoning length and accuracy.
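The abstract names the three FAST-GRPO components but gives no formulas. As a minimal sketch only, the snippet below illustrates how an adaptive thinking reward and a difficulty-aware KL coefficient could plausibly work, assuming a scalar difficulty score in [0, 1]; the function names, constants, and linear schedules are illustrative assumptions, not the paper's actual definitions.

```python
def adaptive_thinking_reward(correct: bool, response_len: int,
                             ref_len: int, difficulty: float) -> float:
    """Hypothetical reward shaping: reward correctness, and penalize extra
    length more on easy questions (low difficulty) to encourage fast
    thinking, less on hard ones where slow thinking may pay off."""
    correctness = 1.0 if correct else 0.0
    # Relative length overshoot beyond a reference response length.
    overshoot = max(0.0, response_len - ref_len) / ref_len
    # Penalty weight shrinks as difficulty grows (difficulty in [0, 1]).
    return correctness - 0.5 * (1.0 - difficulty) * overshoot


def difficulty_aware_kl_coeff(base_beta: float, difficulty: float) -> float:
    """Hypothetical schedule: relax the KL penalty against the reference
    policy on hard questions, so drifting toward longer reasoning is
    cheaper where it is most likely needed."""
    return base_beta * (1.0 - 0.5 * difficulty)


# Example: an easy question (difficulty 0.1) answered correctly but verbosely
# earns less reward than a concise correct answer.
print(adaptive_thinking_reward(True, 800, 400, 0.1))  # 0.55
print(adaptive_thinking_reward(True, 400, 400, 0.1))  # 1.0
print(difficulty_aware_kl_coeff(0.04, 0.9))           # 0.022
```

Under these assumptions, the length penalty vanishes on the hardest questions and the KL constraint loosens with difficulty, matching the fast-slow intent the abstract describes.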
@article{xiao2025_2504.18458,
  title={Fast-Slow Thinking for Large Vision-Language Model Reasoning},
  author={Wenyi Xiao and Leilei Gan and Weilong Dai and Wanggui He and Ziwei Huang and Haoyuan Li and Fangxun Shu and Zhelun Yu and Peng Zhang and Hao Jiang and Fei Wu},
  journal={arXiv preprint arXiv:2504.18458},
  year={2025}
}