OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation
- LRM
Human cognition operates through two complementary modes: fast intuitive thinking and slow deliberate thinking. Vanilla large language models (LLMs) predominantly follow the fast-thinking paradigm, producing immediate responses; while recent large reasoning models (LRMs) adopt slow-thinking strategies, generating detailed reasoning chains before arriving at answers. While LRMs often achieve higher accuracy, this comes at the cost of substantially increased token usage. To address this efficiency-accuracy trade-off, we propose OThink-R1, a hybrid reasoning framework that integrates both modes within a single LRM and enables automatic mode switching based on problem characteristics. We first identify three major patterns of essential and redundant reasoning trajectories in LRMs, which guide the design of an auxiliary LLM-based judge that adaptively determines when slow thinking is necessary. Leveraging the judge's decisions, we construct a hybrid fine-tuning dataset by pruning redundant reasoning to produce fast-thinking samples and retaining complete reasoning for slow-thinking samples. This dataset is then used to fine-tune LRMs, equipping them with inherent autonomous mode-selection capabilities. Extensive experiments on mathematical and question-answering benchmarks show that OThink-R1 reduces reasoning token usage significantly while maintaining competitive accuracy. The code is available atthis https URL.
View on arXiv