The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward

9 September 2025

Papers citing "The Choice of Divergence: A Neglected Key to Mitigating Diversity Collapse in Reinforcement Learning with Verifiable Reward"

Whatever Remains Must Be True: Filtering Drives Reasoning in LLMs, Shaping Diversity

272

05 Dec 2025

Information Gain-based Policy Optimization: A Simple and Effective Approach for Multi-Turn Search Agents

234

16 Oct 2025

Unlocking Exploration in RLVR: Uncertainty-aware Advantage Shaping for Deeper Reasoning

179

12 Oct 2025

Unlocking Reasoning Capabilities in LLMs via Reinforcement Learning Exploration

331

04 Oct 2025

ZeroTuning: Unlocking the Initial Token's Power to Enhance Large Language Models Without Training

522

16 May 2025