Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning

10 February 2025

Papers citing "Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning"


Title: Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Author: Abdullah Vanlioglu
Date: 28 Mar 2025