arXiv:2502.06533
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
10 February 2025
Jean Vassoyan
Nathanaël Beau
Roman Plaud
OffRL
Papers citing "Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning"
1 / 1 papers shown
Title
Entropy-guided sequence weighting for efficient exploration in RL-based LLM fine-tuning
Abdullah Vanlioglu
46
0
0
28 Mar 2025