arXiv:2509.26313
Cited By
One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient
30 September 2025
Rui Ming
Haoyuan Wu
Shoubo Hu
Zhuolun He
Bei Yu
OffRL
LRM
HuggingFace (4 upvotes)
Papers citing "One-Token Rollout: Guiding Supervised Fine-Tuning of LLMs with Policy Gradient"
No papers found