Process Reward Modeling with Entropy-Driven Uncertainty

28 March 2025
Lang Cao
Renhong Chen
Yingtian Zou
Chao Peng
Wu Ning
Huacong Xu
Qian Chen
Yuxian Wang
Peishuo Su
Mofan Peng
Zijie Chen
Yitong Li
Abstract

This paper presents the Entropy-Driven Unified Process Reward Model (EDU-PRM), a novel framework that approximates state-of-the-art performance in process supervision while drastically reducing training costs. EDU-PRM introduces an entropy-guided dynamic step partitioning mechanism, using logit distribution entropy to dynamically pinpoint high-uncertainty regions during token generation. This self-assessment capability enables precise step-level feedback without manual fine-grained annotation, addressing a critical challenge in process supervision. Experiments on the Qwen2.5-72B model with only 7,500 EDU-PRM-generated training queries demonstrate accuracy closely approximating the full Qwen2.5-72B-PRM (71.1% vs. 71.6%), achieving a 98% reduction in query cost compared to prior methods. This work establishes EDU-PRM as an efficient approach for scalable process reward model training.
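As a rough illustration of the entropy-guided step partitioning idea described in the abstract, the sketch below computes the per-token entropy of the next-token distribution and starts a new step wherever entropy exceeds a threshold. The function names, the threshold value, and the boundary rule are illustrative assumptions, not the paper's exact method.

```python
# Minimal sketch of entropy-guided step partitioning.
# All names (token_entropies, partition_steps, entropy_threshold) and the
# thresholding rule are illustrative assumptions, not the paper's exact method.
import torch
import torch.nn.functional as F


def token_entropies(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy (in nats) of the next-token distribution at each position.

    logits: [seq_len, vocab_size] raw model outputs for a generated sequence.
    """
    log_probs = F.log_softmax(logits, dim=-1)
    probs = log_probs.exp()
    return -(probs * log_probs).sum(dim=-1)


def partition_steps(logits: torch.Tensor, entropy_threshold: float) -> list[tuple[int, int]]:
    """Split the token sequence into steps at high-uncertainty positions.

    Each returned (start, end) pair is a half-open token span; a new step
    begins right after any position whose entropy exceeds the threshold.
    """
    entropies = token_entropies(logits)
    boundaries = (entropies > entropy_threshold).nonzero(as_tuple=True)[0].tolist()

    steps, start = [], 0
    for b in boundaries:
        steps.append((start, b + 1))
        start = b + 1
    if start < logits.shape[0]:
        steps.append((start, logits.shape[0]))
    return steps


if __name__ == "__main__":
    # Random logits stand in for model outputs; the threshold is arbitrary here.
    dummy_logits = torch.randn(128, 32_000)  # [seq_len, vocab_size]
    for start, end in partition_steps(dummy_logits, entropy_threshold=9.9):
        print(f"step covers tokens [{start}, {end})")
```

In practice, the entropies would be taken from the policy model's own logits during generation, so high-entropy positions mark points where the model is uncertain and step-level feedback is most informative.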

@article{cao2025_2503.22233,
  title={Process Reward Modeling with Entropy-Driven Uncertainty},
  author={Lang Cao and Renhong Chen and Yingtian Zou and Chao Peng and Wu Ning and Huacong Xu and Qian Chen and Yuxian Wang and Peishuo Su and Mofan Peng and Zijie Chen and Yitong Li},
  journal={arXiv preprint arXiv:2503.22233},
  year={2025}
}