Uncertainty-Aware Step-wise Verification with Generative Reward Models

16 February 2025
Zihuiwen Ye
Luckeciano Carvalho Melo
Younesse Kaddar
Phil Blunsom
Sam Staton
Yarin Gal
Abstract

Complex multi-step reasoning tasks, such as solving mathematical problems, remain challenging for large language models (LLMs). While outcome supervision is commonly used, process supervision via process reward models (PRMs) provides intermediate rewards to verify step-wise correctness in solution traces. However, as proxies for human judgement, PRMs suffer from reliability issues, including susceptibility to reward hacking. In this work, we propose leveraging uncertainty quantification (UQ) to enhance the reliability of step-wise verification with generative reward models for mathematical reasoning tasks. We introduce CoT Entropy, a novel UQ method that outperforms existing approaches in quantifying a PRM's uncertainty in step-wise verification. Our results demonstrate that incorporating uncertainty estimates improves the robustness of judge-LM PRMs, leading to more reliable verification.
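
To make the idea of uncertainty-aware step-wise verification concrete, here is a minimal sketch, assuming a judge LM is sampled several times per solution step and each sample ends in a discrete verdict. It computes the Shannon entropy of the sampled verdicts for each step and flags high-entropy steps as unreliable. This is an illustration only and not the paper's definition of CoT Entropy; the function names, the verdict labels, and the threshold value are all hypothetical.

```python
import math
from collections import Counter


def verdict_entropy(verdicts):
    """Shannon entropy (in nats) of the empirical distribution of sampled verdicts.

    `verdicts` is a list of labels, one per sampled judge-LM rationale
    (hypothetical labels such as "correct" / "incorrect").
    """
    if not verdicts:
        raise ValueError("need at least one sampled verdict")
    counts = Counter(verdicts)
    total = len(verdicts)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def flag_uncertain_steps(step_verdicts, threshold=0.5):
    """Return the indices of steps whose verdict entropy exceeds `threshold`.

    The threshold is an assumed, illustrative value, not one taken from the paper.
    """
    return [
        i for i, verdicts in enumerate(step_verdicts)
        if verdict_entropy(verdicts) > threshold
    ]


# Hypothetical sampled verdicts for a three-step solution (8 samples per step).
samples = [
    ["correct"] * 8,                       # unanimous: zero entropy
    ["correct"] * 4 + ["incorrect"] * 4,   # evenly split: maximal entropy (ln 2)
    ["incorrect"] * 7 + ["correct"],       # mostly agreed: low entropy
]

print(flag_uncertain_steps(samples))  # [1]
```

Steps flagged this way could be routed to a stronger verifier or a human check instead of being trusted outright, which is one way uncertainty estimates might make step-wise verification more robust.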

@article{ye2025_2502.11250,
  title={Uncertainty-Aware Step-wise Verification with Generative Reward Models},
  author={Zihuiwen Ye and Luckeciano Carvalho Melo and Younesse Kaddar and Phil Blunsom and Sam Staton and Yarin Gal},
  journal={arXiv preprint arXiv:2502.11250},
  year={2025}
}