A Survey of Process Reward Models: From Outcome Signals to Process Supervisions for Large Language Models

9 October 2025
Congming Zheng
Jiachen Zhu
Zhuoying Ou
Yuxiang Chen
Kangning Zhang
Rong Shan
Zeyu Zheng
Mengyue Yang
Jianghao Lin
Yong Yu
Weinan Zhang
LRM
arXiv: 2510.08049 · PDF · HTML
Main: 1 page · 3 figures · Appendix: 13 pages
Abstract

Although Large Language Models (LLMs) exhibit advanced reasoning ability, conventional alignment remains largely dominated by outcome reward models (ORMs), which judge only final answers. Process Reward Models (PRMs) address this gap by evaluating and guiding reasoning at the step or trajectory level. This survey provides a systematic overview of PRMs through the full loop: how to generate process data, build PRMs, and use PRMs for test-time scaling and reinforcement learning. We summarize applications across math, code, text, multimodal reasoning, robotics, and agents, and review emerging benchmarks. Our goal is to clarify design spaces, reveal open challenges, and guide future research toward fine-grained, robust reasoning alignment.
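To make the ORM/PRM distinction concrete, here is a minimal illustrative sketch (not from the paper) of how step-level process rewards can drive best-of-N selection at test time. The scoring functions, the `Candidate` structure, and the min-over-steps aggregation are hypothetical stand-ins for a learned reward model; they only show where outcome-level and step-level signals enter the loop.

```python
# Illustrative sketch only: toy contrast between outcome-level (ORM) and
# step-level (PRM) scoring for best-of-N answer selection. The scorers
# below are hypothetical heuristics, not the survey's or any real PRM's API.
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    steps: List[str]       # intermediate reasoning steps
    final_answer: str      # the answer the trajectory ends with


def orm_score(cand: Candidate, reference: str) -> float:
    """Outcome reward: judge only the final answer (1 if it matches, else 0)."""
    return 1.0 if cand.final_answer.strip() == reference.strip() else 0.0


def prm_step_scores(cand: Candidate) -> List[float]:
    """Hypothetical process reward: one score per reasoning step.
    A real PRM would be a learned model; this toy heuristic just
    penalizes steps that look like unsupported guesses."""
    return [0.2 if "guess" in step.lower() else 0.9 for step in cand.steps]


def prm_score(cand: Candidate) -> float:
    """Aggregate step scores into a trajectory score. Taking the minimum
    over steps is one common choice; product or mean are also used."""
    scores = prm_step_scores(cand)
    return min(scores) if scores else 0.0


def best_of_n(candidates: List[Candidate]) -> Candidate:
    """Test-time scaling: sample N candidates, keep the one the PRM prefers."""
    return max(candidates, key=prm_score)


if __name__ == "__main__":
    candidates = [
        Candidate(steps=["Compute 12 * 7 = 84", "Subtract 4: 84 - 4 = 80"],
                  final_answer="80"),
        Candidate(steps=["Guess that the answer is around 90"],
                  final_answer="90"),
    ]
    chosen = best_of_n(candidates)
    print("PRM-selected answer:", chosen.final_answer)
```

An ORM would only see the two final answers; the PRM-style selector above can prefer the first candidate because every intermediate step scores well, which is the kind of fine-grained signal the survey organizes and compares.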
