Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems

26 February 2025

Papers citing "Agentic Reward Modeling: Integrating Human Preferences with Verifiable Correctness Signals for Reliable Reward Systems"


Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM, 60 0 0, 05 May 2025

A Desideratum for Conversational Agents: Capabilities, Challenges, and Future Directions
Emre Can Acikgoz, Cheng Qian, Hongru Wang, Vardhan Dongre, X. Chen, Heng Ji, Dilek Hakkani-Tür, Gökhan Tür
LM&Ro ELM, 41 1 0, 07 Apr 2025

Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu, P. Wang, R. Xu, Shirong Ma, Chong Ruan, Peng Li, Yang Janet Liu, Y. Wu
OffRL LRM, 36 9 0, 03 Apr 2025