Improving Reward Models with Synthetic Critiques

31 May 2024

Papers citing "Improving Reward Models with Synthetic Critiques"

Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models Xiaobao Wu LRM 60 0 0 05 May 2025
Fine-Tuning Diffusion Generative Models via Rich Preference Optimization Hanyang Zhao Haoxian Chen Yucheng Guo Genta Indra Winata Tingting Ou Ziyu Huang D. Yao Wenpin Tang 54 0 0 13 Mar 2025
Uncertainty-Aware Step-wise Verification with Generative Reward Models Zihuiwen Ye L. Melo Younesse Kaddar Phil Blunsom S. Kamath S Yarin Gal LRM 44 0 0 16 Feb 2025
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation Jonathan Cook Tim Rocktäschel Jakob Foerster Dennis Aumiller Alex Wang ALM 29 9 0 04 Oct 2024
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 301 11,730 0 04 Mar 2022