Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2408.11791
Cited By
Critique-out-Loud Reward Models
21 August 2024
Zachary Ankner
Mansheej Paul
Brandon Cui
Jonathan D. Chang
Prithviraj Ammanabrolu
ALM
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Critique-out-Loud Reward Models"
21 / 21 papers shown
Title
Process Reward Models That Think
Muhammad Khalifa
Rishabh Agarwal
Lajanugen Logeswaran
Jaekyeom Kim
Hao Peng
Moontae Lee
Honglak Lee
Lu Wang
OffRL
ALM
LRM
36
1
0
23 Apr 2025
AIMO-2 Winning Solution: Building State-of-the-Art Mathematical Reasoning Models with OpenMathReasoning dataset
Ivan Moshkov
Darragh Hanley
Ivan Sorokin
Shubham Toshniwal
Christof Henkel
Benedikt D. Schifferer
Wei Du
Igor Gitman
ReLM
LRM
35
1
0
23 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAG
LRM
52
2
0
14 Apr 2025
Fast Controlled Generation from Language Models with Adaptive Weighted Rejection Sampling
Benjamin Lipkin
Benjamin LeBrun
Jacob Hoover Vigly
João Loula
David R. MacIver
...
Ryan Cotterell
Vikash K. Mansinghka
Timothy J. O'Donnell
Alexander K. Lew
Tim Vieira
24
0
0
07 Apr 2025
Do LLM Evaluators Prefer Themselves for a Reason?
Wei-Lin Chen
Zhepei Wei
Xinyu Zhu
Shi Feng
Yu Meng
ELM
LRM
42
0
0
04 Apr 2025
Inference-Time Scaling for Generalist Reward Modeling
Zijun Liu
P. Wang
R. Xu
Shirong Ma
Chong Ruan
Peng Li
Yang Janet Liu
Y. Wu
OffRL
LRM
44
9
0
03 Apr 2025
When To Solve, When To Verify: Compute-Optimal Problem Solving and Generative Verification for LLM Reasoning
Nishad Singhi
Hritik Bansal
Arian Hosseini
Aditya Grover
Kai-Wei Chang
Marcus Rohrbach
Anna Rohrbach
OffRL
LRM
37
0
0
01 Apr 2025
Reasoning Beyond Limits: Advances and Open Problems for LLMs
M. Ferrag
Norbert Tihanyi
Merouane Debbah
ELM
OffRL
LRM
AI4CE
59
2
0
26 Mar 2025
Scaling Evaluation-time Compute with Reasoning Models as Process Evaluators
Seungone Kim
Ian Wu
Jinu Lee
Xiang Yue
Seongyun Lee
...
Kiril Gashteovski
Carolin (Haas) Lawrence
J. Hockenmaier
Graham Neubig
Sean Welleck
LRM
42
2
0
25 Mar 2025
Fine-Tuning Diffusion Generative Models via Rich Preference Optimization
Hanyang Zhao
Haoxian Chen
Yucheng Guo
Genta Indra Winata
Tingting Ou
Ziyu Huang
D. Yao
Wenpin Tang
50
0
0
13 Mar 2025
CodeCriticBench: A Holistic Code Critique Benchmark for Large Language Models
Alexander Zhang
Marcus Dong
J. H. Liu
W. Zhang
Yejie Wang
...
Yancheng He
K. Deng
Wangchunshu Zhou
Wenhao Huang
Z. Zhang
LRM
50
2
0
23 Feb 2025
Kimi k1.5: Scaling Reinforcement Learning with LLMs
Kimi Team
Angang Du
Bofei Gao
Bowei Xing
Changjiu Jiang
...
Zhilin Yang
Zhiqi Huang
Zihao Huang
Ziyao Xu
Z. Yang
VLM
ALM
OffRL
AI4TS
LRM
93
128
0
22 Jan 2025
Self-Generated Critiques Boost Reward Modeling for Language Models
Yue Yu
Zhengxing Chen
Aston Zhang
L Tan
Chenguang Zhu
...
Suchin Gururangan
Chao-Yue Zhang
Melanie Kambadur
Dhruv Mahajan
Rui Hou
LRM
ALM
78
14
0
25 Nov 2024
Asynchronous RLHF: Faster and More Efficient Off-Policy RL for Language Models
Michael Noukhovitch
Shengyi Huang
Sophie Xhonneux
Arian Hosseini
Rishabh Agarwal
Aaron C. Courville
OffRL
74
4
0
23 Oct 2024
RMB: Comprehensively Benchmarking Reward Models in LLM Alignment
Enyu Zhou
Guodong Zheng
B. Wang
Zhiheng Xi
Shihan Dou
...
Yurong Mou
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
52
13
0
13 Oct 2024
TICKing All the Boxes: Generated Checklists Improve LLM Evaluation and Generation
Jonathan Cook
Tim Rocktaschel
Jakob Foerster
Dennis Aumiller
Alex Wang
ALM
13
9
0
04 Oct 2024
AIME: AI System Optimization via Multiple LLM Evaluators
Bhrij Patel
Souradip Chakraborty
Wesley A. Suttle
Mengdi Wang
Amrit Singh Bedi
Dinesh Manocha
16
5
0
04 Oct 2024
Beyond Scalar Reward Model: Learning Generative Judge from Preference Data
Ziyi Ye
Xiangsheng Li
Qiuchi Li
Qingyao Ai
Yujia Zhou
Wei Shen
Dong Yan
Yiqun Liu
23
10
0
01 Oct 2024
Interpretable Preferences via Multi-Objective Reward Modeling and Mixture-of-Experts
Haoxiang Wang
Wei Xiong
Tengyang Xie
Han Zhao
Tong Zhang
44
13
0
18 Jun 2024
Improving Reward Models with Synthetic Critiques
Zihuiwen Ye
Fraser Greenlee-Scott
Max Bartolo
Phil Blunsom
Jon Ander Campos
Matthias Gallé
ALM
SyDa
LRM
25
16
0
31 May 2024
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
1