Optimal Design for Reward Modeling in RLHF (arXiv:2410.17055)
22 October 2024
Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko
Tags: OffRL
Papers citing "Optimal Design for Reward Modeling in RLHF" (5 of 5 papers shown)

A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem
05 May 2025

Reasoning without Regret
Tarun Chitra
Tags: OffRL, LRM
14 Apr 2025

Active Learning for Direct Preference Optimization
B. Kveton, Xintong Li, Julian McAuley, Ryan Rossi, Jingbo Shang, Junda Wu, Tong Yu
03 Mar 2025

An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik, Alex Doboli
Tags: OffRL, ELM
31 Dec 2024

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao, Xu Chu, Yasha Wang
Tags: LRM
10 Oct 2024