Optimal Design for Reward Modeling in RLHF

22 October 2024
Antoine Scheid, Etienne Boursier, Alain Durmus, Michael I. Jordan, Pierre Ménard, Eric Moulines, Michal Valko
OffRL

Papers citing "Optimal Design for Reward Modeling in RLHF"

5 / 5 papers shown

A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji, Yanqiu Wu, Zhibin Wu, Shoujin Wang, Jian Yang, Mark Dras, Usman Naseem
31 · 0 · 0 · 05 May 2025

Reasoning without Regret
Tarun Chitra
OffRL, LRM
23 · 0 · 0 · 14 Apr 2025

Active Learning for Direct Preference Optimization
B. Kveton, Xintong Li, Julian McAuley, Ryan Rossi, Jingbo Shang, Junda Wu, Tong Yu
50 · 1 · 0 · 03 Mar 2025

An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik, Alex Doboli
OffRL, ELM
63 · 0 · 0 · 31 Dec 2024

TPO: Aligning Large Language Models with Multi-branch & Multi-step Preference Trees
Weibin Liao, Xu Chu, Yasha Wang
LRM
36 · 6 · 0 · 10 Oct 2024