ResearchTrend.AI
Direct Preference-based Policy Optimization without Reward Modeling

30 January 2023
Authors: Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, KyungHyun Kim, Hyun Oh Song
Topic: OffRL
Papers citing "Direct Preference-based Policy Optimization without Reward Modeling"

5 / 5 papers shown

| Title | Authors | Topics | Metrics | Date |
|---|---|---|---|---|
| Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models | M. Wong, C. Tan | ALM | 71 / 3 / 0 | 19 Mar 2025 |
| Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language | Tsimur Hadeliya, D. Kajtoch | | 38 / 0 / 0 | 27 Apr 2024 |
| Training language models to follow instructions with human feedback | Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe | OSLM, ALM | 301 / 11,730 / 0 | 04 Mar 2022 |
| Offline Reinforcement Learning with Implicit Q-Learning | Ilya Kostrikov, Ashvin Nair, Sergey Levine | OffRL | 203 / 627 / 0 | 12 Oct 2021 |
| Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble | Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song | OffRL | 92 / 261 / 0 | 04 Oct 2021 |