Direct Preference-based Policy Optimization without Reward Modeling
Gaon An, Junhyeok Lee, Xingdong Zuo, Norio Kosaka, KyungHyun Kim, Hyun Oh Song
arXiv:2301.12842 · 30 January 2023 · OffRL
Papers citing "Direct Preference-based Policy Optimization without Reward Modeling" (5 of 5 papers shown):

1. Aligning Crowd-sourced Human Feedback for Reinforcement Learning on Code Generation by Large Language Models
   M. Wong, C. Tan · ALM · 19 Mar 2025

2. Evaluation of Few-Shot Learning for Classification Tasks in the Polish Language
   Tsimur Hadeliya, D. Kajtoch · 27 Apr 2024

3. Training language models to follow instructions with human feedback
   Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe · OSLM, ALM · 04 Mar 2022

4. Offline Reinforcement Learning with Implicit Q-Learning
   Ilya Kostrikov, Ashvin Nair, Sergey Levine · OffRL · 12 Oct 2021

5. Uncertainty-Based Offline Reinforcement Learning with Diversified Q-Ensemble
   Gaon An, Seungyong Moon, Jang-Hyun Kim, Hyun Oh Song · OffRL · 04 Oct 2021