2406.04274
Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models
6 June 2024
Xiang Ji
Sanjeev Kulkarni
Mengdi Wang
Tengyang Xie
OffRL
Papers citing
"Self-Play with Adversarial Critic: Provable and Scalable Offline Alignment for Language Models"
6 / 6 papers shown
Title
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences
Corby Rosset
Ching-An Cheng
Arindam Mitra
Michael Santacroce
Ahmed Hassan Awadallah
Tengyang Xie
144
113
0
04 Apr 2024
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
32
14
0
10 Feb 2024
WARM: On the Benefits of Weight Averaged Reward Models
Alexandre Ramé
Nino Vieillard
Léonard Hussenot
Robert Dadashi
Geoffrey Cideron
Olivier Bachem
Johan Ferret
102
92
0
22 Jan 2024
Human-in-the-loop: Provably Efficient Preference-based Reinforcement Learning with General Function Approximation
Xiaoyu Chen
Han Zhong
Zhuoran Yang
Zhaoran Wang
Liwei Wang
118
59
0
23 May 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
303
11,730
0
04 Mar 2022
Offline Reinforcement Learning: Tutorial, Review, and Perspectives on Open Problems
Sergey Levine
Aviral Kumar
George Tucker
Justin Fu
OffRL
GP
329
1,944
0
04 May 2020