arXiv: 2403.05171
Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation
8 March 2024
Xiaoying Zhang, Jean-François Ton, Wei Shen, Hongning Wang, Yang Liu
Papers citing "Overcoming Reward Overoptimization via Adversarial Policy Optimization with Lightweight Uncertainty Estimation" (6 papers shown)
Energy-Based Reward Models for Robust Language Model Alignment
Anamika Lochab, Ruqi Zhang (17 Apr 2025)
Inverse-RLignment: Large Language Model Alignment from Demonstrations through Inverse Reinforcement Learning
Hao Sun, M. van der Schaar (28 Jan 2025)
Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs
Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang (14 Jun 2024)
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang (30 Dec 2023)
A Kernel-Based View of Language Model Fine-Tuning
Sadhika Malladi, Alexander Wettig, Dingli Yu, Danqi Chen, Sanjeev Arora (11 Oct 2022)
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe (04 Mar 2022)