Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2402.06886
Cited By
Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF
10 February 2024
Han Shen
Zhuoran Yang
Tianyi Chen
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Principled Penalty-based Methods for Bilevel Reinforcement Learning and RLHF"
13 / 13 papers shown
Title
Mitigating Forgetting in LLM Supervised Fine-Tuning and Preference Learning
H. Fernando
Han Shen
Parikshit Ram
Yi Zhou
Horst Samulowitz
Nathalie Baracaldo
Tianyi Chen
CLL
50
2
0
20 Oct 2024
Learning to Steer Markovian Agents under Model Uncertainty
Jiawei Huang
Vinzenz Thoma
Zebang Shen
H. Nax
Niao He
21
2
0
14 Jul 2024
SAIL: Self-Improving Efficient Online Alignment of Large Language Models
Mucong Ding
Souradip Chakraborty
Vibhu Agrawal
Zora Che
Alec Koppel
Mengdi Wang
Amrit Singh Bedi
Furong Huang
20
9
0
21 Jun 2024
Bilevel reinforcement learning via the development of hyper-gradient without lower-level convexity
Yan Yang
Bin Gao
Ya-xiang Yuan
30
2
0
30 May 2024
First-order penalty methods for bilevel optimization
Zhaosong Lu
Sanyou Mei
53
31
0
04 Jan 2023
BOME! Bilevel Optimization Made Easy: A Simple First-Order Approach
Mao Ye
B. Liu
S. Wright
Peter Stone
Qian Liu
69
82
0
19 Sep 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
301
11,730
0
04 Mar 2022
Policy Mirror Descent for Reinforcement Learning: Linear Convergence, New Sampling Complexity, and Generalized Problem Classes
Guanghui Lan
81
135
0
30 Jan 2021
Independent Policy Gradient Methods for Competitive Reinforcement Learning
C. Daskalakis
Dylan J. Foster
Noah Golowich
48
158
0
11 Jan 2021
A Finite Time Analysis of Two Time-Scale Actor Critic Methods
Yue Wu
Weitong Zhang
Pan Xu
Quanquan Gu
85
145
0
04 May 2020
Bilevel Programming for Hyperparameter Optimization and Meta-Learning
Luca Franceschi
P. Frasconi
Saverio Salzo
Riccardo Grazzi
Massimiliano Pontil
96
714
0
13 Jun 2018
Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks
Chelsea Finn
Pieter Abbeel
Sergey Levine
OOD
234
11,568
0
09 Mar 2017
Forward and Reverse Gradient-Based Hyperparameter Optimization
Luca Franceschi
Michele Donini
P. Frasconi
Massimiliano Pontil
109
370
0
06 Mar 2017
1