COPR: Continual Learning Human Preference through Optimal Policy Regularization

24 October 2023

Ruifeng Xu

Papers citing "COPR: Continual Learning Human Preference through Optimal Policy Regularization"

Title
Stabilizing RLHF through Advantage Model and Selective Rehearsal | Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu | 25 17 0 | 18 Sep 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned | Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark | 218 441 0 | 23 Aug 2022
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information | Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta | 154 157 0 | 16 Oct 2021
Adversarial Continual Learning | Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach | CLL VLM | 140 195 0 | 21 Mar 2020