ResearchTrend.AI

COPR: Continual Learning Human Preference through Optimal Policy Regularization

24 October 2023
Han Zhang, Lin Gui, Yuanzhao Zhai, Hui Wang, Yu Lei, Ruifeng Xu
    CLL

Papers citing "COPR: Continual Learning Human Preference through Optimal Policy Regularization"

4 papers shown
Stabilizing RLHF through Advantage Model and Selective Rehearsal
Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
18 Sep 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022
Understanding Dataset Difficulty with $\mathcal{V}$-Usable Information
Kawin Ethayarajh, Yejin Choi, Swabha Swayamdipta
16 Oct 2021
Adversarial Continual Learning
Sayna Ebrahimi, Franziska Meier, Roberto Calandra, Trevor Darrell, Marcus Rohrbach
CLL, VLM
21 Mar 2020