Stabilizing RLHF through Advantage Model and Selective Rehearsal

18 September 2023
Baolin Peng, Linfeng Song, Ye Tian, Lifeng Jin, Haitao Mi, Dong Yu
Links: arXiv (abs) · PDF · HTML · HuggingFace (11 upvotes)

Papers citing "Stabilizing RLHF through Advantage Model and Selective Rehearsal"

10 / 10 papers shown

Mapping Post-Training Forgetting in Language Models at Scale (20 Oct 2025)
Jackson Harmon, Andreas Hochlehnert, Matthias Bethge, Ameya Prabhu
Tags: CLL, KELM

HSCR: Hierarchical Self-Contrastive Rewarding for Aligning Medical Vision Language Models (ACL 2025; 01 Jun 2025)
Songtao Jiang, Yan Zhang, Yeying Jin, Hongwei Wang, Y. Wu, Yang Feng, Jian Wu, Zuozhu Liu

Improving LLM General Preference Alignment via Optimistic Online Mirror Descent (24 Feb 2025)
Yuheng Zhang, Dian Yu, Tao Ge, Linfeng Song, Zhichen Zeng, Haitao Mi, Nan Jiang, Dong Yu

Modality-Fair Preference Optimization for Trustworthy MLLM Alignment (IJCAI 2024; 20 Oct 2024)
Songtao Jiang, Yan Zhang, Ruizhe Chen, Yeying Jin, Qinglin He, Yang Feng, Jian Wu, Zuozhu Liu
Tags: MoE, MLLM

Systematic Evaluation of LLM-as-a-Judge in LLM Alignment Tasks: Explainable Metrics and Diverse Prompt Templates (23 Aug 2024)
Hui Wei, Shenghua He, Tian Xia, Andy H. Wong, Jingyang Lin, Mei Han
Tags: ALM, ELM

Iterative Nash Policy Optimization: Aligning LLMs with General Preferences via No-Regret Learning (30 Jun 2024)
Yuheng Zhang, Dian Yu, Baolin Peng, Linfeng Song, Ye Tian, Mingyue Huo, Nan Jiang, Haitao Mi, Dong Yu

Dense Reward for Free in Reinforcement Learning from Human Feedback (01 Feb 2024)
Alex J. Chan, Hao Sun, Samuel Holt, M. van der Schaar

Enabling Language Models to Implicitly Learn Self-Improvement (02 Oct 2023)
Ziqi Wang, Le Hou, Tianjian Lu, Yuexin Wu, Yunxuan Li, Hongkun Yu, Heng Ji
Tags: ReLM, LRM

Reward Engineering for Generating Semi-structured Explanation (Findings 2023; 15 Sep 2023)
Jiuzhou Han, Wray Buntine, Ehsan Shareghi
Tags: LRM

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models (ICLR 2023; 24 May 2023)
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. Riedl