Reward Collapse in Aligning Large Language Models (arXiv 2305.17608)
Ziang Song, Tianle Cai, Jason D. Lee, Weijie J. Su
28 May 2023 · ALM
Papers citing "Reward Collapse in Aligning Large Language Models" (21 of 21 papers shown)

Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu, Ethan X. Fang, Junwei Lu
27 Apr 2025 · 0 citations

Simplify RLHF as Reward-Weighted SFT: A Variational Method
Yuhao Du, Z. Li, Pengyu Cheng, Zhihong Chen, Yuejiao Xie, Xiang Wan, Anningzhe Gao
20 Feb 2025 · 1 citation

Elephant in the Room: Unveiling the Impact of Reward Model Quality in Alignment
Yan Liu, Xiaoyuan Yi, Xiaokang Chen, Jing Yao, Jingwei Yi, Daoguang Zan, Zheng Liu, Xing Xie, Tsung-Yi Ho
26 Sep 2024 · ALM · 0 citations

Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao
04 Jun 2024 · 4 citations

On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization
Jiancong Xiao, Ziniu Li, Xingyu Xie, E. Getzen, Cong Fang, Qi Long, Weijie J. Su
26 May 2024 · 10 citations

Hummer: Towards Limited Competitive Preference Dataset
Li Jiang, Yusen Wu, Junwu Xiong, Jingqing Ruan, Yichuan Ding, Qingpei Guo, Zujie Wen, Jun Zhou, Xiaotie Deng
19 May 2024 · 6 citations

Fine-tuning of diffusion models via stochastic control: entropy regularization and beyond
Wenpin Tang
10 Mar 2024 · 13 citations

Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu, Michael I. Jordan, Jiantao Jiao
29 Jan 2024 · 22 citations

Beyond Imitation: Leveraging Fine-grained Quality Signals for Alignment
Geyang Guo, Ranchi Zhao, Tianyi Tang, Wayne Xin Zhao, Ji-Rong Wen
07 Nov 2023 · ALM · 27 citations

Prevalence and prevention of large language model use in crowd work
V. Veselovsky, Manoel Horta Ribeiro, Philip Cozzolino, Andrew Gordon, David Rothschild, Robert West
24 Oct 2023 · 10 citations

Improving Generalization of Alignment with Human Preferences through Group Invariant Learning
Rui Zheng, Wei Shen, Yuan Hua, Wenbin Lai, Shihan Dou, ..., Xiao Wang, Haoran Huang, Tao Gui, Qi Zhang, Xuanjing Huang
18 Oct 2023 · 14 citations

ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li, Tian Xu, Yushun Zhang, Zhihang Lin, Yang Yu, Ruoyu Sun, Zhimin Luo
16 Oct 2023 · 45 citations

Beyond Reverse KL: Generalizing Direct Preference Optimization with Diverse Divergence Constraints
Chaoqi Wang, Yibo Jiang, Yuguang Yang, Han Liu, Yuxin Chen
28 Sep 2023 · 81 citations

Mitigating the Alignment Tax of RLHF
Yong Lin, Hangyu Lin, Wei Xiong, Shizhe Diao, Zeming Zheng, ..., Han Zhao, Nan Jiang, Heng Ji, Yuan Yao, Tong Zhang
12 Sep 2023 · MoMe · CLL · 40 citations

Peering Through Preferences: Unraveling Feedback Acquisition for Aligning Large Language Models
Hritik Bansal, John Dang, Aditya Grover
30 Aug 2023 · ALM · 20 citations

Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper, Xander Davies, Claudia Shi, T. Gilbert, Jérémy Scheurer, ..., Erdem Biyik, Anca Dragan, David M. Krueger, Dorsa Sadigh, Dylan Hadfield-Menell
27 Jul 2023 · ALM · OffRL · 436 citations

Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti, Ximing Lu, Faeze Brahman, Ronan Le Bras, Maarten Sap, Mark O. Riedl
24 May 2023 · 9 citations

Representation Deficiency in Masked Language Modeling
Yu Meng, Jitin Krishnan, Sinong Wang, Qifan Wang, Yuning Mao, Han Fang, Marjan Ghazvininejad, Jiawei Han, Luke Zettlemoyer
04 Feb 2023 · 7 citations

Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli, Liane Lovitt, John Kernion, Amanda Askell, Yuntao Bai, ..., Nicholas Joseph, Sam McCandlish, C. Olah, Jared Kaplan, Jack Clark
23 Aug 2022 · 441 citations

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022 · OSLM · ALM · 11,730 citations

Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler, Nisan Stiennon, Jeff Wu, Tom B. Brown, Alec Radford, Dario Amodei, Paul Christiano, G. Irving
18 Sep 2019 · ALM · 1,561 citations