Cited By: On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization (arXiv:2405.16455)
26 May 2024
Jiancong Xiao, Ziniu Li, Xingyu Xie, E. Getzen, Cong Fang, Qi Long, Weijie J. Su
Papers citing "On the Algorithmic Bias of Aligning Large Language Models with RLHF: Preference Collapse and Matching Regularization" (7 of 7 papers shown)
Restoring Calibration for Aligned Large Language Models: A Calibration-Aware Fine-Tuning Approach
Jiancong Xiao, Bojian Hou, Zhanliang Wang, Ruochen Jin, Q. Long, Weijie Su, Li Shen (04 May 2025)
Contextual Online Uncertainty-Aware Preference Learning for Human Feedback
Nan Lu, Ethan X. Fang, Junwei Lu (27 Apr 2025)
Towards Large Language Models that Benefit for All: Benchmarking Group Fairness in Reward Models
Kefan Song, Jin Yao, Runnan Jiang, Rohan Chandra, Shangtong Zhang (10 Mar 2025) [ALM]
Asymptotics of Language Model Alignment
Joy Qiping Yang, Salman Salamatian, Ziteng Sun, A. Suresh, Ahmad Beirami (02 Apr 2024)
Monitoring AI-Modified Content at Scale: A Case Study on the Impact of ChatGPT on AI Conference Peer Reviews
Weixin Liang, Zachary Izzo, Yaohui Zhang, Haley Lepp, Hancheng Cao, ..., Haotian Ye, Sheng Liu, Zhi Huang, Daniel A. McFarland, James Y. Zou (11 Mar 2024) [DeLMO]
ODIN: Disentangled Reward Mitigates Hacking in RLHF
Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou, Tom Goldstein, Heng-Chiao Huang, M. Shoeybi, Bryan Catanzaro (11 Feb 2024) [AAML]
Sparks of Artificial General Intelligence: Early experiments with GPT-4
Sébastien Bubeck, Varun Chandrasekaran, Ronen Eldan, J. Gehrke, Eric Horvitz, ..., Scott M. Lundberg, Harsha Nori, Hamid Palangi, Marco Tulio Ribeiro, Yi Zhang (22 Mar 2023) [ELM, AI4MH, AI4CE, ALM]