2406.01013
Cited By
v1
v2 (latest)
Scalable Ensembling For Mitigating Reward Overoptimisation
3 June 2024
Ahmed M. Ahmed, Rafael Rafailov, Stepan Sharkov, Xuechen Li, Oluwasanmi Koyejo
Papers citing "Scalable Ensembling For Mitigating Reward Overoptimisation"
4 / 4 papers shown
Title
Ensemble Learning for Large Language Models in Text and Code Generation: A Survey
Mari Ashiga, Wei Jie, Fan Wu, Vardan K. Voskanyan, Fateme Dinmohammadi, P. Brookes, Jingzhi Gong, Zheng Wang
135
1
0
13 Mar 2025
Reward Shaping to Mitigate Reward Hacking in RLHF
Jiayi Fu, Xuandong Zhao, Chengyuan Yao, Han Wang, Qi Han, Yanghua Xiao
260
22
0
26 Feb 2025
Rethinking Diverse Human Preference Learning through Principal Component Analysis
Feng Luo, Rui Yang, Hao Sun, Chunyuan Deng, Jiarui Yao, Jingyan Shen, Huan Zhang, Hanjie Chen
135
3
0
18 Feb 2025
Ensembles of Low-Rank Expert Adapters
Yinghao Li, Vianne Gao, Chao Zhang, MohamadAli Torkamani
217
1
0
31 Jan 2025