Bayesian Reward Models for LLM Alignment
arXiv: 2402.13210 · 20 February 2024
Adam X. Yang, Maxime Robeyns, Thomas Coste, Zhengyan Shi, Jun Wang, Haitham Bou-Ammar, Laurence Aitchison
Papers citing "Bayesian Reward Models for LLM Alignment" (8 papers shown):
1. Energy-Based Reward Models for Robust Language Model Alignment. Anamika Lochab, Ruqi Zhang. 17 Apr 2025.
2. Regularizing Hidden States Enables Learning Generalizable Reward Model for LLMs. Rui Yang, Ruomeng Ding, Yong Lin, Huan Zhang, Tong Zhang. 14 Jun 2024.
3. Asymptotics of Language Model Alignment. Joy Qiping Yang, Salman Salamatian, Ziteng Sun, A. Suresh, Ahmad Beirami. 02 Apr 2024.
4. ODIN: Disentangled Reward Mitigates Hacking in RLHF. Lichang Chen, Chen Zhu, Davit Soselia, Jiuhai Chen, Tianyi Zhou, Tom Goldstein, Heng-Chiao Huang, M. Shoeybi, Bryan Catanzaro. 11 Feb 2024.
5. WARM: On the Benefits of Weight Averaged Reward Models. Alexandre Ramé, Nino Vieillard, Léonard Hussenot, Robert Dadashi, Geoffrey Cideron, Olivier Bachem, Johan Ferret. 22 Jan 2024.
6. Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles. Yuanzhao Zhai, Han Zhang, Yu Lei, Yue Yu, Kele Xu, Dawei Feng, Bo Ding, Huaimin Wang. 30 Dec 2023.
7. Accelerated Linearized Laplace Approximation for Bayesian Deep Learning. Zhijie Deng, Feng Zhou, Jun Zhu. 23 Oct 2022.
8. Training language models to follow instructions with human feedback. Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe. 04 Mar 2022.