Reward-Robust RLHF in LLMs (arXiv:2409.15360)
18 September 2024
Yuzi Yan, Xingzhou Lou, Jialian Li, Yiping Zhang, Jian Xie, Chao Yu, Yu Wang, Dong Yan, Yuan Shen
Papers citing "Reward-Robust RLHF in LLMs" (3 of 3 papers shown)
1. Energy-Based Reward Models for Robust Language Model Alignment
   Anamika Lochab, Ruqi Zhang
   17 Apr 2025

2. Adversarial Training of Reward Models
   Alexander Bukharin, Haifeng Qian, Shengyang Sun, Adithya Renduchintala, Soumye Singhal, Z. Wang, Oleksii Kuchaiev, Olivier Delalleau, T. Zhao
   AAML
   08 Apr 2025

3. Probabilistic Uncertain Reward Model
   Wangtao Sun, Xiang Cheng, Xing Yu, Haotian Xu, Zhao Yang, Shizhu He, Jun Zhao, Kang Liu
   28 Mar 2025