arXiv: 2406.02764
Adaptive Preference Scaling for Reinforcement Learning with Human Feedback
4 June 2024
Ilgee Hong, Zichong Li, Alexander Bukharin, Yixiao Li, Haoming Jiang, Tianbao Yang, Tuo Zhao
Papers citing "Adaptive Preference Scaling for Reinforcement Learning with Human Feedback" (4 papers)
Robust Reinforcement Learning from Human Feedback for Large Language Models Fine-Tuning
Kai Ye, Hongyi Zhou, Jin Zhu, Francesco Quinzan, C. Shi
03 Apr 2025
Stochastic Constrained DRO with a Complexity Independent of Sample Size
Q. Qi, Jiameng Lyu, Kung-Sik Chan, E. Bai, Tianbao Yang
11 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
04 Mar 2022
Semantically Distributed Robust Optimization for Vision-and-Language Inference
Tejas Gokhale, A. Chaudhary, Pratyay Banerjee, Chitta Baral, Yezhou Yang
14 Oct 2021