2212.04717
Cited By
On the Sensitivity of Reward Inference to Misspecified Human Models
9 December 2022
Joey Hong
Kush S. Bhatia
Anca Dragan
Papers citing
"On the Sensitivity of Reward Inference to Misspecified Human Models"
5 / 5 papers shown
Title
Taming Overconfidence in LLMs: Reward Calibration in RLHF
Jixuan Leng
Chengsong Huang
Banghua Zhu
Jiaxin Huang
26
7
0
13 Oct 2024
Boosting Reward Model with Preference-Conditional Multi-Aspect Synthetic Data Generation
Jiaming Shen
Ran Xu
Yennie Jun
Zhen Qin
Tianqi Liu
Carl Yang
Yi Liang
Simon Baumgartner
Michael Bendersky
SyDa
57
4
0
22 Jul 2024
Social Choice Should Guide AI Alignment in Dealing with Diverse Human Feedback
Vincent Conitzer
Rachel Freedman
J. Heitzig
Wesley H. Holliday
Bob M. Jacobs
...
Eric Pacuit
Stuart Russell
Hailey Schoelkopf
Emanuel Tewolde
W. Zwicker
33
28
0
16 Apr 2024
Active teacher selection for reinforcement learning from human feedback
Rachel Freedman
Justin Svegliato
K. H. Wray
Stuart J. Russell
31
6
0
23 Oct 2023
Towards Understanding Sycophancy in Language Models
Mrinank Sharma
Meg Tong
Tomasz Korbak
D. Duvenaud
Amanda Askell
...
Oliver Rausch
Nicholas Schiefer
Da Yan
Miranda Zhang
Ethan Perez
211
178
0
20 Oct 2023