Title |
---|
RRM: Robust Reward Model Training Mitigates Reward Hacking. Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ... Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh |
Moral Foundations of Large Language Models. Marwa Abdulhai, Gregory Serapio-Garcia, Clément Crepy, Daria Valter, John Canny, Natasha Jaques |