Title | Authors |
---|---|
RRM: Robust Reward Model Training Mitigates Reward Hacking | Tianqi Liu, Wei Xiong, Jie Jessie Ren, Lichang Chen, Junru Wu, ..., Yuan Liu, Bilal Piot, Abe Ittycheriah, Aviral Kumar, Mohammad Saleh |
From Yes-Men to Truth-Tellers: Addressing Sycophancy in Large Language Models with Pinpoint Tuning | Wei Chen, Zhen Huang, Liang Xie, Binbin Lin, Houqiang Li, ..., Deng Cai, Yonggang Zhang, Wenxiao Wang, Xu Shen, Jieping Ye |
Bridging and Modeling Correlations in Pairwise Data for Direct Preference Optimization | Yuxin Jiang, Bo Huang, Yufei Wang, Xingshan Zeng, Liangyou Li, Yasheng Wang, Xin Jiang, Lifeng Shang, Ruiming Tang, Wei Wang |