Title |
---|
![]() RRM: Robust Reward Model Training Mitigates Reward Hacking Tianqi Liu Wei Xiong Jie Jessie Ren Lichang Chen Junru Wu ...Yuan Liu Bilal Piot Abe Ittycheriah Aviral Kumar Mohammad Saleh |
![]() RewardBench: Evaluating Reward Models for Language Modeling Nathan Lambert Valentina Pyatkin Jacob Morrison Lester James Validad Miranda Bill Yuchen Lin ...Sachin Kumar Tom Zick Yejin Choi Noah A. Smith Hanna Hajishirzi |
![]() Tur[k]ingBench: A Challenge Benchmark for Web Agents Kevin Xu Yeganeh Kordi Kate Sanders Yizhong Wang Adam Byerly Kate Sanders Adam Byerly Jingyu Zhang Benjamin Van Durme Daniel Khashabi |
![]() The Language Barrier: Dissecting Safety Challenges of LLMs in
Multilingual Contexts Lingfeng Shen Weiting Tan Sihao Chen Yunmo Chen Jingyu Zhang Haoran Xu Boyuan Zheng Philipp Koehn Daniel Khashabi |