arXiv:2407.04549
Spontaneous Reward Hacking in Iterative Self-Refinement
5 July 2024
Jane Pan
He He
Samuel R. Bowman
Shi Feng
Papers citing
"Spontaneous Reward Hacking in Iterative Self-Refinement"
10 / 10 papers shown
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
ReLM
LRM
AI4CE
335
4
0
02 Oct 2025
Causally-Enhanced Reinforcement Policy Optimization
Xiangqi Wang
Yue Huang
Yujun Zhou
Xiaonan Luo
Kehan Guo
Xiangliang Zhang
OffRL
LRM
232
1
0
27 Sep 2025
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf
C. M. Verdun
Alex Oesterling
Himabindu Lakkaraju
Flavio du Pin Calmon
339
9
0
24 Jun 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
LRM
770
5
0
05 May 2025
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
571
16
0
10 Apr 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
LLMAG
ALM
688
133
0
03 Jan 2025
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
Leo McKee-Reid
Christoph Sträter
Maria Angelica Martinez
Joe Needham
Mikita Balesni
OffRL
246
11
0
09 Oct 2024
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
International Conference on Human Factors in Computing Systems (CHI), 2024
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
699
48
0
22 Sep 2024
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Simon Yu
Liangyu Chen
Sara Ahmadian
Marzieh Fadaee
276
11
0
17 Sep 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang
Shiqi Shen
Guangyao Shen
Zhi Gong
Yankai Lin
Ji-Rong Wen
430
20
0
17 Jun 2024