Spontaneous Reward Hacking in Iterative Self-Refinement
5 July 2024
Jane Pan
He He
Samuel R. Bowman
Shi Feng

Papers citing "Spontaneous Reward Hacking in Iterative Self-Refinement"

10 citing papers
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
02 Oct 2025
Causally-Enhanced Reinforcement Policy Optimization
Xiangqi Wang
Yue Huang
Yujun Zhou
Xiaonan Luo
Kehan Guo
Xiangliang Zhang
27 Sep 2025
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf
C. M. Verdun
Alex Oesterling
Himabindu Lakkaraju
Flavio du Pin Calmon
24 Jun 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
05 May 2025
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
10 Apr 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
03 Jan 2025
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
Leo McKee-Reid
Christoph Sträter
Maria Angelica Martinez
Joe Needham
Mikita Balesni
09 Oct 2024
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
International Conference on Human Factors in Computing Systems (CHI), 2024
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
22 Sep 2024
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Simon Yu
Liangyu Chen
Sara Ahmadian
Marzieh Fadaee
17 Sep 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang
Shiqi Shen
Guangyao Shen
Zhi Gong
Yankai Lin
Ji-Rong Wen
17 Jun 2024