Spontaneous Reward Hacking in Iterative Self-Refinement
5 July 2024
Jane Pan
He He
Samuel R. Bowman
Shi Feng

Papers citing "Spontaneous Reward Hacking in Iterative Self-Refinement"

10 citing papers
Enhancing Large Language Model Reasoning with Reward Models: An Analytical Survey
Qiyuan Liu
Hao Xu
Xuhong Chen
Wei Chen
Yee Whye Teh
Ning Miao
02 Oct 2025
Causally-Enhanced Reinforcement Policy Optimization
Xiangqi Wang
Yue Huang
Yujun Zhou
Xiaonan Luo
Kehan Guo
Xiangliang Zhang
27 Sep 2025
Inference-Time Reward Hacking in Large Language Models
Hadi Khalaf
C. M. Verdun
Alex Oesterling
Himabindu Lakkaraju
Flavio du Pin Calmon
24 Jun 2025
Sailing by the Stars: A Survey on Reward Models and Learning Strategies for Learning from Rewards
Xiaobao Wu
05 May 2025
AI-Slop to AI-Polish? Aligning Language Models through Edit-Based Writing Rewards and Test-time Computation
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
10 Apr 2025
PRD: Peer Rank and Discussion Improve Large Language Model based Evaluations
Ruosen Li
Teerth Patel
Xinya Du
03 Jan 2025
Honesty to Subterfuge: In-Context Reinforcement Learning Can Make Honest Models Reward Hack
Leo McKee-Reid
Christoph Sträter
Maria Angelica Martinez
Joe Needham
Mikita Balesni
09 Oct 2024
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
International Conference on Human Factors in Computing Systems (CHI), 2024
Tuhin Chakrabarty
Philippe Laban
Chien-Sheng Wu
22 Sep 2024
Diversify and Conquer: Diversity-Centric Data Selection with Iterative Refinement
Simon Yu
Liangyu Chen
Sara Ahmadian
Marzieh Fadaee
17 Sep 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang
Shiqi Shen
Guangyao Shen
Zhi Gong
Yankai Lin
Ji-Rong Wen
17 Jun 2024