ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.04549
  4. Cited By
Spontaneous Reward Hacking in Iterative Self-Refinement

Spontaneous Reward Hacking in Iterative Self-Refinement

5 July 2024
Jane Pan
He He
Samuel R. Bowman
Shi Feng
ArXivPDFHTML

Papers citing "Spontaneous Reward Hacking in Iterative Self-Refinement"

5 / 5 papers shown
Title
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
60
0
0
05 May 2025
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
Can AI writing be salvaged? Mitigating Idiosyncrasies and Improving Human-AI Alignment in the Writing Process through Edits
Tuhin Chakrabarty
Philippe Laban
C. Wu
45
8
0
22 Sep 2024
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization
Wenkai Yang
Shiqi Shen
Guangyao Shen
Zhi Gong
Yankai Lin
Zhi Gong
Yankai Lin
Ji-Rong Wen
41
13
0
17 Jun 2024
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions
  of Large Language Models with Suggest-Critique-Reflect Process
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
22
14
0
04 May 2023
Can Large Language Models Be an Alternative to Human Evaluations?
Can Large Language Models Be an Alternative to Human Evaluations?
Cheng-Han Chiang
Hung-yi Lee
ALM
LM&MA
204
559
0
03 May 2023
1