Defining and Characterizing Reward Hacking

27 September 2022
Joar Skalse
Nikolaus H. R. Howe
Dmitrii Krasheninnikov
David M. Krueger

Papers citing "Defining and Characterizing Reward Hacking"

Showing 10 of 60 citing papers.
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
Víctor Gallego
SyDa
251
5
0
11 Aug 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM, OffRL
358
712
0
27 Jul 2023
Learning to Generate Better Than Your LLM
Jonathan D. Chang
Kianté Brantley
Rajkumar Ramamurthy
Dipendra Kumar Misra
Wen Sun
272
54
0
20 Jun 2023
Machine Love
Joel Lehman
290
5
0
18 Feb 2023
On The Fragility of Learned Reward Functions
Lev McKinney
Yawen Duan
David M. Krueger
Adam Gleave
175
23
0
09 Jan 2023
Misspecification in Inverse Reinforcement Learning
AAAI Conference on Artificial Intelligence (AAAI), 2022
Joar Skalse
Alessandro Abate
216
28
0
06 Dec 2022
Reward Gaming in Conditional Text Generation
Annual Meeting of the Association for Computational Linguistics (ACL), 2022
Richard Yuanzhe Pang
Vishakh Padmakumar
Thibault Sellam
Ankur P. Parikh
He He
370
28
0
16 Nov 2022
Scaling Laws for Reward Model Overoptimization
International Conference on Machine Learning (ICML), 2022
Leo Gao
John Schulman
Jacob Hilton
ALM
376
776
0
19 Oct 2022
The Alignment Problem from a Deep Learning Perspective
International Conference on Learning Representations (ICLR), 2022
Richard Ngo
Lawrence Chan
Sören Mindermann
534
247
0
30 Aug 2022
Provably Safe Reinforcement Learning: Conceptual Analysis, Survey, and Benchmarking
Hanna Krasowski
Jakob Thumm
Marlon Müller
Lukas Schäfer
Xiao Wang
Matthias Althoff
314
39
0
13 May 2022