Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
arXiv:2404.19409, 30 April 2024
Mathieu Rita, Florian Strub, Rahma Chaabouni, Paul Michel, Emmanuel Dupoux, Olivier Pietquin
Papers citing "Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning"
9 of 9 papers shown
EAPO: Enhancing Policy Optimization with On-Demand Expert Assistance
Siyao Song, Cong Ma, Zhihao Cheng, Shiye Lei, Minghao Li, Ying Zeng, Huaixiao Tou, Kai Jia
Tags: OffRL, LRM
28 Sep 2025
LLM-Driven Self-Refinement for Embodied Drone Task Planning
Deyu Zhang, Xicheng Zhang, Jiahao Li, Tingting Long, Xunhua Dai, Yongjian Fu, Jinrui Zhang, Ju Ren, Yaoxue Zhang
21 Aug 2025
MEMETRON: Metaheuristic Mechanisms for Test-time Response Optimization of Large Language Models
S. Nguyen, Theja Tulabandhula
10 Jun 2025
Proximalized Preference Optimization for Diverse Feedback Types: A Decomposed Perspective on DPO
Kaiyang Guo, Yinchuan Li, Zhitang Chen
29 May 2025
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen, Archiki Prasad, Elias Stengel-Eskin, Joey Tianyi Zhou
02 Oct 2024
Post-hoc Reward Calibration: A Case Study on Length Bias
International Conference on Learning Representations (ICLR), 2024
Zeyu Huang, Zihan Qiu, Zili Wang, Edoardo M. Ponti, Ivan Titov
25 Sep 2024
Model Surgery: Modulating LLM's Behavior Via Simple Parameter Editing
Huanqian Wang, Yang Yue, Rui Lu, Jingxin Shi, Andrew Zhao, Shenzhi Wang, Shiji Song, Gao Huang
Tags: LM&Ro, KELM
11 Jul 2024
Robust Preference Optimization through Reward Model Distillation
Adam Fisch, Jacob Eisenstein, Vicky Zayats, Alekh Agarwal, Ahmad Beirami, Chirag Nagpal, Peter Shaw, Jonathan Berant
29 May 2024
Provably Mitigating Overoptimization in RLHF: Your SFT Loss is Implicitly an Adversarial Regularizer
Zhihan Liu, Miao Lu, Shenao Zhang, Boyi Liu, Hongyi Guo, Yingxiang Yang, Jose H. Blanchet, Zhaoran Wang
26 May 2024