ResearchTrend.AI


arXiv: 2507.08761
v2 (latest)

Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data

11 July 2025
Jeonghye Kim
Yongjae Shin
Whiyoung Jung
Sunghoon Hong
Deunsol Yoon
Y. Sung
Kanghoon Lee
Woohyung Lim
    OffRL
Main: 9 pages · Appendix: 10 pages · Bibliography: 3 pages · 21 figures · 14 tables
Abstract

Reinforcement learning with offline data suffers from Q-value extrapolation errors. To address this issue, we first demonstrate that linear extrapolation of the Q-function beyond the data range is particularly problematic. To mitigate this, we propose guiding Q-values to decrease gradually outside the data range, which is achieved through reward scaling with layer normalization (RS-LN) and a penalization mechanism for infeasible actions (PA). By combining RS-LN and PA, we develop a new algorithm called PARS. We evaluate PARS across a range of tasks, demonstrating superior performance compared to state-of-the-art algorithms in both offline training and online fine-tuning on the D4RL benchmark, with notable success in the challenging AntMaze Ultra task.
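The abstract names two mechanisms: scaling rewards before forming the TD target (paired with layer normalization inside the Q-network) and assigning a pessimistic target to infeasible actions outside the valid action bounds. A minimal sketch of how those two ideas could enter a Q-learning target is below. This is not the authors' code: the reward scale, penalty value, and action bounds are illustrative assumptions, and the layer-normalization component (which lives inside the network architecture) is not shown.

```python
import numpy as np

# Assumed, illustrative hyperparameters -- not taken from the paper.
REWARD_SCALE = 100.0          # reward scaling factor (RS)
GAMMA = 0.99                  # discount factor
PENALTY_TARGET = -1.0         # pessimistic target for infeasible actions (PA)
ACTION_LOW, ACTION_HIGH = -1.0, 1.0  # assumed feasible action range

def td_target(reward, next_q, done):
    """Standard TD target, computed on the scaled reward."""
    return REWARD_SCALE * reward + GAMMA * (1.0 - done) * next_q

def q_target(action, reward, next_q, done):
    """Use the penalty target for actions outside the feasible range,
    otherwise the usual scaled-reward TD target."""
    infeasible = np.any((action < ACTION_LOW) | (action > ACTION_HIGH))
    if infeasible:
        return PENALTY_TARGET
    return td_target(reward, next_q, done)
```

In this sketch, any sampled action with a coordinate outside the feasible box is regressed toward a low constant, which discourages the Q-function from extrapolating high values beyond the data range; the reward scaling enlarges the spread of in-distribution targets relative to that penalty.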
