ResearchTrend.AI

arXiv:2402.05950

SQT -- std Q-target

3 February 2024
Nitsan Soffair
Dotan Di Castro
Orly Avner
Shie Mannor
    OffRL
Abstract

Std Q-target (SQT) is a conservative, ensemble, actor-critic algorithm based on Q-learning. It rests on a single key Q-formula: the standard deviation of the Q-networks, which acts as an "uncertainty penalty" and serves as a minimalistic solution to the problem of overestimation bias. We implement SQT on top of TD3/TD7 code and test it against the state-of-the-art (SOTA) actor-critic algorithms DDPG, TD3, and TD7 on seven popular MuJoCo and Bullet tasks. Our results demonstrate the superiority of SQT's Q-target formula over TD3's Q-target formula as a conservative solution to overestimation bias in RL, and SQT shows a clear performance advantage, by a wide margin, over DDPG, TD3, and TD7 on all tasks.
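
The abstract does not state the exact Q-target formula, so the following is only a minimal sketch of the general idea, under stated assumptions: the penalty is the standard deviation of the ensemble's Q-estimates for the next state-action pair, scaled by a weight and subtracted from a TD3-style minimum over the ensemble. The function name `sqt_target`, the weight `alpha`, and the choice of population standard deviation are hypothetical and are not taken from the paper.

```python
from statistics import pstdev


def sqt_target(reward, next_qs, gamma=0.99, alpha=0.5, done=False):
    """Uncertainty-penalized TD target for a single transition (illustrative).

    reward  -- immediate reward r
    next_qs -- target-critic ensemble estimates Q_i(s', a'), one per critic
    gamma   -- discount factor
    alpha   -- penalty weight (hypothetical hyperparameter)
    done    -- True if s' is terminal, in which case nothing is bootstrapped
    """
    if len(next_qs) < 2:
        raise ValueError("need at least two Q-estimates to measure disagreement")

    # TD3-style conservative estimate: minimum over the ensemble.
    base = min(next_qs)

    # Assumed "uncertainty penalty": disagreement among the Q-networks,
    # measured as the population standard deviation of their estimates.
    penalty = alpha * pstdev(next_qs)

    bootstrap = 0.0 if done else gamma * (base - penalty)
    return reward + bootstrap


if __name__ == "__main__":
    # Two critics that disagree: the target is lowered relative to alpha=0.
    print(sqt_target(1.0, [10.0, 12.0], gamma=0.5, alpha=1.0))
    print(sqt_target(1.0, [10.0, 12.0], gamma=0.5, alpha=0.0))
```

With `alpha=0` the sketch reduces to the TD3-style minimum target, and the penalty vanishes whenever all critics agree, so the extra conservatism applies only where the ensemble is uncertain.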
