Maxmin Q-learning: Controlling the Estimation Bias of Q-learning

International Conference on Learning Representations (ICLR), 2020

16 February 2020

Papers citing "Maxmin Q-learning: Controlling the Estimation Bias of Q-learning"

Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization

189

20 Nov 2025

FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning

167

26 Oct 2025

Learning to Undo: Rollback-Augmented Reinforcement Learning with Reversibility Signals

193

16 Oct 2025

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

195

02 Oct 2025

Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method

137

01 Oct 2025

A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory

Fengdi Che

OffRL

186

11 Aug 2025

Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies

242

05 Aug 2025

Is Exploration or Optimization the Problem for Deep Reinforcement Learning?

Glen Berseth

OffRL

229

02 Aug 2025

Directional Ensemble Aggregation for Actor-Critics

315

31 Jul 2025

Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning

382

06 Jun 2025

Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning

234

06 Jun 2025

Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss

Ukjo Hwang

Songnam Hong

OffRL

284

14 Apr 2025

A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024

Talha Bozkus

Urbashi Mitra

208

31 Dec 2024

SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search

North American Chapter of the Association for Computational Linguistics (NAACL), 2024

Hanwen Du

Bo Peng

Xia Ning

491

12 Oct 2024

Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning

Neural Information Processing Systems (NeurIPS), 2024

Xinran Li

Ling Pan

Jun Zhang

294

11 Oct 2024

MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL

International Conference on Learning Representations (ICLR), 2024

C. Voelcker

Marcel Hussing

Eric Eaton

Amir-massoud Farahmand

Igor Gilitschenski

490

11 Oct 2024

Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024

Shreyas S R

OffRL OnRL

283

10 Sep 2024

Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn

Neural Information Processing Systems (NeurIPS), 2024

Hongyao Tang

Glen Berseth

OffRL

369

07 Sep 2024

Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors

455

28 Jun 2024

Mixture of Experts in a Mixture of RL settings

Jakob Foerster

Pablo Samuel Castro

389

26 Jun 2024

Highway Reinforcement Learning

257

28 May 2024

Stochastic Q-learning for Large Discrete Action Spaces

International Conference on Machine Learning (ICML), 2024

365

16 May 2024

vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement

Autonomous Agents and Multi-Agent Systems (AAMAS), 2024

Jianye Hao

Changjie Fan

271

14 May 2024

Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning

Zhen Wang

311

12 May 2024

The Curse of Diversity in Ensemble-Based Exploration

354

07 May 2024

CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics

AAAI Conference on Artificial Intelligence (AAAI), 2024

Minas Liarokapis

496

04 May 2024

Regularized Q-learning through Robust Averaging

International Conference on Machine Learning (ICML), 2024

Peter Schmitt-Förster

Tobias Sutter

OOD

271

03 May 2024

Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation

329

19 Apr 2024

Simple Ingredients for Offline Reinforcement Learning

388

19 Mar 2024

Dissecting Deep RL with High Update Ratios: Combatting Value Divergence

Marcel Hussing

C. Voelcker

Igor Gilitschenski

Amir-massoud Farahmand

Eric Eaton

450

09 Mar 2024

Conservative DDPG -- Pessimistic RL without Ensemble

Nitsan Soffair

Shie Mannor

OffRL

231

08 Mar 2024

Self-evolving Autoencoder Embedded Q-Network

J. Senthilnath

265

18 Feb 2024

Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks

Talha Bozkus

Urbashi Mitra

307

12 Feb 2024

Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization

Talha Bozkus

Urbashi Mitra

OffRL

322

08 Feb 2024

SQT -- std Q

312

03 Feb 2024

SLIM: Skill Learning with Multiple Critics

330

01 Feb 2024

REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes

International Conference on Learning Representations (ICLR), 2024

David Ireland

Giovanni Montana

393

16 Jan 2024

SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning

252

06 Jan 2024

Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control

Bernd Frauenknecht

Tobias Ehlgen

Sebastian Trimpe

324

30 Nov 2023

Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design

Yannick Vogt

Mehdi Naouar

M. Kalweit

Christoph Cornelius Miething

252

29 Nov 2023

Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning

Junmin Zhong

Ruofan Wu

Jennie Si

OffRL

148

07 Nov 2023

Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control

Neural Information Processing Systems (NeurIPS), 2023

319

17 Oct 2023

Suppressing Overestimation in Q-Learning through Adversarial Behaviors

HyeAnn Lee

Donghwan Lee

259

10 Oct 2023

Elephant Neural Networks: Born to Be a Continual Learner

Qingfeng Lan

A. Rupam Mahmood

CLL

469

02 Oct 2023

Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness

Journal of Artificial Intelligence Research (JAIR), 2023

Zhen Wang

238

29 Sep 2023

Adapting Double Q-Learning for Continuous Reinforcement Learning

Arsenii Kuznetsov

OffRL OnRL

175

25 Sep 2023

IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse

Autonomous Agents and Multi-Agent Systems (AAMAS), 2023

Siyuan Li

Zhen Wang

231

14 Aug 2023

Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning

253

29 Jun 2023

Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback

Neural Information Processing Systems (NeurIPS), 2023

Hang Wang

Sen Lin

Junshan Zhang

209

20 Jun 2023

Improving Offline-to-Online Reinforcement Learning with Q-Ensembles

Autonomous Agents and Multi-Agent Systems (AAMAS), 2023

Jianye Hao

Yan Zheng

493

12 Jun 2023