2002.06487
Maxmin Q-learning: Controlling the Estimation Bias of Q-learning
International Conference on Learning Representations (ICLR), 2020
16 February 2020
Qingfeng Lan
Yangchen Pan
Alona Fyshe
Martha White
Papers citing "Maxmin Q-learning: Controlling the Estimation Bias of Q-learning"
50 / 110 papers shown
Mitigating Estimation Bias with Representation Learning in TD Error-Driven Regularization
Haohui Chen
Zhiyong Chen
Aoxiang Liu
Wentuo Fang
189
0
0
20 Nov 2025
FlowCritic: Bridging Value Estimation with Flow Matching in Reinforcement Learning
Shan Zhong
Shutong Ding
He Diao
Xiangyu Wang
Kah Chan Teh
Bei Peng
OffRL
167
3
0
26 Oct 2025
Learning to Undo: Rollback-Augmented Reinforcement Learning with Reversibility Signals
Andrejs Sorstkins
Omer Tariq
Muhammad Bilal
OffRL
193
0
0
16 Oct 2025
Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning
Ahmed Hendawy
Henrik Metternich
Théo Vincent
Mahdi Kallel
Jan Peters
Carlo D'Eramo
OffRL
195
3
0
02 Oct 2025
Multi-Actor Multi-Critic Deep Deterministic Reinforcement Learning with a Novel Q-Ensemble Method
Andy Wu
Chun-Cheng Lin
Rung-Tzuo Liaw
Yuehua Huang
Chihjung Kuo
Chia Tong Weng
137
0
0
01 Oct 2025
A Tutorial: An Intuitive Explanation of Offline Reinforcement Learning Theory
Fengdi Che
OffRL
186
0
0
11 Aug 2025
Scaling DRL for Decision Making: A Survey on Data, Network, and Training Budget Strategies
Yi Ma
Hongyao Tang
Chenjun Xiao
Yaodong Yang
Wei Wei
Jianye Hao
Jiye Liang
OffRL
242
0
0
05 Aug 2025
Is Exploration or Optimization the Problem for Deep Reinforcement Learning?
Glen Berseth
OffRL
229
1
0
02 Aug 2025
Directional Ensemble Aggregation for Actor-Critics
Nicklas Werge
Yi-Shan Wu
Bahareh Tasdighi
M. Kandemir
OffRL
315
0
0
31 Jul 2025
Gradual Transition from Bellman Optimality Operator to Bellman Operator in Online Reinforcement Learning
Motoki Omura
Kazuki Ota
Takayuki Osa
Yusuke Mukuta
Tatsuya Harada
OffRL
382
0
0
06 Jun 2025
Ensemble Elastic DQN: A novel multi-step ensemble approach to address overestimation in deep value-based reinforcement learning
Adrian Ly
Richard Dazeley
Peter Vamplew
F. Cruz
Sunil Aryal
234
1
0
06 Jun 2025
Moderate Actor-Critic Methods: Controlling Overestimation Bias via Expectile Loss
Ukjo Hwang
Songnam Hong
OffRL
284
0
0
14 Apr 2025
A Multi-Agent Multi-Environment Mixed Q-Learning for Partially Decentralized Wireless Network Optimization
IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2024
Talha Bozkus
Urbashi Mitra
208
3
0
31 Dec 2024
SAPIENT: Mastering Multi-turn Conversational Recommendation with Strategic Planning and Monte Carlo Tree Search
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Hanwen Du
Bo Peng
Xia Ning
491
2
0
12 Oct 2024
Kaleidoscope: Learnable Masks for Heterogeneous Multi-agent Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Xinran Li
Ling Pan
Jun Zhang
294
6
0
11 Oct 2024
MAD-TD: Model-Augmented Data stabilizes High Update Ratio RL
International Conference on Learning Representations (ICLR), 2024
C. Voelcker
Marcel Hussing
Eric Eaton
Amir-massoud Farahmand
Igor Gilitschenski
490
12
0
11 Oct 2024
Double Successive Over-Relaxation Q-Learning with an Extension to Deep Reinforcement Learning
IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2024
Shreyas S R
OffRL
OnRL
283
1
0
10 Sep 2024
Improving Deep Reinforcement Learning by Reducing the Chain Effect of Value and Policy Churn
Neural Information Processing Systems (NeurIPS), 2024
Hongyao Tang
Glen Berseth
OffRL
369
12
0
07 Sep 2024
Contextualized Hybrid Ensemble Q-learning: Learning Fast with Control Priors
Emma Cramer
Bernd Frauenknecht
Ramil Sabirov
Sebastian Trimpe
OffRL
OnRL
455
8
0
28 Jun 2024
Mixture of Experts in a Mixture of RL settings
Timon Willi
J. Obando-Ceron
Jakob Foerster
Karolina Dziugaite
Pablo Samuel Castro
MoE
389
17
0
26 Jun 2024
Highway Reinforcement Learning
Yuhui Wang
M. Strupl
Francesco Faccio
Qingyuan Wu
Haozhe Liu
Michał Grudzień
Xiaoyang Tan
Jürgen Schmidhuber
OffRL
257
4
0
28 May 2024
Stochastic Q-learning for Large Discrete Action Spaces
International Conference on Machine Learning (ICML), 2024
Fares Fourati
Vaneet Aggarwal
Mohamed-Slim Alouini
OffRL
365
9
0
16 May 2024
vMFER: Von Mises-Fisher Experience Resampling Based on Uncertainty of Gradient Directions for Policy Improvement
Autonomous Agents and Multi-Agent Systems (AAMAS), 2024
Yiwen Zhu
Jinyi Liu
Wenya Wei
Qianyi Fu
Yujing Hu
Zhou Fang
Bo An
Jianye Hao
Tangjie Lv
Changjie Fan
271
5
0
14 May 2024
Ensemble Successor Representations for Task Generalization in Offline-to-Online Reinforcement Learning
Changhong Wang
Xudong Yu
Chenjia Bai
Qiaosheng Zhang
Zhen Wang
311
2
0
12 May 2024
The Curse of Diversity in Ensemble-Based Exploration
Zhixuan Lin
P. D'Oro
Evgenii Nikishin
Rameswar Panda
354
7
0
07 May 2024
CTD4 -- A Deep Continuous Distributional Actor-Critic Agent with a Kalman Fusion of Multiple Critics
AAAI Conference on Artificial Intelligence (AAAI), 2024
David Valencia
Henry Williams
Trevor Gee
Bruce A MacDonald
Minas V. Liarokapis
OffRL
496
6
0
04 May 2024
Regularized Q-learning through Robust Averaging
International Conference on Machine Learning (ICML), 2024
Peter Schmitt-Förster
Tobias Sutter
OOD
271
0
0
03 May 2024
Adaptive Regularization of Representation Rank as an Implicit Constraint of Bellman Equation
Qiang He
Wanrong Zhu
Meng Fang
S. Maghsudi
329
8
0
19 Apr 2024
Simple Ingredients for Offline Reinforcement Learning
Edoardo Cetin
Andrea Tirinzoni
Matteo Pirotta
A. Lazaric
Yann Ollivier
Ahmed Touati
OffRL
388
3
0
19 Mar 2024
Dissecting Deep RL with High Update Ratios: Combatting Value Divergence
Marcel Hussing
C. Voelcker
Igor Gilitschenski
Amir-massoud Farahmand
Eric Eaton
450
13
0
09 Mar 2024
Conservative DDPG -- Pessimistic RL without Ensemble
Nitsan Soffair
Shie Mannor
OffRL
231
0
0
08 Mar 2024
Self-evolving Autoencoder Embedded Q-Network
J. Senthilnath
Bangjian Zhou
Zhen Wei Ng
Deeksha Aggarwal
Rajdeep Dutta
Ji Wei Yoon
Phyu Aung
Keyu Wu
Xiaoli Li
265
2
0
18 Feb 2024
Leveraging Digital Cousins for Ensemble Q-Learning in Large-Scale Wireless Networks
Talha Bozkus
Urbashi Mitra
307
9
0
12 Feb 2024
Multi-Timescale Ensemble Q-learning for Markov Decision Process Policy Optimization
Talha Bozkus
Urbashi Mitra
OffRL
322
8
0
08 Feb 2024
SQT -- std Q-target
Nitsan Soffair
Dotan Di Castro
Orly Avner
Shie Mannor
OffRL
312
0
0
03 Feb 2024
SLIM: Skill Learning with Multiple Critics
David Emukpere
Bingbing Wu
Julien Perez
J. Renders
330
3
0
01 Feb 2024
REValueD: Regularised Ensemble Value-Decomposition for Factorisable Markov Decision Processes
International Conference on Learning Representations (ICLR), 2024
David Ireland
Giovanni Montana
393
6
0
16 Jan 2024
SPQR: Controlling Q-ensemble Independence with Spiked Random Model for Reinforcement Learning
Dohyeok Lee
Seung Han
Taehyun Cho
Jungwoo Lee
OffRL
252
8
0
06 Jan 2024
Data-efficient Deep Reinforcement Learning for Vehicle Trajectory Control
Bernd Frauenknecht
Tobias Ehlgen
Sebastian Trimpe
324
5
0
30 Nov 2023
Stable Online and Offline Reinforcement Learning for Antibody CDRH3 Design
Yannick Vogt
Mehdi Naouar
M. Kalweit
Christoph Cornelius Miething
Justus Duyster
Roland Mertelsmann
Gabriel Kalweit
Joschka Boedecker
OffRL
OnRL
252
3
0
29 Nov 2023
Mitigating Estimation Errors by Twin TD-Regularized Actor and Critic for Deep Reinforcement Learning
Junmin Zhong
Ruofan Wu
Jennie Si
OffRL
148
2
0
07 Nov 2023
Keep Various Trajectories: Promoting Exploration of Ensemble Policies in Continuous Control
Neural Information Processing Systems (NeurIPS), 2023
Chao Li
Chen Gong
Qiang He
Xinwen Hou
319
6
0
17 Oct 2023
Suppressing Overestimation in Q-Learning through Adversarial Behaviors
HyeAnn Lee
Donghwan Lee
259
2
0
10 Oct 2023
Elephant Neural Networks: Born to Be a Continual Learner
Qingfeng Lan
A. Rupam Mahmood
CLL
469
11
0
02 Oct 2023
Towards Robust Offline-to-Online Reinforcement Learning via Uncertainty and Smoothness
Journal of Artificial Intelligence Research (JAIR), 2023
Xiaoyu Wen
Xudong Yu
Rui Yang
Chenjia Bai
Zhen Wang
OffRL
OnRL
238
15
0
29 Sep 2023
Adapting Double Q-Learning for Continuous Reinforcement Learning
Arsenii Kuznetsov
OffRL
OnRL
175
1
0
25 Sep 2023
IOB: Integrating Optimization Transfer and Behavior Transfer for Multi-Policy Reuse
Autonomous Agents and Multi-Agent Systems (AAMAS), 2023
Siyuan Li
Haoyang Li
Jin Zhang
Zhen Wang
Peng Liu
Chongjie Zhang
OffRL
231
3
0
14 Aug 2023
Eigensubspace of Temporal-Difference Dynamics and How It Improves Value Approximation in Reinforcement Learning
Qiang He
Wanrong Zhu
Meng Fang
S. Maghsudi
253
5
0
29 Jun 2023
Adaptive Ensemble Q-learning: Minimizing Estimation Bias via Error Feedback
Neural Information Processing Systems (NeurIPS), 2023
Hang Wang
Sen Lin
Junshan Zhang
209
27
0
20 Jun 2023
Improving Offline-to-Online Reinforcement Learning with Q-Ensembles
Autonomous Agents and Multi-Agent Systems (AAMAS), 2023
Kai-Wen Zhao
Yi-An Ma
Jianye Hao
Jinyi Liu
Yan Zheng
Zhaopeng Meng
OffRL
OnRL
493
20
0
12 Jun 2023