
β-DQN: Improving Deep Q-Learning By Evolving the Behavior

Adaptive Agents and Multi-Agent Systems (AAMAS), 2025
Main: 8 pages, Appendix: 12 pages, Bibliography: 2 pages, 20 figures, 3 tables
Abstract

While many sophisticated exploration methods have been proposed, their lack of generality and high computational cost often lead researchers to favor simpler methods like ϵ-greedy. Motivated by this, we introduce β-DQN, a simple and efficient exploration method that augments the standard DQN with a behavior function β. This function estimates the probability that each action has been taken at each state. By leveraging β, we generate a population of diverse policies that balance exploration between state-action coverage and overestimation bias correction. An adaptive meta-controller is designed to select an effective policy for each episode, enabling flexible and explainable exploration. β-DQN is straightforward to implement and adds minimal computational overhead to the standard DQN. Experiments on both simple and challenging exploration domains show that β-DQN outperforms existing baseline methods across a wide range of tasks, providing an effective solution for improving exploration in deep reinforcement learning.
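The abstract only states that β estimates the probability that each action has been taken at each state, and that diverse policies are derived from it. The sketch below is an illustrative assumption, not the paper's implementation: it uses a tabular setting with count-based β, and derives a policy family by trading off Q-values against a bonus for under-tried actions (low β). The class name `BetaDQNSketch` and the `-log β` bonus form are our own choices for illustration.

```python
import numpy as np

class BetaDQNSketch:
    """Hedged tabular sketch of the beta-DQN idea: a behavior estimate
    beta(s, a) alongside Q(s, a), used to build an exploration bonus.
    The deep version in the paper would replace both tables with networks."""

    def __init__(self, n_states, n_actions):
        self.q = np.zeros((n_states, n_actions))      # Q-value estimates
        self.counts = np.ones((n_states, n_actions))  # action counts (Laplace prior)

    def beta(self, s):
        """Empirical probability that each action has been taken in state s."""
        c = self.counts[s]
        return c / c.sum()

    def policy_action(self, s, explore_weight):
        """One member of the policy population: score actions by Q-value plus
        an exploration bonus that is large for rarely-taken actions."""
        score = self.q[s] + explore_weight * -np.log(self.beta(s))
        return int(np.argmax(score))
```

Varying `explore_weight` yields the population of policies the abstract mentions; a meta-controller (e.g. a simple bandit over weights, scored by episodic return) could then pick one value per episode.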
