
Guaranteed satisficing and finite regret: Analysis of a cognitive satisficing value function

Abstract

As reinforcement learning algorithms are applied to increasingly complicated and realistic tasks, it is becoming harder to solve such problems within a practical time frame. Hence, we focus on a \textit{satisficing} strategy that looks for an action whose value is above the aspiration level (analogous to the break-even point), rather than for the optimal action. In this paper, we introduce a simple mathematical model called risk-sensitive satisficing (RS) that implements a satisficing strategy by integrating risk-averse and risk-prone attitudes under the greedy policy. We apply the proposed model to $K$-armed bandit problems, which constitute the most basic class of reinforcement learning tasks, and prove two propositions. The first is that RS is guaranteed to find an action whose value is above the aspiration level. The second is that the regret (expected loss) of RS is upper bounded by a finite value, given that the aspiration level is set to an "optimal level" so that satisficing implies optimizing. We confirm the results through numerical simulations and compare the performance of RS with that of other representative algorithms for $K$-armed bandit problems.
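To make the satisficing idea concrete, the following is a minimal sketch of a greedy satisficing agent on a Bernoulli bandit. The abstract does not state the value function, so the form used here, (n_i / N) * (E_i - aspiration), is an assumption, as are the helper names `rs_values` and `run_rs`, the choice to pull every arm once at the start, and the Bernoulli reward model. The reported regret is the pseudo-regret computed from the true arm probabilities.

```python
import random


def rs_values(counts, means, aspiration):
    """Assumed satisficing value: reliability (n_i / N) times (mean - aspiration)."""
    total = sum(counts)
    return [(n / total) * (m - aspiration) for n, m in zip(counts, means)]


def run_rs(probs, aspiration, steps, seed=0):
    """Run a greedy policy on the assumed value function (requires steps >= len(probs))."""
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k

    def pull(arm):
        # Bernoulli reward, followed by an incremental update of the sample mean.
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]

    # Assumption: initialise by pulling every arm once.
    for arm in range(k):
        pull(arm)

    # Greedy selection: always take the arm with the largest value.
    for _ in range(steps - k):
        values = rs_values(counts, means, aspiration)
        pull(max(range(k), key=values.__getitem__))

    # Pseudo-regret: expected reward lost relative to always pulling the best arm.
    best = max(probs)
    regret = sum(n * (best - p) for n, p in zip(counts, probs))
    return counts, means, regret


if __name__ == "__main__":
    counts, means, regret = run_rs([0.3, 0.5, 0.8], aspiration=0.65, steps=2000)
    print(counts, regret)
```

Under this assumed form, an arm whose sample mean is below the aspiration level has a negative value that shrinks in magnitude when other arms are pulled, while an arm above the aspiration level has a positive value that grows with its share of pulls. Setting the aspiration between the best and second-best means corresponds to the "optimal level" case described above.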
