Online Optimization Algorithms in Repeated Price Competition: Equilibrium Learning and Algorithmic Collusion
This paper examines whether widely used online learning algorithms in pricing can independently reach competitive outcomes or instead foster tacit collusion. This issue has drawn considerable attention from competition regulators as algorithmic pricing becomes more common in digital markets. Understanding when such algorithms lead to equilibrium prices or to supra-competitive prices is critical for buyers, sellers, and policymakers.

We study the behavior of multi-armed bandit algorithms in repeated price competition. These algorithms observe only the profits from the prices they choose, making them realistic models of automated pricing. Our formal analysis shows that an important class of online learning algorithms, called mean-based algorithms, reliably converges to Nash equilibrium in Bertrand competition. This finding is notable because online learning algorithms generally do not guarantee convergence. We also run extensive numerical experiments with different bandit algorithms, confirming that most widely used algorithms, including those that are not mean-based, converge to equilibrium. We observe supra-competitive prices only in specific cases where all sellers implement the same symmetric version of certain algorithms, such as UCB or Q-learning, and this effect diminishes as the number of competitors increases.

Our results indicate that the risk of algorithmic collusion in competitive markets is often overstated. For most practical implementations of bandit algorithms, sellers' prices converge to competitive levels. Only under very specific and symmetric setups do prices remain above competitive benchmarks, and this effect diminishes with more competitors. These insights are relevant for regulators concerned with consumer welfare and for managers considering algorithmic pricing tools. They suggest that while vigilance is warranted, fears of widespread algorithm-driven collusion may be exaggerated.
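The setting described above can be illustrated with a minimal simulation sketch (not the paper's actual experimental code): two ε-greedy bandit sellers, each observing only its own realized profit, repeatedly choose prices in a Bertrand market where the lowest price captures all demand and ties split it. The price grid, linear demand function, and all parameter values below are illustrative assumptions; ε-greedy is used here as a simple rule in the spirit of the mean-based class the paper analyzes.

```python
import random

def simulate_bertrand_bandits(prices=(1, 2, 3, 4, 5), cost=0,
                              demand=lambda p: 10 - p,
                              rounds=30_000, eps=0.1, seed=0):
    """Two epsilon-greedy bandit sellers in repeated Bertrand competition.

    Each seller sees only the profit of its own chosen price (bandit
    feedback), never the rival's price. Returns the greedy price of each
    seller after learning. All parameters are illustrative assumptions.
    """
    rng = random.Random(seed)
    k = len(prices)
    # Per-seller running statistics: reward sums and pull counts per arm.
    sums = [[0.0] * k for _ in range(2)]
    counts = [[0] * k for _ in range(2)]

    def pick(s):
        # Explore with probability eps (or while any arm is untried),
        # otherwise play the arm with the highest empirical mean reward.
        if rng.random() < eps or 0 in counts[s]:
            return rng.randrange(k)
        return max(range(k), key=lambda a: sums[s][a] / counts[s][a])

    for _ in range(rounds):
        arms = [pick(0), pick(1)]
        p = [prices[arms[0]], prices[arms[1]]]
        for s in range(2):
            margin = p[s] - cost
            # Bertrand allocation: lowest price takes the market, ties split.
            if p[s] < p[1 - s]:
                profit = margin * demand(p[s])
            elif p[s] == p[1 - s]:
                profit = margin * demand(p[s]) / 2
            else:
                profit = 0.0
            sums[s][arms[s]] += profit
            counts[s][arms[s]] += 1

    # Each seller's greedy (exploitation) price after learning.
    return tuple(
        prices[max(range(k), key=lambda a: sums[s][a] / max(counts[s][a], 1))]
        for s in range(2)
    )

print(simulate_bertrand_bandits())
```

In runs of this sketch, the familiar undercutting dynamic plays out: low prices win the market more often, their empirical means dominate, and both sellers' greedy prices settle at or near the lowest price on the grid, i.e. near the competitive Nash benchmark, consistent with the convergence result described above.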