ROI Constrained Bidding via Curriculum-Guided Bayesian Reinforcement Learning

Abstract

Real-Time Bidding (RTB) is an important mechanism in modern online advertising systems. Advertisers employ bidding strategies in RTB to optimize their advertising effects subject to various financial requirements, among which a widely adopted one is the return-on-investment (ROI) constraint. ROIs change non-monotonically during the sequential bidding process, usually presenting a see-saw effect between constraint satisfaction and objective optimization. Existing solutions to the constraint-objective trade-off are typically developed for static or mildly changing markets. However, these methods fail significantly in non-stationary advertising markets due to their inability to adapt to varying dynamics and partial observability. In this work, we focus on ROI-Constrained Bidding in non-stationary markets. Based on a Partially Observable Constrained Markov Decision Process, we propose the first hard barrier solution to accommodate non-monotonic constraints. Our method exploits a parameter-free indicator-augmented reward function and develops a Curriculum-Guided Bayesian Reinforcement Learning (CBRL) framework to adaptively control the constraint-objective trade-off in non-stationary advertising markets. Extensive experiments on a large-scale industrial dataset with two problem settings reveal that CBRL generalizes well in both in-distribution and out-of-distribution data regimes, and enjoys outstanding stability.
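
To make the non-monotonic behavior of the ROI constraint concrete, here is a minimal Python sketch. It assumes ROI is measured as cumulative value divided by cumulative cost over the impressions won so far; the abstract does not state the exact definition, and the function names, the sample numbers, and the target of 1.5 are all hypothetical. It illustrates only the constraint, not the CBRL method or its reward function.

```python
from itertools import accumulate


def running_roi(values, costs):
    """Return the ROI after each won impression.

    Assumption: ROI = cumulative value / cumulative cost.
    """
    cum_values = accumulate(values)
    cum_costs = accumulate(costs)
    return [v / c if c > 0 else float("inf") for v, c in zip(cum_values, cum_costs)]


def is_feasible(values, costs, roi_target):
    """Check the ROI constraint over a whole bidding sequence (hypothetical helper)."""
    total_cost = sum(costs)
    if total_cost == 0:
        return True  # nothing spent, so the constraint holds trivially
    return sum(values) / total_cost >= roi_target


# Hypothetical episode: value gained and cost paid for four won impressions.
values = [4.0, 0.0, 6.0, 0.0]
costs = [2.0, 2.0, 1.0, 5.0]

trajectory = running_roi(values, costs)
print(trajectory)  # ROI rises and falls as the episode proceeds
print(is_feasible(values[:3], costs[:3], roi_target=1.5))  # after three wins
print(is_feasible(values, costs, roi_target=1.5))  # after all four wins
```

In this toy episode the constraint holds after the third impression and is violated after the fourth, so satisfying it midway gives no guarantee at the end of the sequence. That is the see-saw between constraint satisfaction and objective optimization that the abstract describes.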
