ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2407.11039
30
0

Balancing Immediate Revenue and Future Off-Policy Evaluation in Coupon Allocation

6 July 2024
Naoki Nishimura
Ken Kobayashi
Kazuhide Nakata
    OffRL
ArXivPDFHTML
Abstract

Coupon allocation drives customer purchases and boosts revenue. However, it presents a fundamental trade-off between exploiting the current optimal policy to maximize immediate revenue and exploring alternative policies to collect data for future policy improvement via off-policy evaluation (OPE). While online A/B testing can validate new policies, it risks compromising short-term revenue. Conversely, relying solely on an exploitative policy hinders the ability to reliably estimate and enhance future policies. To balance this trade-off, we propose a novel approach that combines a model-based revenue maximization policy and a randomized exploration policy for data collection. Our framework enables flexibly adjusting the mixture ratio between these two policies to optimize the balance between short-term revenue and future policy improvement. We formulate the problem of determining the optimal mixture ratio between a model-based revenue maximization policy and a randomized exploration policy for data collection. We empirically verified the effectiveness of the proposed mixed policy using both synthetic and real-world data. Our main contributions are: (1) Demonstrating a mixed policy combining deterministic and probabilistic policies, flexibly adjusting the data collection vs. revenue trade-off. (2) Formulating the optimal mixture ratio problem as multi-objective optimization, enabling quantitative evaluation of this trade-off. By optimizing the mixture ratio, businesses can maximize revenue while ensuring reliable future OPE and policy improvement. This framework is applicable in any context where the exploration-exploitation trade-off is relevant.

View on arXiv
Comments on this paper