Online Recommendations for Agents with Discounted Adaptive Preferences

International Conference on Algorithmic Learning Theory (ALT), 2023
Abstract

For domains in which a recommender provides repeated content suggestions, agent preferences may evolve over time as a function of prior recommendations, and algorithms must take this into account for long-run optimization. Recently, Agarwal and Brown (2022) introduced a model for studying recommendations when agents' preferences are adaptive, and gave a series of results for the case when agent preferences depend uniformly on their history of past selections. Here, the recommender shows a k-item menu (out of n) to the agent at each round, who selects one of the k items via their history-dependent preference model, yielding a per-item adversarial reward for the recommender. We expand this setting to non-uniform preferences, and give a series of results for γ-discounted histories. For this problem, the feasible regret benchmarks can depend drastically on varying conditions. In the "large γ" regime, we show that the previously considered benchmark, the "EIRD set", is attainable for any smooth model, relaxing the "local learnability" requirement from the uniform-memory case. We introduce "pseudo-increasing" preference models, for which we give an algorithm that can compete against any item distribution with small uniform noise (the "smoothed simplex"). We show NP-hardness results for larger regret benchmarks in each case. We give another algorithm for pseudo-increasing models (under a restriction on the adversarial nature of the reward functions), which works for any γ and is faster when γ is sufficiently small, and we show a super-polynomial regret lower bound with respect to EIRD for general models in the "small γ" regime. We conclude with a pair of algorithms for the memoryless case.
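
To make the interaction protocol concrete, below is a minimal Python sketch of one plausible instantiation of the setting: each round the recommender shows a k-item menu out of n items, the agent samples an item in proportion to a preference score computed from a γ-discounted memory of its past selections, and the recommender collects an adversarial per-item reward. The function names (`simulate`, `scoring_fn`, `recommender`, `rewards`) and the normalized memory update are illustrative assumptions for this sketch, not the paper's exact formalism.

```python
import numpy as np

def simulate(T, n, k, gamma, scoring_fn, recommender, rewards, seed=0):
    """Run T rounds of a menu-based recommendation game with a
    gamma-discounted, history-dependent agent (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    memory = np.ones(n) / n              # gamma-discounted history of selections
    total_reward = 0.0
    for t in range(T):
        menu = recommender(memory, t)    # recommender picks a k-item menu (indices)
        scores = scoring_fn(memory)[menu]
        probs = scores / scores.sum()    # agent's selection distribution over the menu
        choice = menu[rng.choice(k, p=probs)]
        total_reward += rewards(t)[choice]   # adversarial per-item reward
        one_hot = np.zeros(n)
        one_hot[choice] = 1.0
        # discounted memory update: past selections decay geometrically with gamma
        memory = gamma * memory + (1 - gamma) * one_hot
    return total_reward

# Example run with hypothetical components (not taken from the paper):
n, k = 5, 2
scoring_fn = lambda v: np.exp(3 * v)            # a smooth preference model
recommender = lambda v, t: np.argsort(-v)[:k]   # show the k most-remembered items
rewards = lambda t: np.random.default_rng(t).random(n)
print(simulate(T=100, n=n, k=k, gamma=0.9, scoring_fn=scoring_fn,
               recommender=recommender, rewards=rewards))
```

The large-γ and small-γ regimes discussed in the abstract correspond, in this sketch, to how slowly or quickly the `memory` vector forgets old selections.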
