
Online Low Rank Matrix Completion

International Conference on Learning Representations (ICLR), 2022
Abstract

We study the problem of \textit{online} low-rank matrix completion with $\mathsf{M}$ users, $\mathsf{N}$ items and $\mathsf{T}$ rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with regret sub-linear in $\mathsf{T}$. While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, this leads to poor regret because the correlation between arms and users is not exploited. Exploiting the low-rank structure of the reward matrix, in contrast, is challenging due to the non-convexity of the low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of $O(\mathsf{polylog}(\mathsf{M}+\mathsf{N})\,\mathsf{T}^{2/3})$. That is, roughly only $\mathsf{polylog}(\mathsf{M}+\mathsf{N})$ item recommendations are required per user to get a non-trivial solution. We further improve our result for the rank-$1$ setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures a nearly optimal regret bound of $O(\mathsf{polylog}(\mathsf{M}+\mathsf{N})\,\mathsf{T}^{1/2})$. Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain a nearly minimax optimal rate in $\mathsf{T}$.
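The explore-then-commit idea described above can be illustrated with a minimal sketch. This is not the paper's algorithm or estimator. It assumes uniform random exploration, a simple zero-fill-and-rescale truncated SVD as a stand-in for the low-rank estimation step, Gaussian noise, and regret averaged over users. The function name `etc_low_rank` and all of its parameters are hypothetical and introduced only for illustration.

```python
import numpy as np

def etc_low_rank(P, T, explore_rounds, rank, noise_std=0.1, seed=0):
    """Explore-then-commit sketch on a reward matrix P (users x items).

    Returns (expected regret averaged over users, committed item per user).
    Assumption: the estimator is a rescaled truncated SVD, not the paper's.
    """
    rng = np.random.default_rng(seed)
    M, N = P.shape
    users = np.arange(M)
    sums = np.zeros((M, N))      # summed noisy rewards per (user, item)
    counts = np.zeros((M, N))    # number of observations per (user, item)
    best = P.max(axis=1)         # best expected reward for each user
    regret = 0.0

    # Exploration: recommend one uniformly random item to every user.
    for _ in range(explore_rounds):
        items = rng.integers(0, N, size=M)
        rewards = P[users, items] + noise_std * rng.standard_normal(M)
        sums[users, items] += rewards
        counts[users, items] += 1
        regret += np.mean(best - P[users, items])

    # Estimation: average the observations, zero-fill unobserved entries,
    # rescale by the observed fraction, and keep the top `rank` components.
    observed = counts > 0
    means = np.where(observed, sums / np.maximum(counts, 1), 0.0)
    frac = max(observed.mean(), 1e-12)
    U, s, Vt = np.linalg.svd(means / frac, full_matrices=False)
    P_hat = (U[:, :rank] * s[:rank]) @ Vt[:rank]

    # Commit: play each user's estimated best item for the remaining rounds.
    chosen = P_hat.argmax(axis=1)
    regret += (T - explore_rounds) * np.mean(best - P[users, chosen])
    return regret, chosen
```

Calling `etc_low_rank(P, T=1000, explore_rounds=200, rank=1)` on a rank-1 matrix `P` returns the regret and the committed items. The choice of `explore_rounds` controls the trade-off: a longer exploration phase costs more regret up front but yields a better estimate for the commit phase, and balancing the two is what typically gives the $\mathsf{T}^{2/3}$ rate in explore-then-commit analyses.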
