Online Low Rank Matrix Completion
We study the problem of \textit{online} low-rank matrix completion with users, items and rounds. In each round, we recommend one item per user. For each recommendation, we obtain a (noisy) reward sampled from a low-rank user-item reward matrix. The goal is to design an online method with sub-linear regret (in ). While the problem can be mapped to the standard multi-armed bandit problem where each item is an \textit{independent} arm, it leads to poor regret as the correlation between arms and users is not exploited. In contrast, exploiting the low-rank structure of reward matrix is challenging due to non-convexity of low-rank manifold. We overcome this challenge using an explore-then-commit (ETC) approach that ensures a regret of . That is, roughly only item recommendations are required per user to get non-trivial solution. We further improve our result for the rank- setting. Here, we propose a novel algorithm OCTAL (Online Collaborative filTering using iterAtive user cLustering) that ensures nearly optimal regret bound of . Our algorithm uses a novel technique of clustering users and eliminating items jointly and iteratively, which allows us to obtain nearly minimax optimal rate in .
View on arXiv