The sample complexity of multi-distribution learning

Abstract

Multi-distribution learning generalizes classic PAC learning to handle data coming from multiple distributions. Given a set of $k$ data distributions and a hypothesis class of VC dimension $d$, the goal is to learn a hypothesis that minimizes the maximum population loss over the $k$ distributions, up to $\epsilon$ additive error. In this paper, we settle the sample complexity of multi-distribution learning by giving an algorithm with sample complexity $\widetilde{O}((d+k)\epsilon^{-2}) \cdot (k/\epsilon)^{o(1)}$. This matches the lower bound up to a sub-polynomial factor and resolves the COLT 2023 open problem of Awasthi, Haghtalab and Zhao [AHZ23].
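
In symbols (notation ours, the abstract itself fixes none): given distributions $\mathcal{D}_1, \dots, \mathcal{D}_k$ and a hypothesis class $\mathcal{H}$ of VC dimension $d$, the learner must output a (possibly randomized) hypothesis $\hat{h}$ satisfying

$$\max_{i \in [k]} \mathcal{L}_{\mathcal{D}_i}(\hat{h}) \;\le\; \min_{h \in \mathcal{H}} \, \max_{i \in [k]} \mathcal{L}_{\mathcal{D}_i}(h) + \epsilon,$$

where $\mathcal{L}_{\mathcal{D}_i}(h)$ denotes the population loss of $h$ on $\mathcal{D}_i$.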
