Learning without Interaction Requires Separation

One of the key resources in large-scale learning systems is the number of rounds of communication between the server and the clients holding the data points. We study this resource for systems with two types of constraints on the communication from each client: local differential privacy and a limited number of communicated bits. For both models, the number of rounds of communication is captured by the number of rounds of interaction when solving the learning problem in the statistical query (SQ) model. For many learning problems, the known efficient algorithms require many rounds of interaction, yet little is known about whether this is actually necessary. In the context of classification in the PAC learning model, Kasiviswanathan et al. (2008) constructed an artificial class of functions that is PAC learnable with respect to a fixed distribution but cannot be learned by any efficient non-interactive (or one-round) SQ algorithm. Here we show that a similar separation holds, without any assumptions on the distribution, for every negation-closed class of large margin complexity, that is, every class whose functions cannot be represented as large-margin linear separators. In particular, this is true for linear separators and decision lists. To prove this separation, we show that non-interactive SQ algorithms can learn only function classes of low margin complexity. Our lower bound also holds against a stronger class of algorithms in which only the queries that depend on the labels are non-interactive (we refer to such algorithms as label-non-adaptive). We complement this lower bound with a new efficient, label-non-adaptive SQ learning algorithm whose complexity is polynomial in the margin complexity of the class. We thus obtain a new characterization of margin complexity that may be of independent interest.
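For reference, the following are the standard definitions of the two central notions, sketched in our own notation (the symbols $\phi$, $\psi$, $w_f$, and $\tau$ are illustrative and do not appear in the abstract). An SQ oracle for a distribution $D$ over labeled examples answers any bounded query up to an additive tolerance:
\[
\text{on query } \phi : X \times \{-1,+1\} \to [-1,1] \text{ and tolerance } \tau > 0,\ \text{return } v \text{ with } \bigl| v - \mathbb{E}_{(x,y)\sim D}[\phi(x,y)] \bigr| \le \tau .
\]
The margin complexity of a class $C$ over a domain $X$ is the smallest inverse margin achievable by any embedding of $X$ into the unit ball of a Hilbert space that realizes every $f \in C$ as a linear separator:
\[
\mathrm{mc}(C) \;=\; \inf_{\psi,\, \{w_f\}} \; \sup_{f \in C,\; x \in X} \; \frac{1}{f(x)\,\langle w_f, \psi(x)\rangle}, \qquad \|w_f\| \le 1,\ \ \|\psi(x)\| \le 1,\ \ f(x)\,\langle w_f, \psi(x)\rangle > 0 .
\]
Under these definitions, "large margin complexity" means that every such embedding forces some example to be classified with a small margin, which is the property the lower bound exploits.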