On the ERM Principle in Meta-Learning

Classic supervised learning involves algorithms trained on labeled examples to produce a hypothesis aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class within some meta-class. This setting applies to many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common method for evaluating the performance of supervised learning algorithms is their learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which evaluates the expected error on unseen domains for varying values of $n$ (the number of tasks) and $m$ (the number of training examples per task). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $n$ or $m$ tends to infinity: we show that the number of tasks $n$ must grow inversely with the desired error. In contrast, the number of examples per task exhibits very different behavior: it satisfies a dichotomy where every meta-class conforms to one of the following conditions: (i) either $m$ must grow inversely with the error, or (ii) a \emph{finite} number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding illustrates and characterizes cases in which a small number of examples per task is sufficient for successful learning. We further refine this for positive values of $\varepsilon$ and identify, for each $\varepsilon$, how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain.
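
For concreteness, one standard way to write the meta-ERM rule and the learning surface referred to above is sketched below. The notation (per-task samples $S_1,\dots,S_n$, a meta-class $\mathbb{H}$ of hypothesis classes $\mathcal{H}$, the zero-one loss, and a distribution $Q$ over tasks) is an assumption made here for illustration, not notation taken from the abstract.

\[
\widehat{\mathcal{H}} \;\in\; \operatorname*{argmin}_{\mathcal{H} \in \mathbb{H}} \;\frac{1}{n}\sum_{i=1}^{n} \min_{h \in \mathcal{H}} \frac{1}{m}\sum_{(x,y)\in S_i} \mathbf{1}\!\left[h(x)\neq y\right],
\qquad
\varepsilon(n,m) \;=\; \mathbb{E}\!\left[\, \mathbb{E}_{D\sim Q}\, \inf_{h\in\widehat{\mathcal{H}}} \Pr_{(x,y)\sim D}\!\left[h(x)\neq y\right] \right].
\]

Here the outer expectation is over the draw of the $n$ training tasks and their $m$-example samples; the dichotomy described above concerns how $\varepsilon(n,m)$ behaves as $n \to \infty$ for a fixed, finite $m$.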
View on arXiv