On the ERM Principle in Meta-Learning

Abstract

Classic supervised learning involves algorithms trained on $n$ labeled examples to produce a hypothesis $h \in \mathcal{H}$ aimed at performing well on unseen examples. Meta-learning extends this by training across $n$ tasks, with $m$ examples per task, producing a hypothesis class $\mathcal{H}$ within some meta-class $\mathbb{H}$. This setting applies to many modern problems such as in-context learning, hypernetworks, and learning-to-learn. A common method for evaluating the performance of supervised learning algorithms is the learning curve, which depicts the expected error as a function of the number of training examples. In meta-learning, the learning curve becomes a two-dimensional learning surface, which evaluates the expected error on unseen domains for varying values of $n$ (number of tasks) and $m$ (number of training examples per task). Our findings characterize the distribution-free learning surfaces of meta-Empirical Risk Minimizers when either $m$ or $n$ tends to infinity: we show that the number of tasks must increase inversely with the desired error. In contrast, the number of examples per task exhibits very different behavior: it satisfies a dichotomy in which every meta-class conforms to one of the following conditions: (i) either $m$ must grow inversely with the error, or (ii) a \emph{finite} number of examples per task suffices for the error to vanish as $n$ goes to infinity. This finding illustrates and characterizes cases in which a small number of examples per task is sufficient for successful learning. We further refine this for positive values of $\varepsilon$ and identify, for each $\varepsilon$, how many examples per task are needed to achieve an error of $\varepsilon$ in the limit as the number of tasks $n$ goes to infinity. We achieve this by developing a necessary and sufficient condition for meta-learnability using a bounded number of examples per domain.
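To make the setting concrete, the following is a minimal sketch of the meta-ERM objective and the learning surface it induces; the notation introduced here (a meta-distribution $\mathfrak{D}$ over tasks, tasks $D_1,\dots,D_n$, samples $(x_{i,j},y_{i,j})$, and the 0-1 loss) is an assumption for exposition and not taken from the paper itself.

$$
\widehat{\mathcal{H}}_{n,m} \;=\; \arg\min_{\mathcal{H}\in\mathbb{H}} \; \frac{1}{n}\sum_{i=1}^{n} \min_{h\in\mathcal{H}} \frac{1}{m}\sum_{j=1}^{m} \mathbf{1}\!\left[\,h(x_{i,j}) \neq y_{i,j}\,\right]
$$

$$
\mathcal{E}(n,m) \;=\; \mathbb{E}_{D_1,\dots,D_n \sim \mathfrak{D}}\; \mathbb{E}_{S_i \sim D_i^{m}} \left[\; \mathbb{E}_{D\sim\mathfrak{D}} \min_{h\in\widehat{\mathcal{H}}_{n,m}} \Pr_{(x,y)\sim D}\!\bigl[h(x)\neq y\bigr] \right]
$$

Under these assumptions, the meta-ERM picks the class in $\mathbb{H}$ with the smallest average within-task empirical error, the error of the learned class on an unseen task $D$ is measured by the best hypothesis it contains, and the learning surface is the map $(n,m)\mapsto\mathcal{E}(n,m)$; the dichotomy described above concerns the behavior of $\mathcal{E}(n,m)$ as $n\to\infty$ with $m$ held fixed.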
