Data Selection for ERMs
Annual Conference Computational Learning Theory (COLT), 2025
Main:29 Pages
2 Figures
Bibliography:3 Pages
Abstract
Learning theory has traditionally followed a model-centric approach, focusing on designing optimal algorithms for a fixed natural learning task (e.g., linear classification or regression). In this paper, we adopt a complementary data-centric perspective, whereby we fix a natural learning rule and focus on optimizing the training data. Specifically, we study the following question: given a learning rule $\mathcal{A}$ and a data selection budget $n$, how well can $\mathcal{A}$ perform when trained on at most $n$ data points selected from a population of $N$ points? We investigate when it is possible to select $n \ll N$ points and achieve performance comparable to training on the entire population.
