The Limits of Assumption-free Tests for Algorithm Performance

12 February 2024
Yuetian Luo
Rina Foygel Barber
Abstract

Algorithm evaluation and comparison are fundamental questions in machine learning and statistics -- how well does an algorithm perform at a given modeling task, and which algorithm performs best? Many methods have been developed to assess algorithm performance, often based on cross-validation-type strategies: retraining the algorithm of interest on different subsets of the data and assessing its performance on the held-out data points. Despite the broad use of such procedures, the theoretical properties of these methods are not yet fully understood. In this work, we explore some fundamental limits on answering these questions with limited amounts of data. In particular, we distinguish between two questions: how good is an algorithm A at the problem of learning from a training set of size n, versus how good is a particular fitted model produced by running A on a particular training data set of size n? Our main results prove that, for any test that treats the algorithm A as a "black box" (i.e., we can only study the behavior of A empirically), there is a fundamental limit on our ability to carry out inference on the performance of A, unless the number of available data points N is many times larger than the sample size n of interest. (On the other hand, evaluating the performance of a particular fitted model is easy as long as a holdout data set is available -- that is, as long as N - n is not too small.) We also ask whether an assumption of algorithmic stability might be sufficient to circumvent this hardness result. Surprisingly, we find that this is not the case: the same hardness result still holds for the problem of evaluating the performance of A, aside from a high-stability regime where fitted models are essentially nonrandom. Finally, we establish similar hardness results for the problem of comparing multiple algorithms.
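
To make the abstract's distinction concrete, here is a minimal, hypothetical Python sketch (not code from the paper): synthetic data and ordinary least squares stand in for the black-box algorithm A, and all names and parameter choices are illustrative assumptions only.

import numpy as np

rng = np.random.default_rng(0)

def algorithm_A(X, y):
    """Black-box algorithm A: fit least squares, return a prediction function."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return lambda X_new: X_new @ beta

def squared_loss(model, X, y):
    return float(np.mean((model(X) - y) ** 2))

# N available data points; we care about performance at training size n < N.
N, n, d = 500, 100, 5
X = rng.normal(size=(N, d))
y = X @ rng.normal(size=d) + rng.normal(size=N)

# Question 1 (easy when N - n is not too small): how good is *this fitted model*?
# Train once on n points, then evaluate on the N - n held-out points.
model = algorithm_A(X[:n], y[:n])
holdout_risk = squared_loss(model, X[n:], y[n:])
print(f"holdout estimate for the fitted model: {holdout_risk:.3f}")

# Question 2 (hard, per the paper): how good is *algorithm A itself* at
# sample size n? A natural black-box attempt averages over many retrainings
# on random subsamples of size n -- but the subsamples overlap heavily
# unless N >> n, which is exactly the regime the hardness result concerns.
risks = []
for _ in range(50):
    idx = rng.permutation(N)
    m = algorithm_A(X[idx[:n]], y[idx[:n]])
    risks.append(squared_loss(m, X[idx[n:]], y[idx[n:]]))
print(f"subsampled estimate of A's risk at size n: "
      f"{np.mean(risks):.3f} (+/- {np.std(risks):.3f})")

The first block is the easy problem: it targets one fitted model and only needs a holdout set of N - n points. The subsampling loop is the natural black-box attack on the hard problem, estimating A's average performance at sample size n; the paper's results show that no such black-box procedure can support valid inference unless N is many times larger than n.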

View on arXiv
@article{luo2025_2402.07388,
  title={The Limits of Assumption-free Tests for Algorithm Performance},
  author={Yuetian Luo and Rina Foygel Barber},
  journal={arXiv preprint arXiv:2402.07388},
  year={2025}
}