Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models

Abstract

We propose two simple yet principled algorithms that enjoy provable scaling laws for the test-time compute of large language models (LLMs), which require a black-box LLM and nothing else (e.g., no external verifier or reward model) for a minimalistic implementation. (i) The first is a two-stage knockout-style algorithm: given an input problem, it first generates multiple candidate solutions, and then aggregates them into a final output via a knockout tournament of pairwise comparisons among the candidates. Assuming that the LLM can generate a correct solution with non-zero probability and can do better than a random guess when comparing a pair of correct and incorrect solutions, we prove theoretically that the failure probability of this algorithm decays to zero exponentially or by a power law (depending on the specific way of scaling) as its test-time compute grows. (ii) The second is a two-stage league-style algorithm, where each candidate solution is evaluated by its average win rate against multiple opponents, rather than being eliminated upon a loss to a single opponent. Under certain technical assumptions that are analogous to but more robust than those required by the knockout-style algorithm, we prove theoretically that the failure probability of the league-style algorithm also decays to zero exponentially as its test-time compute grows. Through extensive experiments with two challenging benchmarks, namely GPQA and MMLU-Pro, we validate the proposed theories and demonstrate the outstanding scaling properties of both algorithms.
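The two algorithms described in the abstract can be sketched in a few lines of code. The following is a minimal illustration, not the authors' reference implementation: `llm_generate` and `llm_compare` are hypothetical callables standing in for black-box LLM calls (candidate generation, and a pairwise judgment that returns 1 if the first solution is judged better), and the parameters `n_candidates` and `n_comparisons` are illustrative knobs for scaling test-time compute.

```python
def knockout(problem, llm_generate, llm_compare, n_candidates=8, n_comparisons=3):
    """Two-stage knockout-style aggregation (sketch).

    Stage 1: generate candidate solutions independently.
    Stage 2: run a single-elimination tournament; each pairwise match is
    decided by a majority vote over repeated LLM comparisons.
    """
    candidates = [llm_generate(problem) for _ in range(n_candidates)]
    while len(candidates) > 1:
        winners = []
        # Compare adjacent pairs; an odd candidate out gets a bye.
        for a, b in zip(candidates[::2], candidates[1::2]):
            wins_a = sum(llm_compare(problem, a, b) for _ in range(n_comparisons))
            winners.append(a if 2 * wins_a >= n_comparisons else b)
        if len(candidates) % 2 == 1:
            winners.append(candidates[-1])
        candidates = winners
    return candidates[0]


def league(problem, llm_generate, llm_compare, n_candidates=8):
    """Two-stage league-style aggregation (sketch).

    Each candidate is scored by its average win rate against all other
    candidates (the paper allows sampling a subset of opponents); the
    candidate with the highest average win rate is returned, so a single
    lost comparison does not eliminate a good solution.
    """
    candidates = [llm_generate(problem) for _ in range(n_candidates)]

    def avg_win_rate(i):
        others = [j for j in range(len(candidates)) if j != i]
        return sum(llm_compare(problem, candidates[i], candidates[j])
                   for j in others) / len(others)

    best = max(range(len(candidates)), key=avg_win_rate)
    return candidates[best]
```

Under the paper's assumptions (non-zero probability of generating a correct candidate, better-than-random pairwise comparison), increasing `n_candidates` and `n_comparisons` drives the failure probability of the knockout variant toward zero; the league variant trades more comparisons per candidate for robustness to individual noisy judgments.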

@article{chen2025_2411.19477,
  title={Simple and Provable Scaling Laws for the Test-Time Compute of Large Language Models},
  author={Yanxi Chen and Xuchen Pan and Yaliang Li and Bolin Ding and Jingren Zhou},
  journal={arXiv preprint arXiv:2411.19477},
  year={2025}
}