A Simple and Provable Scaling Law for the Test-Time Compute of Large
Language Models
We propose a general two-stage algorithm that enjoys a provable scaling law for the test-time compute of large language models (LLMs). Given an input problem, the proposed algorithm first generates $N$ candidate solutions, and then chooses the best one via a multiple-round knockout tournament where each pair of candidates is compared $K$ times and only the winners move on to the next round. In a minimalistic implementation, both stages can be executed with a black-box LLM alone and nothing else (e.g., no external verifier or reward model), and a total of $N \times (K + 1)$ highly parallelizable LLM calls are needed for solving an input problem. Assuming that a generated candidate solution is correct with probability $p_{\mathrm{gen}} > 0$, and that a comparison between a pair of correct and incorrect solutions identifies the right winner with probability $p_{\mathrm{comp}} > 0.5$ (i.e., better than a random guess), we prove theoretically that the failure probability of the proposed algorithm decays to zero exponentially with respect to $N$ and $K$. Our empirical results on the challenging MMLU-Pro benchmark validate the technical assumptions, as well as the efficacy of the proposed algorithm and the gains from scaling up its test-time compute.
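The two-stage procedure described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `generate` and `compare` are hypothetical stand-ins for the two black-box LLM prompts (sample one candidate solution; judge which of two candidates is better).

```python
# Minimal sketch of the two-stage algorithm, assuming two black-box calls:
#   generate(problem)      -> one candidate solution (hypothetical stand-in)
#   compare(problem, a, b) -> whichever of a, b it judges better (hypothetical)

def knockout_best_of_n(problem, generate, compare, n_candidates, k_comparisons):
    """Stage 1: sample N candidates; Stage 2: knockout tournament."""
    # Stage 1: N independent (hence parallelizable) generations.
    candidates = [generate(problem) for _ in range(n_candidates)]

    # Stage 2: knockout rounds until a single candidate remains.
    while len(candidates) > 1:
        winners = []
        if len(candidates) % 2 == 1:        # odd one out gets a bye this round
            winners.append(candidates.pop())
        for a, b in zip(candidates[0::2], candidates[1::2]):
            # Each pair is compared K times; the majority-vote winner advances.
            votes_for_a = sum(
                compare(problem, a, b) == a for _ in range(k_comparisons)
            )
            winners.append(a if 2 * votes_for_a > k_comparisons else b)
        candidates = winners
    return candidates[0]
```

Choosing an odd $K$ avoids ties in the majority vote; in this sketch a tie is broken in favor of the second candidate.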
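To see why exponential decay in both $N$ and $K$ is plausible, the following computes a failure bound of the form one would obtain from a union bound plus Hoeffding's inequality. The specific expression here is an illustrative assumption, not the paper's stated theorem: the first term is the chance that none of the $N$ candidates is correct, and the second bounds the chance that a correct candidate loses one of its roughly $\log_2 N$ majority votes.

```python
import math

# Hedged numeric illustration of the scaling law. We ASSUME a bound of the form
#   P(fail) <= (1 - p_gen)^N + ceil(log2 N) * exp(-2 K (p_comp - 0.5)^2),
# i.e., Hoeffding on each K-comparison majority vote plus a union bound over
# the ~log2(N) rounds the correct candidate must survive. This expression is
# our reconstruction, not a quote of the paper's theorem.

def failure_bound(p_gen, p_comp, n, k):
    miss_all = (1.0 - p_gen) ** n          # no correct candidate generated at all
    rounds = math.ceil(math.log2(n))       # rounds a correct candidate must win
    lose_round = math.exp(-2.0 * k * (p_comp - 0.5) ** 2)  # Hoeffding per round
    return miss_all + rounds * lose_round

# Both terms vanish as N and K grow together.
for n, k in [(4, 4), (16, 16), (64, 64)]:
    print(n, k, failure_bound(p_gen=0.2, p_comp=0.7, n=n, k=k))
```

Even with a weak generator ($p_{\mathrm{gen}} = 0.2$) and a modestly informative comparator ($p_{\mathrm{comp}} = 0.7$), the bound drops rapidly as $N$ and $K$ are scaled up in tandem.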