
Guided Speculative Inference for Efficient Test-Time Alignment of LLMs

Main: 10 pages, 11 figures, 4 tables; Bibliography: 5 pages; Appendix: 24 pages
Abstract

We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-$n$ test-time scaling with a reward model $r(x,y)$ and speculative samples from a small auxiliary model $\pi_S(y\mid x)$. We provably approximate both the optimal tilted policy $\pi_{\beta,B}(y\mid x) \propto \pi_B(y\mid x)\exp(\beta\,r(x,y))$ of soft best-of-$n$ under the base model $\pi_B$ and the expected reward under the optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K), our method achieves higher accuracy than standard soft best-of-$n$ with $\pi_S$ and than reward-guided speculative decoding (Liao et al., 2025), and in certain settings it even outperforms soft best-of-$n$ with $\pi_B$. The code is available at this https URL.
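The soft best-of-$n$ step the abstract builds on can be sketched as follows: draw $n$ candidates, score them with the reward model, and sample one with probability proportional to $\exp(\beta\,r(x,y))$. This is a minimal illustration, not the paper's implementation; `sample_fn` and `reward_fn` are hypothetical stand-ins for the auxiliary model $\pi_S$ and the reward model $r$.

```python
import math
import random

def soft_best_of_n(sample_fn, reward_fn, x, n=4, beta=2.0, rng=None):
    """Soft best-of-n sampling: draw n candidate responses for prompt x,
    then pick one with probability proportional to exp(beta * reward).

    sample_fn and reward_fn are illustrative placeholders, not the
    paper's API.
    """
    rng = rng or random.Random()
    ys = [sample_fn(x) for _ in range(n)]
    rewards = [reward_fn(x, y) for y in ys]
    m = max(rewards)  # subtract the max before exponentiating, for stability
    weights = [math.exp(beta * (r - m)) for r in rewards]
    return rng.choices(ys, weights=weights, k=1)[0]
```

As $\beta \to \infty$ this reduces to ordinary (hard) best-of-$n$, while $\beta = 0$ recovers plain sampling from the candidate set.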
