Guided Speculative Inference for Efficient Test-Time Alignment of LLMs
Main: 10 Pages
11 Figures
Bibliography: 5 Pages
4 Tables
Appendix: 24 Pages
Abstract
We propose Guided Speculative Inference (GSI), a novel algorithm for efficient reward-guided decoding in large language models. GSI combines soft best-of-n test-time scaling with a reward model and speculative samples from a small auxiliary model. We provably approximate both the optimal tilted policy of soft best-of-n under the base model and the expected reward under that optimal policy. In experiments on reasoning benchmarks (MATH500, OlympiadBench, Minerva Math, MMLU-STEM, GSM8K), our method achieves higher accuracy than both standard soft best-of-n with the small auxiliary model and reward-guided speculative decoding (Liao et al., 2025), and in certain settings even outperforms soft best-of-n with the base model. The code is available at this https URL.
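For readers unfamiliar with the soft best-of-n building block mentioned in the abstract, the following is a minimal sketch of its selection step only, not of GSI itself. It assumes the n candidate responses and their reward-model scores are already available. The function names (`soft_bon_weights`, `soft_best_of_n`) and the inverse-temperature parameter name `beta` are illustrative choices, not taken from the paper or its code.

```python
import math
import random


def soft_bon_weights(rewards, beta):
    """Return selection probabilities proportional to exp(beta * reward).

    Subtracting the maximum reward before exponentiating keeps the
    computation numerically stable without changing the probabilities.
    """
    if not rewards:
        raise ValueError("rewards must be non-empty")
    top = max(rewards)
    unnormalized = [math.exp(beta * (r - top)) for r in rewards]
    total = sum(unnormalized)
    return [w / total for w in unnormalized]


def soft_best_of_n(candidates, rewards, beta, rng=random):
    """Sample one candidate according to the soft best-of-n weights.

    `candidates` and `rewards` are assumed to be aligned lists:
    rewards[i] is the (hypothetical) reward-model score of candidates[i].
    """
    if len(candidates) != len(rewards):
        raise ValueError("candidates and rewards must have the same length")
    probs = soft_bon_weights(rewards, beta)
    return rng.choices(candidates, weights=probs, k=1)[0]
```

With `beta = 0` every candidate is equally likely, so the step reduces to ordinary sampling. As `beta` grows, the selection concentrates on the highest-reward candidate and approaches hard best-of-n.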
