
Soft Best-of-n Sampling for Model Alignment

Abstract

Best-of-n (BoN) sampling is a practical approach for aligning language model outputs with human preferences without expensive fine-tuning. BoN sampling is performed by generating n responses to a prompt and then selecting the sample that maximizes a reward function. BoN yields high reward values in practice at a distortion cost, as measured by the KL-divergence between the sampled and original distribution. This distortion is coarsely controlled by varying the number of samples: larger n yields a higher reward at a higher distortion cost. We introduce Soft Best-of-n sampling, a generalization of BoN that allows for smooth interpolation between the original distribution and reward-maximizing distribution through a temperature parameter λ. We establish theoretical guarantees showing that Soft Best-of-n sampling converges sharply to the optimal tilted distribution at a rate of O(1/n) in KL and the expected (relative) reward. For sequences of discrete outputs, we analyze an additive reward model that reveals the fundamental limitations of blockwise sampling.
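As a concrete illustration (not taken from the paper), the sketch below shows one natural reading of the abstract's selection rule: draw n candidates from the base model and pick one with probability proportional to exp(reward/λ), so that λ → 0 recovers standard BoN (argmax of reward) and large λ recovers plain sampling from the original distribution. The generate and reward callables are hypothetical stand-ins for the base language model and the reward model.

import numpy as np

def soft_best_of_n(prompt, generate, reward, n=8, lam=1.0, rng=None):
    """Minimal sketch of Soft Best-of-n sampling (illustrative only).

    generate(prompt) -> str and reward(prompt, response) -> float are
    assumed interfaces, not APIs defined in the paper. Requires lam > 0;
    lam -> 0 approaches standard BoN, large lam approaches the base model.
    """
    rng = rng or np.random.default_rng()
    # Draw n candidate responses from the base model.
    candidates = [generate(prompt) for _ in range(n)]
    rewards = np.array([reward(prompt, y) for y in candidates])
    # Softmax over rewards with temperature lam (max-shift for stability).
    logits = rewards / lam
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # Sample one candidate according to the tempered reward weights.
    return candidates[rng.choice(n, p=probs)]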

@article{verdun2025_2505.03156,
  title={Soft Best-of-n Sampling for Model Alignment},
  author={Claudio Mayrink Verdun and Alex Oesterling and Himabindu Lakkaraju and Flavio P. Calmon},
  journal={arXiv preprint arXiv:2505.03156},
  year={2025}
}