Bayesian optimization (BO) has become an established framework and popular tool for hyperparameter optimization (HPO) of machine learning (ML) algorithms. While known for its sample efficiency, vanilla BO cannot utilize readily available prior beliefs that the practitioner has about the potential location of the optimum. Thus, BO disregards a valuable source of information, reducing its appeal to ML practitioners. To address this issue, we propose πBO, a generalization of the acquisition function that incorporates prior beliefs about the location of the optimum in the form of a user-provided probability distribution. In contrast to previous approaches, πBO is conceptually simple and can easily be integrated with existing libraries and many acquisition functions. We provide regret bounds when πBO is applied to the common Expected Improvement acquisition function and prove convergence at regular rates independently of the prior. Further, our experiments show that πBO outperforms competing approaches across a wide suite of benchmarks and prior characteristics. We also demonstrate that πBO improves on the state-of-the-art performance for a popular deep learning task, with a 12.5× time-to-accuracy speedup over prominent BO approaches.
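To make the idea of a prior-weighted acquisition function concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not state the exact formula, so the multiplicative weighting and the decaying exponent `beta / iteration` are assumptions made here for illustration. They are chosen so that the prior's influence fades as more observations arrive, in keeping with the claim that convergence does not depend on the prior. The function names and the parameter `beta` are hypothetical.

```python
import math


def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def expected_improvement(mean, std, best):
    """Closed-form Expected Improvement for minimization.

    mean, std: Gaussian posterior of the surrogate at a candidate point.
    best: lowest objective value observed so far.
    """
    if std <= 0.0:
        return max(best - mean, 0.0)
    z = (best - mean) / std
    return (best - mean) * normal_cdf(z) + std * normal_pdf(z)


def prior_weighted_acquisition(acq_value, prior_density, iteration, beta=10.0):
    """Weight an acquisition value by the user's prior density.

    ASSUMPTION: the prior enters multiplicatively, raised to the power
    beta / iteration, so its influence decays as the iteration count grows.
    """
    if iteration < 1:
        raise ValueError("iteration must be >= 1")
    return acq_value * prior_density ** (beta / iteration)


# Two candidates with identical posterior, hence identical EI, but
# different prior density for containing the optimum.
ei = expected_improvement(mean=0.2, std=0.5, best=0.3)
for n in (1, 10, 1000):
    favoured = prior_weighted_acquisition(ei, prior_density=0.9, iteration=n)
    disfavoured = prior_weighted_acquisition(ei, prior_density=0.1, iteration=n)
    print(n, favoured / disfavoured)
```

Under these assumptions, the ratio between the two candidates is large in early iterations, where the prior dominates, and approaches 1 as the iteration count grows, where the surrogate model dominates. Because the weighting wraps an arbitrary acquisition value, the same function could be applied to acquisition functions other than Expected Improvement.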