PAC-Bayesian Aggregation without Cross-Validation

We propose a new PAC-Bayesian procedure for aggregating prediction models and a new way of constructing a hypothesis space with which the procedure works particularly well. The procedure is based on alternating minimization of a new PAC-Bayesian bound, which is convex in the posterior distribution used for aggregation and also convex in a trade-off parameter between the empirical performance of the distribution and its complexity, measured by the Kullback-Leibler divergence to a prior. The hypothesis space is constructed by training a finite number of weak classifiers, where each classifier is trained on a small subsample of the data and validated on the complementary subset of the data. The weak classifiers are then weighted according to their validation performance through minimization of the PAC-Bayesian bound. We provide experimental results demonstrating that the proposed aggregation strategy matches the prediction accuracy of kernel SVMs tuned by cross-validation. The comparable accuracy is achieved at a much lower computational cost, since training many SVMs on small subsamples is significantly cheaper than training one SVM on the whole data set, owing to the super-quadratic training time of kernel SVMs. Remarkably, our prediction approach is based on minimization of a theoretical bound and requires no parameter cross-validation, in contrast to the majority of theoretical results, which cannot be rigorously applied in practice.
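To make the described pipeline concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes the bound has the common PAC-Bayes-lambda form, which admits a closed-form Gibbs update of the posterior rho for fixed lambda and a closed-form update of lambda for fixed rho; the subsample size `r`, the number of classifiers `m`, the use of scikit-learn's `SVC` as the weak learner, and the exact update formulas are all illustrative assumptions rather than details taken from the abstract.

```python
# Hypothetical sketch: weak SVMs on subsamples, validated on complements,
# aggregated by alternating minimization of an assumed PAC-Bayes-lambda bound.
import numpy as np
from sklearn.svm import SVC

def train_weak_classifiers(X, y, m=100, r=300, seed=0):
    """Train m SVMs, each on a random subsample of size r; return the
    classifiers together with the complementary validation index sets."""
    rng = np.random.default_rng(seed)
    n = len(X)
    classifiers, val_sets = [], []
    for _ in range(m):
        idx = rng.choice(n, size=min(r, n), replace=False)
        val = np.setdiff1d(np.arange(n), idx)
        classifiers.append(SVC(kernel="rbf").fit(X[idx], y[idx]))
        val_sets.append(val)
    return classifiers, val_sets

def aggregate(classifiers, val_sets, X, y, delta=0.05, n_iters=50):
    """Alternating minimization of the (assumed) PAC-Bayes-lambda bound."""
    m = len(classifiers)
    # Empirical 0-1 validation loss of each weak classifier.
    losses = np.array([np.mean(clf.predict(X[v]) != y[v])
                       for clf, v in zip(classifiers, val_sets)])
    n_val = min(len(v) for v in val_sets)  # validation size used in the bound (assumption)
    pi = np.full(m, 1.0 / m)               # uniform prior over the weak classifiers
    rho, lam = pi.copy(), 1.0
    for _ in range(n_iters):
        # rho-step: Gibbs posterior, the minimizer of the bound for fixed lambda.
        log_rho = np.log(pi) - lam * n_val * losses
        rho = np.exp(log_rho - log_rho.max())
        rho /= rho.sum()
        # lambda-step: closed-form minimizer of the bound for fixed rho.
        kl = float(np.sum(rho * np.log(np.clip(rho, 1e-12, None) / pi)))
        emp = float(rho @ losses)
        complexity = kl + np.log(2.0 * np.sqrt(n_val) / delta)
        lam = 2.0 / (np.sqrt(2.0 * n_val * emp / complexity + 1.0) + 1.0)
    return rho

def predict(classifiers, rho, X):
    """rho-weighted majority vote over the weak classifiers."""
    votes = np.array([clf.predict(X) for clf in classifiers])   # shape (m, n_test)
    labels = np.unique(votes)
    scores = np.array([(rho[:, None] * (votes == c)).sum(axis=0) for c in labels])
    return labels[np.argmax(scores, axis=0)]
```

In this sketch each weak SVM is cheap to train because its subsample is small, and the aggregation step touches only the precomputed validation losses, so no cross-validation loop over hyperparameters is needed; the trade-off parameter lambda is set by the bound itself.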