Better-than-KL PAC-Bayes Bounds

14 February 2024
Ilja Kuzborskij
Kwang-Sung Jun
Yulian Wu
Kyoungseok Jang
Francesco Orabona
Abstract

Let $f(\theta, X_1), \dots, f(\theta, X_n)$ be a sequence of random elements, where $f$ is a fixed scalar function, $X_1, \dots, X_n$ are independent random variables (the data), and $\theta$ is a random parameter distributed according to some data-dependent posterior distribution $P_n$. In this paper, we consider the problem of proving concentration inequalities to estimate the mean of this sequence. An example of such a problem is the estimation of the generalization error of a predictor trained by a stochastic algorithm, such as a neural network, where $f$ is a loss function. Classically, this problem is approached through a PAC-Bayes analysis where, in addition to the posterior, we choose a prior distribution that captures our belief about the inductive bias of the learning problem. The key quantity in PAC-Bayes concentration bounds is then a divergence that captures the complexity of the learning problem, and the de facto standard choice is the KL divergence. However, the tightness of this choice has rarely been questioned. In this paper, we challenge the tightness of KL-divergence-based bounds by showing that it is possible to achieve a strictly tighter bound. In particular, we demonstrate new high-probability PAC-Bayes bounds with a novel, better-than-KL divergence inspired by Zhang et al. (2022). Our proof draws on recent advances in the regret analysis of gambling algorithms and their use in deriving concentration inequalities. Our result is the first of its kind, in that existing PAC-Bayes bounds with non-KL divergences are not known to be strictly better than KL. We therefore believe this work marks a first step toward identifying optimal rates for PAC-Bayes bounds.
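To make the setting concrete, the sketch below evaluates a classical KL-based PAC-Bayes bound (a McAllester/Maurer-style form relaxed via Pinsker's inequality) for an isotropic Gaussian prior and posterior over $\theta$ and a bounded loss in $[0,1]$. This is the standard baseline that the abstract refers to, not the new better-than-KL bound of this paper; the dimensions, variances, and risk values are illustrative assumptions.

import numpy as np

def kl_isotropic_gaussians(w_post, w_prior, sigma2):
    # KL(N(w_post, sigma2*I) || N(w_prior, sigma2*I)) for equal isotropic covariances
    return np.sum((w_post - w_prior) ** 2) / (2.0 * sigma2)

def pac_bayes_kl_bound(empirical_risk, kl, n, delta):
    # McAllester/Maurer-style bound for losses in [0, 1], after Pinsker's relaxation:
    # with prob. >= 1 - delta,
    #   E_{theta~P_n}[risk] <= E_{theta~P_n}[emp. risk]
    #                          + sqrt((KL(P_n || prior) + ln(2*sqrt(n)/delta)) / (2n))
    complexity = np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))
    return empirical_risk + complexity

# Toy example: posterior mean from a hypothetical training run, prior centered at zero.
rng = np.random.default_rng(0)
w_post = rng.normal(scale=0.1, size=100)   # "trained" posterior mean (illustrative)
w_prior = np.zeros(100)                    # data-independent prior mean
kl = kl_isotropic_gaussians(w_post, w_prior, sigma2=0.05)
print(pac_bayes_kl_bound(empirical_risk=0.08, kl=kl, n=10_000, delta=0.05))

Per the abstract, the paper's contribution is to replace the KL term in this kind of complexity penalty with a divergence that is provably no larger, yielding a strictly tighter high-probability bound.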
