
arXiv:2010.09484
Information-Theoretic Bounds on Transfer Generalization Gap Based on Jensen-Shannon Divergence

13 October 2020
Sharu Theresa Jose
Osvaldo Simeone
Abstract

In transfer learning, training and testing data sets are drawn from different data distributions. The transfer generalization gap is the difference between the population loss on the target data distribution and the training loss. The training data set generally includes data drawn from both source and target distributions. This work presents novel information-theoretic upper bounds on the average transfer generalization gap that capture (i) the domain shift between the target data distribution $P'_Z$ and the source distribution $P_Z$ through a two-parameter family of generalized $(\alpha_1,\alpha_2)$-Jensen-Shannon (JS) divergences; and (ii) the sensitivity of the transfer learner output $W$ to each individual sample $Z_i$ of the data set via the mutual information $I(W;Z_i)$. For $\alpha_1 \in (0,1)$, the $(\alpha_1,\alpha_2)$-JS divergence can be bounded even when the support of $P_Z$ is not included in that of $P'_Z$. This contrasts with the Kullback-Leibler (KL) divergence $D_{\mathrm{KL}}(P_Z \| P'_Z)$-based bounds of Wu et al. [1], which are vacuous in this case. Moreover, the obtained bounds hold for unbounded loss functions with bounded cumulant generating functions, unlike the $\phi$-divergence-based bound of Wu et al. [1]. We also obtain new upper bounds on the average transfer excess risk in terms of the $(\alpha_1,\alpha_2)$-JS divergence for empirical weighted risk minimization (EWRM), which minimizes the weighted average of the training losses over the source and target data sets. Finally, we provide a numerical example to illustrate the merits of the introduced bounds.
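The support-mismatch claim in the abstract can be illustrated numerically. The paper's exact definition of the $(\alpha_1,\alpha_2)$-JS divergence is not reproduced on this page, so the sketch below uses a *hypothetical* mixture-skew construction (mixing weight `a1`, averaging weight `a2`) purely to show the mechanism: a KL term against $P'_Z$ diverges as soon as $P_Z$ places mass where $P'_Z$ has none, whereas KL terms against the $\alpha_1$-mixture remain finite for $\alpha_1 \in (0,1)$, since the mixture covers both supports.

```python
import numpy as np

def kl(p, q):
    """KL divergence for discrete distributions, with 0*log(0/q) = 0
    and p > 0 on a point where q = 0 giving +inf (the vacuous case)."""
    mask = p > 0
    if np.any((q == 0) & mask):
        return np.inf
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def js_two_param(p, q, a1=0.5, a2=0.5):
    """Hypothetical two-parameter JS-style divergence (illustration only,
    not the paper's definition): mix with weight a1, combine the two KL
    terms with weight a2. For a1 in (0,1) the mixture m has full support,
    so both KL terms are finite even under support mismatch."""
    m = a1 * p + (1 - a1) * q
    return a2 * kl(p, m) + (1 - a2) * kl(q, m)

# Source P_Z puts mass on a point the target P'_Z never produces.
p_source = np.array([0.5, 0.3, 0.2])
p_target = np.array([0.6, 0.4, 0.0])

print(kl(p_source, p_target))              # inf: KL-based bound is vacuous
print(js_two_param(p_source, p_target))    # finite JS-style divergence
```

The design point is the mixture in the denominator: any point charged by either distribution is charged by `m`, which is exactly why the abstract can claim boundedness for $\alpha_1 \in (0,1)$ where the $D_{\mathrm{KL}}(P_Z \| P'_Z)$ bound of Wu et al. [1] cannot.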
