arXiv:2209.03447
Blessing of Class Diversity in Pre-training

7 September 2022
Yulai Zhao
Jianshu Chen
Simon S. Du
Abstract

This paper presents a new statistical analysis aiming to explain the recent superior achievements of pre-training techniques in natural language processing (NLP). We prove that when the classes of the pre-training task (e.g., different words in the masked language model task) are sufficiently diverse, in the sense that the least singular value of the last linear layer in pre-training (denoted as $\tilde{\nu}$) is large, then pre-training can significantly improve the sample efficiency of downstream tasks. Specifically, we show the transfer learning excess risk enjoys an $O\left(\frac{1}{\tilde{\nu} \sqrt{n}}\right)$ rate, in contrast to the $O\left(\frac{1}{\sqrt{m}}\right)$ rate in standard supervised learning. Here, $n$ is the number of pre-training data points and $m$ is the number of data points in the downstream task, and typically $n \gg m$. Our proof relies on a vector-form Rademacher complexity chain rule for disassembling composite function classes and a modified self-concordance condition. These techniques can be of independent interest.
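
The role of $\tilde{\nu}$ can be illustrated with a short numerical sketch. The Python/NumPy snippet below (the weight matrix, its shape, and the sample size are hypothetical choices for illustration, not taken from the paper) computes the least singular value of a last-layer weight matrix and plugs it into the $O\left(\frac{1}{\tilde{\nu}\sqrt{n}}\right)$ rate quoted in the abstract; a larger $\tilde{\nu}$ (a more diverse, better-conditioned class structure) directly tightens the bound.

import numpy as np

# Minimal illustrative sketch (not code from the paper): nu_tilde is the least
# singular value of the last linear layer learned during pre-training. Given
# that layer's weight matrix W (here assumed to have shape
# num_classes x representation_dim), it can be computed as follows.

def least_singular_value(W: np.ndarray) -> float:
    """Smallest singular value of the last-layer weight matrix."""
    return np.linalg.svd(W, compute_uv=False).min()

# Hypothetical numbers: a masked-language-model head over 10,000 word classes
# with a 256-dimensional representation, and n = 1e6 pre-training examples.
rng = np.random.default_rng(0)
W = rng.standard_normal((10_000, 256)) / np.sqrt(256)
nu_tilde = least_singular_value(W)

n = 1_000_000  # number of pre-training samples
# The transfer-learning excess risk from the abstract scales as
# O(1 / (nu_tilde * sqrt(n))):
print(f"nu_tilde = {nu_tilde:.3f}, rate ~ {1.0 / (nu_tilde * np.sqrt(n)):.2e}")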
