Linguistic Collapse: Neural Collapse in (Large) Language Models

28 May 2024
Robert Wu
V. Papyan
arXiv:2405.17767
Abstract

Neural collapse ($\mathcal{NC}$) is a phenomenon observed in classification tasks where top-layer representations collapse into their class means, which become equinorm, equiangular and aligned with the classifiers. These behaviors -- associated with generalization and robustness -- would manifest under specific conditions: models are trained towards zero loss, with noise-free labels belonging to balanced classes, which do not outnumber the model's hidden dimension. Recent studies have explored $\mathcal{NC}$ in the absence of one or more of these conditions to extend and capitalize on the associated benefits of ideal geometries. Language modeling presents a curious frontier, as training by token prediction constitutes a classification task where none of the conditions exist: the vocabulary is imbalanced and exceeds the embedding dimension; different tokens might correspond to similar contextual embeddings; and large language models (LLMs) in particular are typically only trained for a few epochs. This paper empirically investigates the impact of scaling the architectures and training of causal language models (CLMs) on their progression towards $\mathcal{NC}$. We find that $\mathcal{NC}$ properties that develop with scaling are linked to generalization. Moreover, there is evidence of some relationship between $\mathcal{NC}$ and generalization independent of scale. Our work therefore underscores the generality of $\mathcal{NC}$ as it extends to the novel and more challenging setting of language modeling. Downstream, we seek to inspire further research on the phenomenon to deepen our understanding of LLMs -- and neural networks at large -- and improve existing architectures based on $\mathcal{NC}$-related properties.
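
The abstract refers to class means becoming equinorm, equiangular, and aligned with the classifier (self-duality). The sketch below is not taken from the paper; it is a minimal illustration, assuming one has already computed a matrix of (globally centered) class-mean representations and the corresponding classifier/unembedding rows, of how these three properties could be quantified. The names `class_means` and `classifier_W` are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's measurement code):
# quantify equinorm, equiangularity, and classifier alignment of class means.
import numpy as np

def nc_equinorm(class_means: np.ndarray) -> float:
    """Coefficient of variation of class-mean norms; near 0 means equinorm."""
    norms = np.linalg.norm(class_means, axis=1)
    return float(norms.std() / norms.mean())

def nc_equiangularity(class_means: np.ndarray) -> float:
    """Std. of pairwise cosines between class means; near 0 means equiangular."""
    normed = class_means / np.linalg.norm(class_means, axis=1, keepdims=True)
    cos = normed @ normed.T
    off_diag = cos[~np.eye(len(cos), dtype=bool)]  # drop self-similarities
    return float(off_diag.std())

def nc_self_duality(class_means: np.ndarray, classifier_W: np.ndarray) -> float:
    """Mean cosine between each class mean and its classifier row; near 1 means aligned."""
    m = class_means / np.linalg.norm(class_means, axis=1, keepdims=True)
    w = classifier_W / np.linalg.norm(classifier_W, axis=1, keepdims=True)
    return float(np.mean(np.sum(m * w, axis=1)))

# Toy usage with random data (no collapse expected for random vectors):
rng = np.random.default_rng(0)
means = rng.normal(size=(100, 64))  # 100 hypothetical "classes", 64-dim features
W = rng.normal(size=(100, 64))      # hypothetical classifier / unembedding rows
print(nc_equinorm(means), nc_equiangularity(means), nc_self_duality(means, W))
```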
