arXiv:2502.19002
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training


26 February 2025
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu

Papers citing "The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training"

Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Z. Luo
Jianfeng Yao
Ruoyu Sun
05 May 2025