2502.19002
Cited By
The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training
26 February 2025
Jinbo Wang
Mingze Wang
Zhanpeng Zhou
Junchi Yan
Weinan E
Lei Wu
Papers citing "The Sharpness Disparity Principle in Transformers for Accelerating Language Model Pre-Training"
1 / 1 papers shown
Title
Towards Quantifying the Hessian Structure of Neural Networks
Zhaorui Dong
Yushun Zhang
Z. Luo
Jianfeng Yao
Ruoyu Sun
24
0
0
05 May 2025