2406.04657
Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise
7 June 2024
Vignesh Kothapalli
Tianyu Pang
Shenyang Deng
Zongmin Liu
Yaoqing Yang
Papers citing
"Crafting Heavy-Tails in Weight Matrix Spectrum without Gradient Noise"
10 / 10 papers shown
Title
Model Balancing Helps Low-data Training and Fine-tuning
Zihang Liu
Y. Hu
Tianyu Pang
Yefan Zhou
Pu Ren
Yaoqing Yang
29
2
0
16 Oct 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
13
0
0
14 Oct 2024
AlphaLoRA: Assigning LoRA Experts Based on Layer Training Quality
Peijun Qing
Chongyang Gao
Yefan Zhou
Xingjian Diao
Yaoqing Yang
Soroush Vosoughi
MoMe
MoE
19
3
0
14 Oct 2024
Asymptotics of feature learning in two-layer networks after one gradient-step
Hugo Cui
Luca Pesce
Yatin Dandi
Florent Krzakala
Yue M. Lu
Lenka Zdeborová
Bruno Loureiro
MLT
44
16
0
07 Feb 2024
The Benefits of Reusing Batches for Gradient Descent in Two-Layer Networks: Breaking the Curse of Information and Leap Exponents
Yatin Dandi
Emanuele Troiani
Luca Arnaboldi
Luca Pesce
Lenka Zdeborová
Florent Krzakala
MLT
59
24
0
05 Feb 2024
Temperature Balancing, Layer-wise Weight Analysis, and Neural Network Training
Yefan Zhou
Tianyu Pang
Keqin Liu
Charles H. Martin
Michael W. Mahoney
Yaoqing Yang
34
7
0
01 Dec 2023
Noise Is Not the Main Factor Behind the Gap Between SGD and Adam on Transformers, but Sign Descent Might Be
Frederik Kunstner
Jacques Chen
J. Lavington
Mark W. Schmidt
38
66
0
27 Apr 2023
Learning Single-Index Models with Shallow Neural Networks
A. Bietti
Joan Bruna
Clayton Sanford
M. Song
160
65
0
27 Oct 2022
Neural Networks Efficiently Learn Low-Dimensional Representations with SGD
Alireza Mousavi-Hosseini
Sejun Park
M. Girotti
Ioannis Mitliagkas
Murat A. Erdogdu
MLT
319
48
0
29 Sep 2022
The large learning rate phase of deep learning: the catapult mechanism
Aitor Lewkowycz
Yasaman Bahri
Ethan Dyer
Jascha Narain Sohl-Dickstein
Guy Gur-Ari
ODL
150
232
0
04 Mar 2020