1901.09321
Cited By
v1
v2 (latest)
Fixup Initialization: Residual Learning Without Normalization
27 January 2019
Hongyi Zhang
Yann N. Dauphin
Tengyu Ma
ODL
AI4CE
Papers citing
"Fixup Initialization: Residual Learning Without Normalization"
50 / 243 papers shown
Gradient Descent with Provably Tuned Learning-rate Schedules
Dravyansh Sharma
231
0
0
04 Dec 2025
Readout-Side Bypass for Residual Hybrid Quantum-Classical Models
Guilin Zhang
Wulan Guo
Ziqi Tan
Hongyang He
Qiang Guan
Hailong Jiang
128
1
0
25 Nov 2025
Row-stochastic matrices can provably outperform doubly stochastic matrices in decentralized learning
Bing Liu
Boao Kong
Limin Lu
Kun Yuan
Chengcheng Zhao
184
0
0
24 Nov 2025
The Hidden Power of Normalization Layers in Neural Networks: Exponential Capacity Control
Khoat Than
176
0
0
02 Nov 2025
Weight Initialization and Variance Dynamics in Deep Neural Networks and Large Language Models
Yankun Han
89
0
0
10 Oct 2025
Arithmetic-Mean μP for Modern Architectures: A Unified Learning-Rate Scale for CNNs and ResNets
Haosong Zhang
Shenxi Wu
Yichi Zhang
Wei Lin
277
0
0
05 Oct 2025
On residual network depth
Benoit Dherin
Michael Munn
MDE
296
0
0
03 Oct 2025
Probability Distribution Collapse: A Critical Bottleneck to Compact Unsupervised Neural Grammar Induction
Jinwook Park
Kangil Kim
132
0
0
25 Sep 2025
E-CaTCH: Event-Centric Cross-Modal Attention with Temporal Consistency and Class-Imbalance Handling for Misinformation Detection
Ahmad Mousavi
Yeganeh Abdollahinejad
Roberto Corizzo
Nathalie Japkowicz
Zois Boukouvalas
146
0
0
15 Aug 2025
Compressed Decentralized Momentum Stochastic Gradient Methods for Nonconvex Optimization
Wei Liu
Anweshit Panda
Ujwal Pandey
Christopher Brissette
Yikang Shen
George M. Slota
Naigang Wang
Jie Chen
Yangyang Xu
222
2
0
07 Aug 2025
ResNets Are Deeper Than You Think
Christian H.X. Ali Mehmeti-Göpel
Michael Wand
262
2
0
17 Jun 2025
Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning
Julius Berner
Miguel Liu-Schiaffini
Jean Kossaifi
Valentin Duruisseaux
Boris Bonev
Kamyar Azizzadenesheli
A. Anandkumar
AI4CE
423
7
0
12 Jun 2025
Backward Oversmoothing: why is it hard to train deep Graph Neural Networks?
Nicolas Keriven
321
2
0
22 May 2025
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Li
Blake Bordelon
Shane Bergsma
Cengiz Pehlevan
Boris Hanin
Joel Hestness
713
38
0
02 May 2025
SpINR: Neural Volumetric Reconstruction for FMCW Radars
Harshvardhan Takawale
Nirupam Roy
233
2
0
30 Mar 2025
Transformers without Normalization
Computer Vision and Pattern Recognition (CVPR), 2025
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
OffRL
ViT
578
129
0
13 Mar 2025
A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization
Md Yousuf Harun
Christopher Kanan
AI4CE
380
2
0
09 Mar 2025
GradientStabilizer: Fix the Norm, Not the Gradient
Tianjin Huang
Haotian Hu
Zhenyu Zhang
Gaojie Jin
Xianrui Li
...
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
453
6
0
24 Feb 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
International Conference on Learning Representations (ICLR), 2025
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zinan Lin
Shiwei Liu
546
22
0
12 Jan 2025
Parseval Regularization for Continual Reinforcement Learning
Neural Information Processing Systems (NeurIPS), 2024
Wesley Chung
Lynn Cherif
David Meger
Doina Precup
CLL
330
18
0
10 Dec 2024
Scale Propagation Network for Generalizable Depth Completion
IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2024
Haotian Wang
Meng Yang
Xinhu Zheng
Gang Hua
300
5
0
24 Oct 2024
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
International Conference on Learning Representations (ICLR), 2024
Taesun Yeom
Sangyoon Lee
Jaeho Lee
448
9
0
07 Oct 2024
Neutral Residues: Revisiting Adapters for Model Extension
Franck Signe Talla
Edouard Grave
418
2
0
03 Oct 2024
Approaching Deep Learning through the Spectral Dynamics of Weights
David Yunis
Kumar Kshitij Patel
Samuel Wheeler
Pedro H. P. Savarese
Gal Vardi
Karen Livescu
Michael Maire
Matthew R. Walter
389
17
0
21 Aug 2024
Benchmarking the Attribution Quality of Vision Models
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
FAtt
384
5
0
16 Jul 2024
MSRS: Training Multimodal Speech Recognition Models from Scratch with Sparse Mask Optimization
Adriana Fernandez-Lopez
Honglie Chen
Pingchuan Ma
Lu Yin
Q. Xiao
Stavros Petridis
Shiwei Liu
Maja Pantic
250
2
0
25 Jun 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
378
11
0
29 May 2024
HILCodec: High Fidelity and Lightweight Neural Audio Codec
S. Ahn
Beom Jun Woo
Mingrui Han
Chanyeong Moon
Nam Soo Kim
382
18
0
08 May 2024
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Communications on Pure and Applied Mathematics (CPAM), 2024
Raphael Barboni
Gabriel Peyré
François-Xavier Vialard
466
3
0
19 Mar 2024
Efficient Stagewise Pretraining via Progressive Subnetworks
Abhishek Panigrahi
Nikunj Saunshi
Kaifeng Lyu
Sobhan Miryoosefi
Sashank J. Reddi
Satyen Kale
Sanjiv Kumar
283
9
0
08 Feb 2024
Gradient Flossing: Improving Gradient Descent through Dynamic Control of Jacobians
Rainer Engelken
295
12
0
28 Dec 2023
Spike No More: Stabilizing the Pre-training of Large Language Models
Sho Takase
Shun Kiyono
Sosuke Kobayashi
Jun Suzuki
536
40
0
28 Dec 2023
Principled Weight Initialization for Hypernetworks
International Conference on Learning Representations (ICLR), 2020
Oscar Chang
Lampros Flokas
Hod Lipson
379
87
0
13 Dec 2023
Simplifying Transformer Blocks
International Conference on Learning Representations (ICLR), 2023
Bobby He
Thomas Hofmann
507
51
0
03 Nov 2023
Kronecker-Factored Approximate Curvature for Modern Neural Network Architectures
Neural Information Processing Systems (NeurIPS), 2023
Runa Eschenhagen
Alexander Immer
Richard Turner
Frank Schneider
Philipp Hennig
416
39
0
01 Nov 2023
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection
Zhongzhan Huang
Pan Zhou
Shuicheng Yan
Guanbin Li
378
39
0
20 Oct 2023
Reusing Pretrained Models by Multi-linear Operators for Efficient Training
Yu Pan
Ye Yuan
Yichun Yin
Zenglin Xu
Lifeng Shang
Xin Jiang
Qun Liu
324
21
0
16 Oct 2023
LoftQ: LoRA-Fine-Tuning-Aware Quantization for Large Language Models
International Conference on Learning Representations (ICLR), 2023
Yixiao Li
Yifan Yu
Chen Liang
Pengcheng He
Nikos Karampatziakis
Weizhu Chen
Tuo Zhao
MQ
628
213
0
12 Oct 2023
PHYDI: Initializing Parameterized Hypercomplex Neural Networks as Identity Functions
International Workshop on Machine Learning for Signal Processing (MLSP), 2023
Matteo Mancanelli
Eleonora Grassucci
A. Uncini
Danilo Comminiello
AI4CE
568
3
0
11 Oct 2023
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit
International Conference on Learning Representations (ICLR), 2023
Blake Bordelon
Lorenzo Noci
Mufan Li
Boris Hanin
Cengiz Pehlevan
486
51
0
28 Sep 2023
The fine print on tempered posteriors
Asian Conference on Machine Learning (ACML), 2023
Konstantinos Pitas
Julyan Arbel
311
4
0
11 Sep 2023
Implicit regularization of deep residual networks towards neural ODEs
International Conference on Learning Representations (ICLR), 2023
Pierre Marion
Yu-Han Wu
Michael E. Sander
Gérard Biau
527
23
0
03 Sep 2023
Quantitative CLTs in Deep Neural Networks
Probability Theory and Related Fields (PTRF), 2023
Stefano Favaro
Boris Hanin
Domenico Marinucci
I. Nourdin
G. Peccati
BDL
848
30
0
12 Jul 2023
Spectral Batch Normalization: Normalization in the Frequency Domain
IEEE International Joint Conference on Neural Networks (IJCNN), 2023
Rinor Cakaj
Jens Mehnert
Bin Yang
329
4
0
29 Jun 2023
Principles for Initialization and Architecture Selection in Graph Neural Networks with ReLU Activations
G. Dezoort
Boris Hanin
AI4CE
317
3
0
20 Jun 2023
Understanding Optimization of Deep Learning via Jacobian Matrix and Lipschitz Constant
Xianbiao Qi
Jianan Wang
Lei Zhang
269
0
0
15 Jun 2023
FedWon: Triumphing Multi-domain Federated Learning Without Normalization
International Conference on Learning Representations (ICLR), 2023
Weiming Zhuang
Lingjuan Lyu
281
14
0
09 Jun 2023
Normalization Layers Are All That Sharpness-Aware Minimization Needs
Neural Information Processing Systems (NeurIPS), 2023
Maximilian Mueller
Tiffany J. Vlaar
David Rolnick
Matthias Hein
367
35
0
07 Jun 2023
Stochastic Marginal Likelihood Gradients using Neural Tangent Kernels
International Conference on Machine Learning (ICML), 2023
Alexander Immer
Tycho F. A. van der Ouderaa
Mark van der Wilk
Gunnar Rätsch
Bernhard Schölkopf
BDL
302
17
0
06 Jun 2023
'Tax-free' 3DMM Conditional Face Generation
Yiwen Huang
Zhiqiu Yu
Xinjie Yi
Yue Wang
James Tompkin
CVBM
177
0
0
22 May 2023