ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

arXiv:1901.09321 · Cited By
Fixup Initialization: Residual Learning Without Normalization


27 January 2019
Hongyi Zhang
Yann N. Dauphin
Tengyu Ma
    ODL
    AI4CE

Papers citing "Fixup Initialization: Residual Learning Without Normalization"

Showing 50 of 75 citing papers.
Don't be lazy: CompleteP enables compute-efficient deep transformers
Nolan Dey
Bin Claire Zhang
Lorenzo Noci
Mufan Bill Li
Blake Bordelon
Shane Bergsma
C. Pehlevan
Boris Hanin
Joel Hestness
39
0
0
02 May 2025
SpINR: Neural Volumetric Reconstruction for FMCW Radars
Harshvardhan Takawale
Nirupam Roy
30
0
0
30 Mar 2025
Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang
Haotian Hu
Zhenyu (Allen) Zhang
Gaojie Jin
X. Li
...
Tianlong Chen
Lu Liu
Qingsong Wen
Zhangyang Wang
Shiwei Liu
MQ
39
0
0
24 Feb 2025
SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang
Ziquan Zhu
Gaojie Jin
Lu Liu
Zhangyang Wang
Shiwei Liu
42
1
0
12 Jan 2025
Fast Training of Sinusoidal Neural Fields via Scaling Initialization
Taesun Yeom
Sangyoon Lee
Jaeho Lee
53
2
0
07 Oct 2024
Benchmarking the Attribution Quality of Vision Models
Robin Hesse
Simone Schaub-Meyer
Stefan Roth
FAtt
34
3
0
16 Jul 2024
Understanding and Minimising Outlier Features in Neural Network Training
Bobby He
Lorenzo Noci
Daniele Paliotta
Imanol Schlag
Thomas Hofmann
36
3
0
29 May 2024
Understanding the training of infinitely deep and wide ResNets with Conditional Optimal Transport
Raphael Barboni
Gabriel Peyré
François-Xavier Vialard
37
3
0
19 Mar 2024
Principled Weight Initialization for Hypernetworks
Oscar Chang
Lampros Flokas
Hod Lipson
22
73
0
13 Dec 2023
Quantitative CLTs in Deep Neural Networks
Stefano Favaro
Boris Hanin
Domenico Marinucci
I. Nourdin
G. Peccati
BDL
23
11
0
12 Jul 2023
The R-mAtrIx Net
Shailesh Lal
Suvajit Majumder
E. Sobko
24
5
0
14 Apr 2023
Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
Chunyan Xiong
Meng Lu
Xiaotong Yu
Jian-Peng Cao
Zhong Chen
D. Guo
X. Qu
MLT
35
0
0
14 Apr 2023
Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
Boris Knyazev
Doha Hwang
Simon Lacoste-Julien
AI4CE
24
17
0
07 Mar 2023
A Survey on Efficient Training of Transformers
Bohan Zhuang
Jing Liu
Zizheng Pan
Haoyu He
Yuetian Weng
Chunhua Shen
28
47
0
02 Feb 2023
Scaling laws for single-agent reinforcement learning
Jacob Hilton
Jie Tang
John Schulman
22
20
0
31 Jan 2023
Expected Gradients of Maxout Networks and Consequences to Parameter Initialization
Hanna Tseran
Guido Montúfar
ODL
22
0
0
17 Jan 2023
REPAIR: REnormalizing Permuted Activations for Interpolation Repair
Keller Jordan
Hanie Sedghi
O. Saukh
R. Entezari
Behnam Neyshabur
MoMe
46
94
0
15 Nov 2022
Do Bayesian Neural Networks Need To Be Fully Stochastic?
Mrinank Sharma
Sebastian Farquhar
Eric T. Nalisnick
Tom Rainforth
BDL
18
52
0
11 Nov 2022
SML: Enhance the Network Smoothness with Skip Meta Logit for CTR Prediction
Wenlong Deng
Lang Lang
Z. Liu
B. Liu
21
0
0
09 Oct 2022
Dynamical Isometry for Residual Networks
Advait Gadhikar
R. Burkholz
ODL
AI4CE
37
2
0
05 Oct 2022
Removing Batch Normalization Boosts Adversarial Training
Haotao Wang
Aston Zhang
Shuai Zheng
Xingjian Shi
Mu Li
Zhangyang Wang
32
41
0
04 Jul 2022
Cold Posteriors through PAC-Bayes
Konstantinos Pitas
Julyan Arbel
23
5
0
22 Jun 2022
Adapting the Linearised Laplace Model Evidence for Modern Deep Learning
Javier Antorán
David Janz
J. Allingham
Erik A. Daxberger
Riccardo Barbano
Eric T. Nalisnick
José Miguel Hernández-Lobato
UQCV
BDL
27
28
0
17 Jun 2022
Scaling ResNets in the Large-depth Regime
P. Marion
Adeline Fermanian
Gérard Biau
Jean-Philippe Vert
26
16
0
14 Jun 2022
Guidelines for the Regularization of Gammas in Batch Normalization for Deep Residual Networks
Bum Jun Kim
Hyeyeon Choi
Hyeonah Jang
Dong Gu Lee
Wonseok Jeong
Sang Woo Kim
16
4
0
15 May 2022
Online Convolutional Re-parameterization
Mu Hu
Junyi Feng
Jiashen Hua
Baisheng Lai
Jianqiang Huang
Xiaojin Gong
Xiansheng Hua
19
26
0
02 Apr 2022
Deep Learning without Shortcuts: Shaping the Kernel with Tailored Rectifiers
Guodong Zhang
Aleksandar Botev
James Martens
OffRL
21
26
0
15 Mar 2022
Acceleration of Federated Learning with Alleviated Forgetting in Local Training
Chencheng Xu
Zhiwei Hong
Minlie Huang
Tao Jiang
FedML
19
45
0
05 Mar 2022
DeepNet: Scaling Transformers to 1,000 Layers
Hongyu Wang
Shuming Ma
Li Dong
Shaohan Huang
Dongdong Zhang
Furu Wei
MoE
AI4CE
15
156
0
01 Mar 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar
Anthony Sarah
Sairam Sundaresan
MQ
21
4
0
24 Feb 2022
Bayesian Model Selection, the Marginal Likelihood, and Generalization
Sanae Lotfi
Pavel Izmailov
Gregory W. Benton
Micah Goldblum
A. Wilson
UQCV
BDL
52
56
0
23 Feb 2022
Invariance Learning in Deep Neural Networks with Differentiable Laplace Approximations
Alexander Immer
Tycho F. A. van der Ouderaa
Gunnar Rätsch
Vincent Fortuin
Mark van der Wilk
BDL
31
44
0
22 Feb 2022
Understanding AdamW through Proximal Methods and Scale-Freeness
Zhenxun Zhuang
Mingrui Liu
Ashok Cutkosky
Francesco Orabona
37
63
0
31 Jan 2022
Evaluating Gradient Inversion Attacks and Defenses in Federated Learning
Yangsibo Huang
Samyak Gupta
Zhao-quan Song
Kai Li
Sanjeev Arora
FedML
AAML
SILM
12
269
0
30 Nov 2021
Hidden-Fold Networks: Random Recurrent Residuals Using Sparse Supermasks
Ángel López García-Arias
Masanori Hashimoto
Masato Motomura
Jaehoon Yu
31
5
0
24 Nov 2021
A Johnson–Lindenstrauss Framework for Randomly Initialized CNNs
Ido Nachum
Jan Hązła
Michael C. Gastpar
Anatoly Khina
30
0
0
03 Nov 2021
AdjointBackMapV2: Precise Reconstruction of Arbitrary CNN Unit's Activation via Adjoint Operators
Qing Wan
Siu Wun Cheung
Yoonsuck Choe
21
0
0
04 Oct 2021
AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham
Risto Miikkulainen
ODL
24
4
0
18 Sep 2021
Neural HMMs are all you need (for high-quality attention-free TTS)
Shivam Mehta
Éva Székely
Jonas Beskow
G. Henter
19
18
0
30 Aug 2021
StarEnhancer: Learning Real-Time and Style-Aware Image Enhancement
Yuda Song
Hui Qian
Xin Du
10
47
0
27 Jul 2021
The Future is Log-Gaussian: ResNets and Their Infinite-Depth-and-Width Limit at Initialization
Mufan Bill Li
Mihai Nica
Daniel M. Roy
23
33
0
07 Jun 2021
NTIRE 2021 Challenge on Burst Super-Resolution: Methods and Results
Goutam Bhat
Martin Danelljan
Radu Timofte
Kazutoshi Akita
Wooyeong Cho
...
Rao Muhammad Umer
Youliang Yan
Lei Yu
Magauiya Zhussip
X. Zou
SupR
16
38
0
07 Jun 2021
"BNN - BN = ?": Training Binary Neural Networks without Batch Normalization
Tianlong Chen
Zhenyu (Allen) Zhang
Xu Ouyang
Zechun Liu
Zhiqiang Shen
Zhangyang Wang
MQ
37
36
0
16 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
25
986
0
31 Mar 2021
Large Batch Simulation for Deep Reinforcement Learning
Brennan Shacklett
Erik Wijmans
Aleksei Petrenko
Manolis Savva
Dhruv Batra
V. Koltun
Kayvon Fatahalian
3DV
OffRL
AI4CE
27
26
0
12 Mar 2021
GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
W. R. Huang
Tom Goldstein
ODL
41
53
0
16 Feb 2021
Infinitely Deep Bayesian Neural Networks with Stochastic Differential Equations
Winnie Xu
Ricky T. Q. Chen
Xuechen Li
D. Duvenaud
BDL
UQCV
21
46
0
12 Feb 2021
High-Performance Large-Scale Image Recognition Without Normalization
Andrew Brock
Soham De
Samuel L. Smith
Karen Simonyan
VLM
223
512
0
11 Feb 2021
Optimizing Deeper Transformers on Small Datasets
Peng-Tao Xu
Dhruv Kumar
Wei Yang
Wenjie Zi
Keyi Tang
Chenyang Huang
Jackie C.K. Cheung
S. Prince
Yanshuai Cao
AI4CE
16
68
0
30 Dec 2020
Very Deep VAEs Generalize Autoregressive Models and Can Outperform Them on Images
R. Child
BDL
VLM
31
336
0
20 Nov 2020