ResearchTrend.AI

GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training (arXiv:2102.08098)
16 February 2021
Chen Zhu
Renkun Ni
Zheng Xu
Kezhi Kong
W. R. Huang
Tom Goldstein
Topics: ODL

Papers citing "GradInit: Learning to Initialize Neural Networks for Stable and Efficient Training"

12 / 12 papers shown
1. No Train No Gain: Revisiting Efficient Training Algorithms For Transformer-based Language Models
   Jean Kaddour, Oscar Key, Piotr Nawrot, Pasquale Minervini, Matt J. Kusner
   12 Jul 2023

2. Convex Dual Theory Analysis of Two-Layer Convolutional Neural Networks with Soft-Thresholding
   Chunyan Xiong, Meng Lu, Xiaotong Yu, Jian-Peng Cao, Zhong Chen, D. Guo, X. Qu
   Topics: MLT
   14 Apr 2023

3. On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
   Li Shen, Yan Sun, Zhiyuan Yu, Liang Ding, Xinmei Tian, Dacheng Tao
   Topics: VLM
   07 Apr 2023

4. Can We Scale Transformers to Predict Parameters of Diverse ImageNet Models?
   Boris Knyazev, Doha Hwang, Simon Lacoste-Julien
   Topics: AI4CE
   07 Mar 2023

5. CyclicFL: A Cyclic Model Pre-Training Approach to Efficient Federated Learning
   Peng Zhang, Yingbo Zhou, Ming Hu, Xin Fu, Xian Wei, Mingsong Chen
   Topics: FedML
   28 Jan 2023

6. NAR-Former: Neural Architecture Representation Learning towards Holistic Attributes Prediction
   Yun Yi, Haokui Zhang, Wenze Hu, Nannan Wang, Xiaoyu Wang
   Topics: AI4TS, AI4CE
   15 Nov 2022

7. MetaFormer Baselines for Vision
   Weihao Yu, Chenyang Si, Pan Zhou, Mi Luo, Yichen Zhou, Jiashi Feng, Shuicheng Yan, Xinchao Wang
   Topics: MoE
   24 Oct 2022

8. Dynamical Isometry for Residual Networks
   Advait Gadhikar, R. Burkholz
   Topics: ODL, AI4CE
   05 Oct 2022

9. NormFormer: Improved Transformer Pretraining with Extra Normalization
   Sam Shleifer, Jason Weston, Myle Ott
   Topics: AI4CE
   18 Oct 2021

10. AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
    G. Bingham, Risto Miikkulainen
    Topics: ODL
    18 Sep 2021

11. Data-driven Weight Initialization with Sylvester Solvers
    Debasmit Das, Yash Bhalgat, Fatih Porikli
    Topics: ODL
    02 May 2021

12. High-Performance Large-Scale Image Recognition Without Normalization
    Andrew Brock, Soham De, Samuel L. Smith, Karen Simonyan
    Topics: VLM
    11 Feb 2021