ZerO Initialization: Initializing Neural Networks with only Zeros and Ones

25 October 2021

Florian Schäfer

Anima Anandkumar

Papers citing "ZerO Initialization: Initializing Neural Networks with only Zeros and Ones"

Principled Approaches for Extending Neural Architectures to Function Spaces for Operator Learning. Julius Berner, Miguel Liu-Schiaffini, Jean Kossaifi, Valentin Duruisseaux, Boris Bonev, Kamyar Azizzadenesheli, A. Anandkumar. 12 Jun 2025. [AI4CE]
MLorc: Momentum Low-rank Compression for Large Language Model Adaptation. Wei Shen, Zhang Yaxiang, Minhui Huang, Mengfan Xu, Jiawei Zhang, Cong Shen. 02 Jun 2025. [AI4CE]
Protocol Models: Scaling Decentralized Training with Communication-Efficient Model Parallelism. Sameera Ramasinghe, Thalaiyasingam Ajanthan, Gil Avraham, Yan Zuo, Alexander Long. 02 Jun 2025. [GNN]
SUMO: Subspace-Aware Moment-Orthogonalization for Accelerating Memory-Efficient LLM Training. Yehonathan Refael, Guy Smorodinsky, Tom Tirer, Ofir Lindenbaum. 30 May 2025.
ASGO: Adaptive Structured Gradient Optimization. Kang An, Yuxing Liu, Boyao Wang, Shiqian Ma, Tong Zhang. 26 Mar 2025. [ODL]
A Good Start Matters: Enhancing Continual Learning with Data-Driven Weight Initialization. Md Yousuf Harun, Christopher Kanan. 09 Mar 2025. [AI4CE]
AdaRankGrad: Adaptive Gradient-Rank and Moments for Memory-Efficient LLMs Training and Fine-Tuning. Yehonathan Refael, Jonathan Svirsky, Boris Shustin, Wasim Huleihel, Ofir Lindenbaum. 31 Dec 2024.
Robust Weight Initialization for Tanh Neural Networks with Fixed Point Analysis. Hyunwoo Lee, Hayoung Choi, Hyunju Kim. 03 Oct 2024.
Leveraging Sub-Optimal Data for Human-in-the-Loop Reinforcement Learning. Calarina Muslimani, Matthew E. Taylor. 30 Apr 2024. [OffRL]
GaLore: Memory-Efficient LLM Training by Gradient Low-Rank Projection. Jiawei Zhao, Zhenyu Zhang, Beidi Chen, Zhangyang Wang, A. Anandkumar, Yuandong Tian. 06 Mar 2024.
Nonparametric Learning of Two-Layer ReLU Residual Units. Zhunxuan Wang, Linyun He, Chunchuan Lyu, Shay B. Cohen. 17 Aug 2020. [MLT, OffRL]