2002.10365
The Early Phase of Neural Network Training
International Conference on Learning Representations (ICLR), 2020
24 February 2020
Jonathan Frankle
D. Schwab
Ari S. Morcos
Papers citing "The Early Phase of Neural Network Training"
50 / 113 papers shown
Improving Chain-of-Thought Efficiency for Autoregressive Image Generation
Zeqi Gu
Markos Georgopoulos
Xiaoliang Dai
Marjan Ghazvininejad
Chu Wang
...
Zecheng He
Zijian He
Jiawei Zhou
Abe Davis
Jialiang Wang
LRM
174
0
0
07 Oct 2025
TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning
Seohyun Lee
Wenzhi Fang
Dong-Jun Han
Seyyedali Hosseinalipour
Christopher G. Brinton
156
1
0
30 Sep 2025
Contextual Learning for Anomaly Detection in Tabular Data
Spencer King
Zhilu Zhang
Ruofan Yu
Baris Coskun
Wei Ding
Qian Cui
186
0
0
10 Sep 2025
On Using Large-Batches in Federated Learning
Sahil Tyagi
FedML
144
0
0
05 Sep 2025
The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions
Devin Kwok
Gül Sena Altıntaş
Colin Raffel
David Rolnick
474
3
0
16 Jun 2025
New Evidence of the Two-Phase Learning Dynamics of Neural Networks
Zhanpeng Zhou
Yongyi Yang
Mahito Sugiyama
Junchi Yan
248
3
0
20 May 2025
Investigating Task Arithmetic for Zero-Shot Information Retrieval
Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025
Marco Braga
Pranav Kasela
Alessandro Raganato
G. Pasi
RALM
485
4
0
01 May 2025
Emergence of Computational Structure in a Neural Network Physics Simulator
Rohan Hitchcock
Gary W. Delaney
J. Manton
Richard Scalzo
Jingge Zhu
316
1
0
16 Apr 2025
Enlightenment Period Improving DNN Performance
Tiantian Liu
Meng Wan
Jue Wang
274
0
0
02 Apr 2025
Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition
Computer Vision and Pattern Recognition (CVPR), 2025
Chengxiang Huang
Yake Wei
Zequn Yang
D. Hu
321
13
0
24 Mar 2025
ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation
North American Chapter of the Association for Computational Linguistics (NAACL), 2025
Yanzhou Pan
Huawei Lin
Yide Ran
Jiamin Chen
Xiaodong Yu
Weijie Zhao
Denghui Zhang
Zhaozhuo Xu
350
5
0
02 Mar 2025
Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts
Chaitanya Kapoor
Sudhanshu Srivastava
Meenakshi Khosla
464
3
0
26 Feb 2025
Using Pre-trained LLMs for Multivariate Time Series Forecasting
Malcolm Wolff
Shenghao Yang
Kari Torkkola
Michael W. Mahoney
AI4TS
AIFin
292
6
0
10 Jan 2025
Uncovering Memorization Effect in the Presence of Spurious Correlations
Nature Communications (Nat Commun), 2025
Chenyu You
Haocheng Dai
Yifei Min
Jasjeet Sekhon
S. Joshi
James S. Duncan
602
3
0
01 Jan 2025
A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective
Neural Information Processing Systems (NeurIPS), 2024
Yeonsung Jung
Jaeyun Song
J. Yang
Jin-Hwa Kim
Sung-Yub Kim
Eunho Yang
528
4
0
01 Nov 2024
Chasing Better Deep Image Priors between Over- and Under-parameterization
Qiming Wu
Xiaohan Chen
Lezhi Li
Zhangyang Wang
392
1
0
31 Oct 2024
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity
Neural Information Processing Systems (NeurIPS), 2024
Baekrok Shin
Junsoo Oh
Hanseul Cho
Chulhee Yun
AI4CE
364
3
0
30 Oct 2024
Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training
International Conference on Learning Representations (ICLR), 2024
Zhanpeng Zhou
Mingze Wang
Yuchen Mao
Bingrui Li
Junchi Yan
AAML
601
14
0
14 Oct 2024
Can Optimization Trajectories Explain Multi-Task Transfer?
David Mueller
Mark Dredze
Nicholas Andrews
499
2
0
26 Aug 2024
HyperbolicLR: Epoch insensitive learning rate scheduler
Tae-Geun Kim
380
4
0
21 Jul 2024
On the Limitations of Compute Thresholds as a Governance Strategy
Sara Hooker
507
29
0
08 Jul 2024
Memorization in deep learning: A survey
Jiaheng Wei
Yanjun Zhang
Leo Yu Zhang
Ming Ding
Chao Chen
Kok-Leong Ong
Jun Zhang
Yang Xiang
387
25
0
06 Jun 2024
Understanding Token Probability Encoding in Output Embeddings
Hakaze Cho
Yoshihiro Sakai
Kenshiro Tanaka
Mariko Kato
Naoya Inoue
365
3
0
03 Jun 2024
Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent
Pingzhi Li
Junyu Liu
Hanrui Wang
Tianlong Chen
647
2
0
30 Apr 2024
Random Search as a Baseline for Sparse Neural Network Architecture Search
Rezsa Farahani
337
0
0
13 Mar 2024
Masks, Signs, And Learning Rate Rewinding
Advait Gadhikar
R. Burkholz
283
15
0
29 Feb 2024
Towards On-device Learning on the Edge: Ways to Select Neurons to Update under a Budget Constraint
Ael Quélennec
Enzo Tartaglione
Pavlo Mozharovskyi
Van-Tam Nguyen
280
7
0
08 Dec 2023
Flexible Communication for Optimal Distributed Learning over Unpredictable Networks
BigData Congress [Services Society] (BSS), 2023
S. Tyagi
Martin Swany
492
3
0
05 Dec 2023
Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning
European Conference on Artificial Intelligence (ECAI), 2023
Lapo Frati
Neil Traft
Jeff Clune
Nick Cheney
CLL
273
6
0
12 Oct 2023
A path-norm toolkit for modern networks: consequences, promises and challenges
International Conference on Learning Representations (ICLR), 2023
Antoine Gonon
Nicolas Brisebarre
E. Riccietti
Rémi Gribonval
552
13
0
02 Oct 2023
Latent State Models of Training Dynamics
Michael Y. Hu
Angelica Chen
Naomi Saphra
Dong Wang
488
18
0
18 Aug 2023
Can Neural Network Memorization Be Localized?
International Conference on Machine Learning (ICML), 2023
Pratyush Maini
Michael C. Mozer
Hanie Sedghi
Zachary Chase Lipton
J. Zico Kolter
Chiyuan Zhang
TDI
282
80
0
18 Jul 2023
Co(ve)rtex: ML Models as storage channels and their (mis-)applications
Md Abdullah Al Mamun
Quazi Mishkatul Alam
Erfan Shayegani
Pedram Zaree
Ihsen Alouani
Nael B. Abu-Ghazaleh
356
0
0
17 Jul 2023
Accelerating Distributed ML Training via Selective Synchronization
IEEE International Conference on Cluster Computing (CLUSTER), 2023
S. Tyagi
Martin Swany
FedML
403
8
0
16 Jul 2023
Single-Stage Heavy-Tailed Food Classification
IEEE International Conference on Image Processing (ICIP), 2023
Jiangpeng He
Fengqing Zhu
296
13
0
01 Jul 2023
Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning
International Conference on Machine Learning (ICML), 2023
Libin Zhu
Chaoyue Liu
Adityanarayanan Radhakrishnan
M. Belkin
489
28
0
07 Jun 2023
Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability
R. T. Lange
Henning Sprekeler
224
2
0
31 May 2023
On the special role of class-selective neurons in early training
Omkar Ranadive
Nikhil Thakurdesai
Ari S. Morcos
Matthew L. Leavitt
Stéphane Deny
241
5
0
27 May 2023
GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training
IEEE International Conference on Cloud Computing (CLOUD), 2023
S. Tyagi
Martin Swany
317
8
0
20 May 2023
The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations
Inyoung Paik
Jaesik Choi
448
2
0
23 Apr 2023
Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis
Eirik Fladmark
Muhammad Hamza Sajjad
Laura Brinkholm Justesen
223
4
0
26 Mar 2023
Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval
Computer Vision and Pattern Recognition (CVPR), 2023
Yi Xie
Huaidong Zhang
Xuemiao Xu
Jianqing Zhu
Shengfeng He
VLM
282
19
0
16 Mar 2023
Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!
International Conference on Learning Representations (ICLR), 2023
Shiwei Liu
Tianlong Chen
Zhenyu Zhang
Xuxi Chen
Tianjin Huang
Ajay Jaiswal
Zinan Lin
284
31
0
03 Mar 2023
Random Teachers are Good Teachers
International Conference on Machine Learning (ICML), 2023
Felix Sarnthein
Gregor Bachmann
Sotiris Anagnostidis
Thomas Hofmann
505
7
0
23 Feb 2023
Identifying Equivalent Training Dynamics
Neural Information Processing Systems (NeurIPS), 2023
William T. Redman
J. M. Bello-Rivas
M. Fonoberova
Ryan Mohr
Ioannis G. Kevrekidis
Igor Mezić
337
9
0
17 Feb 2023
ScaDLES: Scalable Deep Learning over Streaming data at the Edge
S. Tyagi
Martin Swany
320
9
0
21 Jan 2023
Maximal Initial Learning Rates in Deep ReLU Networks
International Conference on Machine Learning (ICML), 2022
Gaurav M. Iyer
Boris Hanin
David Rolnick
364
14
0
14 Dec 2022
Accelerating Dataset Distillation via Model Augmentation
Computer Vision and Pattern Recognition (CVPR), 2022
Lei Zhang
Jie M. Zhang
Bowen Lei
Subhabrata Mukherjee
Xiang Pan
Bo Zhao
Caiwen Ding
Yongbin Li
Dongkuan Xu
DD
399
79
0
12 Dec 2022
Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?
IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022
A. Vanderschueren
Christophe De Vleeschouwer
MQ
199
14
0
02 Dec 2022
Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation
Cody Blakeney
Jessica Zosa Forde
Jonathan Frankle
Ziliang Zong
Matthew L. Leavitt
VLM
316
4
0
01 Nov 2022