The Early Phase of Neural Network Training

International Conference on Learning Representations (ICLR), 2020

24 February 2020

Papers citing "The Early Phase of Neural Network Training"

Improving Chain-of-Thought Efficiency for Autoregressive Image Generation

...

174

07 Oct 2025

TAP: Two-Stage Adaptive Personalization of Multi-Task and Multi-Modal Foundation Models in Federated Learning

Seohyun Lee

Wenzhi Fang

Dong-Jun Han

Seyyedali Hosseinalipour

Christopher G. Brinton

161

30 Sep 2025

Contextual Learning for Anomaly Detection in Tabular Data

189

10 Sep 2025

On Using Large-Batches in Federated Learning

Sahil Tyagi

FedML

147

05 Sep 2025

The Butterfly Effect: Neural Network Training Trajectories Are Highly Sensitive to Initial Conditions

475

16 Jun 2025

New Evidence of the Two-Phase Learning Dynamics of Neural Networks

251

20 May 2025

Investigating Task Arithmetic for Zero-Shot Information Retrieval

Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), 2025

495

01 May 2025

Emergence of Computational Structure in a Neural Network Physics Simulator

320

16 Apr 2025

Enlightenment Period Improving DNN Performance

Tiantian Liu

Meng Wan

Jue Wang

275

02 Apr 2025

Adaptive Unimodal Regulation for Balanced Multimodal Information Acquisition

Computer Vision and Pattern Recognition (CVPR), 2025

323

24 Mar 2025

ALinFiK: Learning to Approximate Linearized Future Influence Kernel for Scalable Third-Party LLM Data Valuation

North American Chapter of the Association for Computational Linguistics (NAACL), 2025

353

02 Mar 2025

Bridging Critical Gaps in Convergent Learning: How Representational Alignment Evolves Across Layers, Training, and Distribution Shifts

Chaitanya Kapoor

Sudhanshu Srivastava

Meenakshi Khosla

480

26 Feb 2025

Using Pre-trained LLMs for Multivariate Time Series Forecasting

293

10 Jan 2025

Uncovering Memorization Effect in the Presence of Spurious Correlations

Nature Communications (Nat Commun), 2025

607

01 Jan 2025

A Simple Remedy for Dataset Bias via Self-Influence: A Mislabeled Sample Perspective

Neural Information Processing Systems (NeurIPS), 2024

528

01 Nov 2024

Chasing Better Deep Image Priors between Over- and Under-parameterization

395

31 Oct 2024

DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity

Neural Information Processing Systems (NeurIPS), 2024

365

30 Oct 2024

Sharpness-Aware Minimization Efficiently Selects Flatter Minima Late in Training

International Conference on Learning Representations (ICLR), 2024

605

14 Oct 2024

Can Optimization Trajectories Explain Multi-Task Transfer?

David Mueller

Mark Dredze

Nicholas Andrews

509

26 Aug 2024

HyperbolicLR: Epoch insensitive learning rate scheduler

Tae-Geun Kim

385

21 Jul 2024

On the Limitations of Compute Thresholds as a Governance Strategy

Sara Hooker

509

08 Jul 2024

Memorization in deep learning: A survey

Jiaheng Wei

Yanjun Zhang

Leo Yu Zhang

Yang Xiang

387

06 Jun 2024

Understanding Token Probability Encoding in Output Embeddings

367

03 Jun 2024

Q-Newton: Hybrid Quantum-Classical Scheduling for Accelerating Neural Network Training with Newton's Gradient Descent

648

30 Apr 2024

Random Search as a Baseline for Sparse Neural Network Architecture Search

Rezsa Farahani

338

13 Mar 2024

Masks, Signs, And Learning Rate Rewinding

Advait Gadhikar

R. Burkholz

286

29 Feb 2024

Towards On-device Learning on the Edge: Ways to Select Neurons to Update under a Budget Constraint

Ael Quélennec

Enzo Tartaglione

Pavlo Mozharovskyi

Van-Tam Nguyen

280

08 Dec 2023

Flexible Communication for Optimal Distributed Learning over Unpredictable Networks

BigData Congress [Services Society] (BSS), 2023

S. Tyagi

Martin Swany

499

05 Dec 2023

Reset It and Forget It: Relearning Last-Layer Weights Improves Continual and Transfer Learning

European Conference on Artificial Intelligence (ECAI), 2023

274

12 Oct 2023

A path-norm toolkit for modern networks: consequences, promises and challenges

International Conference on Learning Representations (ICLR), 2023

Antoine Gonon

Nicolas Brisebarre

E. Riccietti

Rémi Gribonval

555

02 Oct 2023

Latent State Models of Training Dynamics

490

18 Aug 2023

Can Neural Network Memorization Be Localized?

International Conference on Machine Learning (ICML), 2023

J. Zico Kolter

283

18 Jul 2023

Co(ve)rtex: ML Models as storage channels and their (mis-)applications

358

17 Jul 2023

Accelerating Distributed ML Training via Selective Synchronization

IEEE International Conference on Cluster Computing (CLUSTER), 2023

S. Tyagi

Martin Swany

FedML

410

16 Jul 2023

Single-Stage Heavy-Tailed Food Classification

International Conference on Image Processing (ICIP), 2023

Jiangpeng He

Fengqing Zhu

298

01 Jul 2023

Catapults in SGD: spikes in the training loss and their impact on generalization through feature learning

International Conference on Machine Learning (ICML), 2023

Libin Zhu

Chaoyue Liu

Adityanarayanan Radhakrishnan

M. Belkin

490

07 Jun 2023

Lottery Tickets in Evolutionary Optimization: On Sparse Backpropagation-Free Trainability

R. T. Lange

Henning Sprekeler

226

31 May 2023

On the special role of class-selective neurons in early training

243

27 May 2023

GraVAC: Adaptive Compression for Communication-Efficient Distributed DL Training

IEEE International Conference on Cloud Computing (CLOUD), 2023

S. Tyagi

Martin Swany

319

20 May 2023

The Disharmony between BN and ReLU Causes Gradient Explosion, but is Offset by the Correlation between Activations

Inyoung Paik

Jaesik Choi

448

23 Apr 2023

Exploring the Performance of Pruning Methods in Neural Networks: An Empirical Study of the Lottery Ticket Hypothesis

Eirik Fladmark

Muhammad Hamza Sajjad

Laura Brinkholm Justesen

226

26 Mar 2023

Towards a Smaller Student: Capacity Dynamic Distillation for Efficient Image Retrieval

Computer Vision and Pattern Recognition (CVPR), 2023

283

16 Mar 2023

Sparsity May Cry: Let Us Fail (Current) Sparse Neural Networks Together!

International Conference on Learning Representations (ICLR), 2023

289

03 Mar 2023

Random Teachers are Good Teachers

International Conference on Machine Learning (ICML), 2023

507

23 Feb 2023

Identifying Equivalent Training Dynamics

Neural Information Processing Systems (NeurIPS), 2023

Ioannis G. Kevrekidis

Igor Mezić

342

17 Feb 2023

ScaDLES: Scalable Deep Learning over Streaming data at the Edge

S. Tyagi

Martin Swany

320

21 Jan 2023

Maximal Initial Learning Rates in Deep ReLU Networks

International Conference on Machine Learning (ICML), 2022

Gaurav M. Iyer

Boris Hanin

David Rolnick

365

14 Dec 2022

Accelerating Dataset Distillation via Model Augmentation

Computer Vision and Pattern Recognition (CVPR), 2022

Lei Zhang

Caiwen Ding

Dongkuan Xu

400

12 Dec 2022

Are Straight-Through gradients and Soft-Thresholding all you need for Sparse Training?

IEEE Workshop/Winter Conference on Applications of Computer Vision (WACV), 2022

A. Vanderschueren

Christophe De Vleeschouwer

203

02 Dec 2022

Reduce, Reuse, Recycle: Improving Training Efficiency with Distillation

316

01 Nov 2022