
Don't Use Large Mini-Batches, Use Local SGD

22 August 2018

Papers citing "Don't Use Large Mini-Batches, Use Local SGD"

50 / 280 papers shown

Nesterov-Accelerated Robust Federated Learning Over Byzantine Adversaries

214

04 Nov 2025

DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems

Yuanjun Dai

Keqiang He

An Wang

117

09 Oct 2025

MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates

157

06 Oct 2025

Understanding Outer Optimizers in Local SGD: Learning Rates, Momentum, and Acceleration

170

12 Sep 2025

On Using Large-Batches in Federated Learning

Sahil Tyagi

FedML

110

05 Sep 2025

Communication Efficient LLM Pre-training with SparseLoCo

127

21 Aug 2025

Cooperative SGD with Dynamic Mixing Matrices

Soumya Sarkar

Shweta Jain

162

20 Aug 2025

FedEve: On Bridging the Client Drift and Period Drift for Cross-device Federated Learning

159

20 Aug 2025

FedMP: Tackling Medical Feature Heterogeneity in Federated Learning from a Manifold Perspective

126

07 Aug 2025

Communication-Efficient Distributed Training for Collaborative Flat Optima Recovery in Deep Learning

Tolga Dimlioglu

A. Choromańska

FedML

291

27 Jul 2025

HASFL: Heterogeneity-aware Split Federated Learning over Edge Computing Systems

248

10 Jun 2025

MuLoCo: Muon is a practical inner optimizer for DiLoCo

170

29 May 2025

Sharp Gaussian approximations for Decentralized Federated Learning

316

12 May 2025

Pseudo-Asynchronous Local SGD: Robust and Efficient Data-Parallel Training

496

25 Apr 2025

Federated Learning for Medical Image Classification: A Comprehensive Benchmark

327

07 Apr 2025

Convergence Analysis of Federated Learning Methods Using Backward Error Analysis

AAAI Conference on Artificial Intelligence (AAAI), 2025

270

05 Mar 2025

Tackling Feature and Sample Heterogeneity in Decentralized Multi-Task Learning: A Sheaf-Theoretic Approach

Chaouki Ben Issaid

Praneeth Vepakomma

Mehdi Bennis

483

03 Feb 2025

FedSat: A Statistical Aggregation Approach for Class Imbalanced Clients in Federated Learning

S. Chowdhury

Raju Halder

FedML

259

31 Dec 2024

A Unified Analysis of Federated Learning with Arbitrary Client Participation

Neural Information Processing Systems (NeurIPS), 2022

Maroun Touma

Mingyue Ji

FedML

621

31 Dec 2024

EDiT: A Local-SGD-Based Efficient Distributed Training Method for Large Language Models

International Conference on Learning Representations (ICLR), 2024

431

10 Dec 2024

FedDUAL: A Dual-Strategy with Adaptive Loss and Dynamic Aggregation for Mitigating Data Heterogeneity in Federated Learning

284

05 Dec 2024

Task Arithmetic Through The Lens Of One-Shot Federated Learning

493

27 Nov 2024

Distributed Sign Momentum with Local Steps for Training Transformers

319

26 Nov 2024

Photon: Federated LLM Pre-Training

...

316

05 Nov 2024

Enhancing Federated Learning Convergence with Dynamic Data Queue and Data Entropy-driven Participant Selection

IEEE Internet of Things Journal (IEEE IoT J.), 2024

221

23 Oct 2024

SDP4Bit: Toward 4-bit Communication Quantization in Sharded Data Parallelism for LLM Training

Neural Information Processing Systems (NeurIPS), 2024

...

274

20 Oct 2024

On the Convergence of (Stochastic) Gradient Descent for Kolmogorov-Arnold Networks

Yihang Gao

Vincent Y. F. Tan

ODL

153

10 Oct 2024

DEPT: Decoupled Embeddings for Pre-training Language Models

International Conference on Learning Representations (ICLR), 2024

William F. Shen

Dongqi Cai

Nicholas D. Lane

1.4K

07 Oct 2024

Can We Theoretically Quantify the Impacts of Local Updates on the Generalization Performance of Federated Learning?

ACM International Symposium on Mobile Ad Hoc Networking and Computing (MobiHoc), 2024

Jia Liu

300

05 Sep 2024

FADAS: Towards Federated Adaptive Asynchronous Optimization

217

25 Jul 2024

A New Theoretical Perspective on Data Heterogeneity in Federated Optimization

220

22 Jul 2024

Personalized Multi-tier Federated Learning

242

19 Jul 2024

On the Trade-off between Flatness and Optimization in Distributed Learning

468

28 Jun 2024

Communication-Efficient Adaptive Batch Size Strategies for Distributed Local Gradient Methods

305

20 Jun 2024

Batch-in-Batch: a new adversarial training framework for initial perturbation and sample selection

Le Li

238

06 Jun 2024

Communication-Efficient Distributed Deep Learning via Federated Dynamic Averaging

Antonios Deligiannakis

FedML

438

31 May 2024

Full-Stack Allreduce on Multi-Rail Networks

207

28 May 2024

WASH: Train your Ensemble with Communication-Efficient Weight Shuffling, then Average

333

27 May 2024

Client2Vec: Improving Federated Learning by Distribution Shifts Aware Client Indexing

369

25 May 2024

Efficiency for Free: Ideal Data Are Transportable Representations

Neural Information Processing Systems (NeurIPS), 2024

Peng Sun

Yi Jiang

Tao Lin

378

23 May 2024

Worldwide Federated Training of Language Models

351

23 May 2024

The Limits and Potentials of Local SGD for Distributed Heterogeneous Learning with Intermittent Communication

266

19 May 2024

The Future of Large Language Model Pre-training is Federated

...

438

17 May 2024

AB-Training: A Communication-Efficient Approach for Distributed Low-Rank Learning

329

02 May 2024

Improved Generalization Bounds for Communication Efficient Federated Learning

Peyman Gholami

H. Seferoglu

FedML AI4CE

367

17 Apr 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey

Xiping Hu

347

09 Apr 2024

AdaptSFL: Adaptive Split Federated Learning in Resource-constrained Edge Networks

532

19 Mar 2024

On the Convergence of Federated Learning Algorithms without Data Similarity

314

29 Feb 2024

Training Neural Networks from Scratch with Parallel Low-Rank Adapters

270

26 Feb 2024

CO2: Efficient Distributed Training with Full Communication-Computation Overlap

Zhen Qin

287

29 Jan 2024