DiLoCo: Distributed Low-Communication Training of Language Models

417

30 Nov 2023

307

14 Nov 2023

A Quadratic Synchronization Rule for Distributed Deep Learning

International Conference on Learning Representations (ICLR), 2023

297

22 Oct 2023

Federated Multi-Objective Learning

Haibo Yang

292

15 Oct 2023

Enhancing Clustered Federated Learning: Integration of Strategies and Improved Methodologies

International Conference on Learning Representations (ICLR), 2023

111

09 Oct 2023

Minibatch and Local SGD: Algorithmic Stability and Linear Speedup in Generalization

Yunwen Lei

Tao Sun

Mingrui Liu

453

02 Oct 2023

FedLALR: Client-Specific Adaptive Learning Rates Achieve Linear Speedup for Non-IID Data

Li Shen

195

18 Sep 2023

Stochastic Gradient Descent-like relaxation is equivalent to Metropolis dynamics in discrete optimization and inference problems

Scientific Reports (Sci Rep), 2023

Maria Chiara Angelini

A. Cavaliere

Raffaele Marino

F. Ricci-Tersenghi

340

11 Sep 2023

Federated Learning Over Images: Vertical Decompositions and Pre-Trained Backbones Are Difficult to Beat

IEEE International Conference on Computer Vision (ICCV), 2023

Erdong Hu

Yu-Shuen Tang

Anastasios Kyrillidis

C. Jermaine

285

06 Sep 2023

Stochastic Controlled Averaging for Federated Learning with Communication Compression

International Conference on Learning Representations (ICLR), 2023

Xinmeng Huang

Ping Li

Xiaoyun Li

387

248

16 Aug 2023

Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup

IEEE Transactions on Pattern Analysis and Machine Intelligence (TPAMI), 2023

Li Shen

Liang Ding

160

30 Jul 2023

DIGEST: Fast and Communication Efficient Decentralized Learning with Local Updates

IEEE Transactions on Machine Learning in Communications and Networking (IEEE TMLCN), 2023

Peyman Gholami

H. Seferoglu

Near-Optimal Nonconvex-Strongly-Convex Bilevel Optimization with Fully First-Order Oracles

244

14 Jul 2023

Momentum Benefits Non-IID Federated Learning Simply and Provably

International Conference on Learning Representations (ICLR), 2023

551

28 Jun 2023

Le‐Yu Chen

Yaohua Ma

J.N. Zhang

430

26 Jun 2023

DropCompute: simple and more robust distributed synchronous training via compute variance reduction

Neural Information Processing Systems (NeurIPS), 2023

346

18 Jun 2023

Batches Stabilize the Minimum Norm Risk in High Dimensional Overparameterized Linear Regression

269

14 Jun 2023

$\textbf{A}^2\textbf{CiD}^2$: Accelerating Asynchronous Communication in Decentralized Deep Learning

Neural Information Processing Systems (NeurIPS), 2023

Adel Nabli

Eugene Belilovsky

Edouard Oyallon

338

14 Jun 2023

On the Computation-Communication Trade-Off with A Flexible Gradient Tracking Approach

IEEE Conference on Decision and Control (CDC), 2023

Yan Huang

Jinming Xu

219

12 Jun 2023

Understanding How Consistency Works in Federated Learning via Stage-wise Relaxed Initialization

Neural Information Processing Systems (NeurIPS), 2023

Yan Sun

Li Shen

Dacheng Tao

210

09 Jun 2023

A Lightweight Method for Tackling Unknown Participation Statistics in Federated Averaging

International Conference on Learning Representations (ICLR), 2023

Maroun Touma

Mingyue Ji

Stochastic Gradient Langevin Dynamics Based on Quantization with Increasing Resolution

321

06 Jun 2023

Jinwuk Seok

Chang-Jae Cho

270

30 May 2023

FAVANO: Federated AVeraging with Asynchronous NOdes

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 2023

277

25 May 2023

Local SGD Accelerates Convergence by Exploiting Second Order Information of the Loss Function

Linxuan Pan

Shenghui Song

WW-FL: Secure and Private Large-Scale Federated Learning

130

24 May 2023

Loss Spike in Training Neural Networks

Journal of Computational Mathematics (JCM), 2023

Zhongwang Zhang

Z. Xu

212

20 May 2023

Faster Federated Learning with Decaying Number of Local SGD Steps

IEEE Transactions on Parallel and Distributed Systems (TPDS), 2023

160

16 May 2023

Hierarchical Weight Averaging for Deep Neural Networks

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2023

Zixun Zhang

136

23 Apr 2023

383

20 Feb 2023

Similarity, Compression and Local Steps: Three Pillars of Efficient Communications for Distributed Variational Inequalities

Aleksandr Beznosikov

Martin Takáč

Alexander Gasnikov

292

15 Feb 2023

EPISODE: Episodic Gradient Clipping with Periodic Resampled Corrections for Federated Learning with Heterogeneous Data

International Conference on Learning Representations (ICLR), 2023

189

14 Feb 2023

Delay Sensitive Hierarchical Federated Learning with Stochastic Local Updates

IEEE Transactions on Cognitive Communications and Networking (IEEE TCCN), 2023

Abdulmoneam Ali

A. Arafa

Federated Learning with Regularized Client Participation

257

09 Feb 2023

Grigory Malinovsky

Samuel Horváth

260

07 Feb 2023

FedRC: Tackling Diverse Distribution Shifts Challenge in Federated Learning by Robust Clustering

International Conference on Machine Learning (ICML), 2023

346

29 Jan 2023

SuperFedNAS: Cost-Efficient Federated Neural Architecture Search for On-Device Inference

European Conference on Computer Vision (ECCV), 2023

202

26 Jan 2023

Optimus-CC: Efficient Large NLP Model Training with 3D Parallelism Aware Communication Compression

International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS), 2023

266

24 Jan 2023

Decentralized Gradient Tracking with Local Steps

Optimization Methods and Software (OMS), 2023

251

03 Jan 2023

Federated Learning with Flexible Control

IEEE Conference on Computer Communications (INFOCOM), 2022

218

16 Dec 2022

FedFA: Federated Learning with Feature Anchors to Align Features and Classifiers for Heterogeneous Data

IEEE Transactions on Mobile Computing (IEEE TMC), 2022

387

17 Nov 2022

Compute-Efficient Deep Learning: Algorithmic Trends and Opportunities

Journal of Machine Learning Research (JMLR), 2022

Brian Bartoldson

B. Kailkhura

Davis W. Blalock

300

13 Oct 2022

On the Performance of Gradient Tracking with Local Updates

IEEE Conference on Decision and Control (CDC), 2022

Edward Duc Hien Nguyen

Sulaiman A. Alghunaim

Kun Yuan

César A. Uribe

235

10 Oct 2022

Scaling up Stochastic Gradient Descent for Non-convex Optimisation

Machine-mediated learning (ML), 2022

S. Mohamad

H. Alamri

A. Bouchachia

208

06 Oct 2022

STSyn: Speeding Up Local SGD with Straggler-Tolerant Synchronization

IEEE Transactions on Signal Processing (IEEE Trans. Signal Process.), 2022

Feng Zhu

Jingjing Zhang

Xin Eric Wang

310

06 Oct 2022

Taming Fat-Tailed ("Heavier-Tailed" with Potentially Infinite Variance) Noise in Federated Learning

Neural Information Processing Systems (NeurIPS), 2022

Haibo Yang

Pei-Yuan Qiu

Jia Liu

Distributed Non-Convex Optimization with One-Bit Compressors on Heterogeneous Data: Efficient and Resilient Algorithms

343

03 Oct 2022

Ming Xiang

Lili Su

162

03 Oct 2022

SAGDA: Achieving $\mathcal{O}(\epsilon^{-2})$ Communication Complexity in Federated Min-Max Learning

Haibo Yang

255

02 Oct 2022

Personalized Federated Learning with Communication Compression

El Houcine Bergou

Aritra Dutta

Momentum-SAM: Sharpness Aware Minimization without Computational Overhead

230

12 Sep 2022

Flexible Vertical Federated Learning with Heterogeneous Parties

IEEE Transactions on Neural Networks and Learning Systems (TNNLS), 2022

380

26 Aug 2022

Don't Use Large Mini-Batches, Use Local SGD

22 August 2018

Papers citing "Don't Use Large Mini-Batches, Use Local SGD"

50 / 280 papers shown

Marlon Becker

Frederick Altrock

Benjamin Risse

490

22 Jan 2024

Asynchronous Local-SGD Training for Language Modeling

270

17 Jan 2024

On the Role of Server Momentum in Federated Learning

241

19 Dec 2023

Meta-learning Optimizers for Communication-Efficient Learning

Charles-Étienne Joseph

379

02 Dec 2023

Communication-Efficient Heterogeneous Federated Learning with Generalized Heavy-Ball Momentum

Riccardo Zaccone

Sai Praneeth Karimireddy

Carlo Masone

Marco Ciccone
