ResearchTrend.AI
Horovod: fast and easy distributed deep learning in TensorFlow
v3 (latest)

15 February 2018
Alexander Sergeev
Mike Del Balso
ArXiv (abs) · PDF · HTML · GitHub (14,494★)

Papers citing "Horovod: fast and easy distributed deep learning in TensorFlow"

50 / 473 papers shown
Dark Energy Survey Year 3 results: Simulation-based $w$CDM inference from weak lensing and galaxy clustering maps with deep learning: Analysis design
A. Thomsen
J. Bucko
T. Kacprzak
V. Ajani
J. Fluri
...
Miles W. E. Smith
E. Suchyta
M. E. C. Swanson
D. Thomas
C. To
305
0
0
06 Nov 2025
Dynamic SBI: Round-free Sequential Simulation-Based Inference with Adaptive Datasets
Huifang Lyu
James Alvey
Noemi Anau Montel
Mauro Pieroni
Christoph Weniger
45
1
0
15 Oct 2025
A Unified Framework for Lifted Training and Inversion Approaches
Xiaoyu Wang
Alexandra Valavanis
Azhir Mahmood
Andreas Mang
Martin Benning
Audrey Repetti
195
0
0
10 Oct 2025
DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai
Keqiang He
An Wang
161
0
0
09 Oct 2025
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
Alex Iacob
Andrej Jovanovic
M. Safaryan
Meghdad Kurmanji
Lorenzo Sani
Samuel Horváth
William F. Shen
Xinchi Qiu
Nicholas D. Lane
AI4CE
181
1
0
06 Oct 2025
AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models
Jihu Guo
Tenghui Ma
Wei Gao
Peng Sun
Jiaxing Li
Xun Chen
Yuyang Jin
Dahua Lin
159
1
0
28 Sep 2025
InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
Shiju Wang
Yujie Wang
Ao Sun
Fangcheng Fu
Z. Zhu
Huang Leng
Xu Han
Kaisheng Ma
MoE
285
0
0
25 Sep 2025
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
Wei Gao
Yuheng Zhao
Dakai An
Tianyuan Wu
Lunxi Cao
...
Yuchi Xu
Jiamang Wang
Lin Qu
B. Zheng
Wei Wang
OffRLVLM
312
19
0
25 Sep 2025
OmniFed: A Modular Framework for Configurable Federated Learning from Edge to HPC
Sahil Tyagi
Andrei Cozma
Olivera Kotevska
Feiyi Wang
FedML
266
3
0
23 Sep 2025
A Flow-rate-conserving CNN-based Domain Decomposition Method for Blood Flow Simulations
Simon Klaes
A. Klawonn
Natalie Kubicki
M. Lanser
Kengo Nakajima
Takashi Shimokawabe
J. Weber
196
0
0
19 Sep 2025
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference
Xiaojuan Tang
Fanxu Meng
Pingzhi Tang
Yuxuan Wang
Di Yin
Xing Sun
M. Zhang
289
1
0
21 Aug 2025
WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
Junyu Wu
Weiming Chang
Xiaotao Liu
Guanyou He
Tingfeng Xian
...
Tao Yang
Yunsheng Shi
Feng Lin
Ting Yao
Jiatao Xu
OffRL
248
0
0
11 Aug 2025
Tesserae: Scalable Placement Policies for Deep Learning Workloads
S. Bian
Saurabh Agarwal
Md. Tareq Mahmood
Shivaram Venkataraman
234
0
0
07 Aug 2025
G-Core: A Simple, Scalable and Balanced RLHF Trainer
Junyu Wu
Weiming Chang
Xiaotao Liu
Guanyou He
Haoqiang Hong
...
Hongtao Tian
Tao Yang
Yunsheng Shi
Feng Lin
Ting Yao
OffRLALM
262
2
0
30 Jul 2025
LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems
Yufei Li
Zexin Li
Yinglun Zhu
Cong Liu
185
2
0
28 Jul 2025
Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit
Junqi Yin
Mijanur Palash
M. Paul Laiu
Muralikrishnan Gopalakrishnan Meena
Ravi Tandon
S. D. B. Kops
Feiyi Wang
Ramanan Sankaran
Pei Zhang
234
1
0
22 Jul 2025
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Tongtian Zhu
Tianyu Zhang
Mingze Wang
Zhanpeng Zhou
Can Wang
FedML
399
0
0
09 Jul 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
508
2
0
28 May 2025
OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao
Qi Lu
Wei Jia
Borui Wan
Lei Zuo
...
Size Zheng
Yanghua Peng
H. Lin
Xin Liu
Chuan Wu
AI4CE
413
1
0
14 Apr 2025
Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints
Computer Vision and Pattern Recognition (CVPR), 2025
Yuhao Zhou
Yuxin Tian
Jindi Lv
Mingjia Shi
Yuanxi Li
Qing Ye
Shuhao Zhang
Jiancheng Lv
CLL
338
3
0
15 Mar 2025
Weak Supervision for Improved Precision in Search Systems
Sriram Vasudevan
NoLa
242
0
0
10 Mar 2025
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), 2025
Hao Ge
Junda Feng
Qi Huang
Fangcheng Fu
Xiaonan Nie
Lei Zuo
Yanghua Peng
Tengjiao Wang
Xin Liu
358
7
0
28 Feb 2025
Scalable Higher Resolution Polar Sea Ice Classification and Freeboard Calculation from ICESat-2 ATL03 Data
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPS), 2025
Jurdana Masuma Iqrah
YoungHyun Koo
Wei Wang
H. Xie
Sushil Prasad
AI4Cl
443
2
0
04 Feb 2025
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
IEEE Conference on Computer Communications (IEEE INFOCOM), 2025
Ziyue Luo
Jia-Wei Liu
Myungjin Lee
Ness B. Shroff
231
2
0
09 Jan 2025
Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang
Chaoyi Ruan
Jia He
Jiaqi Ruan
Chengjie Tang
Xiaosong Ma
Cheng-rong Li
438
2
0
24 Nov 2024
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
359
22
0
05 Nov 2024
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
International Conference on Supercomputing (ICS), 2024
Runsheng Benson Guo
Utkarsh Anand
Arthur Chen
Khuzaima Daudjee
348
5
0
01 Nov 2024
A Novel Breast Ultrasound Image Augmentation Method Using Advanced Neural Style Transfer: An Efficient and Explainable Approach
Lipismita Panigrahi
Prianka Rani Saha
Jurdana Masuma Iqrah
Sushil Prasad
MedIm
233
0
0
31 Oct 2024
Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading
International Middleware Conference (Middleware), 2024
Avinash Maurya
Jie Ye
M. Rafique
Franck Cappello
Bogdan Nicolae
MoE
217
7
0
26 Oct 2024
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li
Fangcheng Fu
Hao Ge
Sheng Lin
Xuanyu Wang
Jiawen Niu
Yijiao Wang
Hailin Zhang
Xiaonan Nie
Tengjiao Wang
MoMe
425
11
0
17 Oct 2024
From promise to practice: realizing high-performance decentralized training
International Conference on Learning Representations (ICLR), 2024
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
370
4
0
15 Oct 2024
Breaking the mold: The challenge of large scale MARL specialization
Stefan Juang
Hugh Cao
Arielle Zhou
Ruochen Liu
Nevin L. Zhang
Elvis Liu
224
1
0
03 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
European Conference on Computer Systems (EuroSys), 2024
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
853
1,451
0
28 Sep 2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang
Chengming Zhang
Sihan Chen
Ang Li
Olatunji Ruwase
216
18
0
23 Sep 2024
Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
Chelsea Maria John
Stepan Nassyr
Carolin Penke
A. Herten
281
3
0
19 Sep 2024
Revisiting the Time Cost Model of AllReduce
Dian Xiong
Li Chen
Youhe Jiang
Dan Li
Shuai Wang
Songtao Wang
143
5
0
06 Sep 2024
Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices
ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2024
Shengyuan Ye
Liekang Zeng
Xiaowen Chu
Guoliang Xing
Xu Chen
340
36
0
15 Aug 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Yang Liu
401
48
0
29 Jul 2024
On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu
Fangyu Wang
Zhiwei Xu
Fei Yang
Tao Li
261
4
0
02 Jul 2024
Hybrid Approach to Parallel Stochastic Gradient Descent
Aakash Sudhirbhai Vora
Dhrumil Chetankumar Joshi
Aksh Kantibhai Patel
111
0
0
27 Jun 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer
Aditya Kashi
Sajal Dash
A. Tsaris
Junqi Yin
Mallikarjun Shankar
Feiyi Wang
210
1
0
24 Jun 2024
AI-coupled HPC Workflow Applications, Middleware and Performance
Wes Brewer
Ana Gainaru
Frédéric Suter
Feiyi Wang
M. Emani
S. Jha
405
28
0
20 Jun 2024
SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Daniel Lersch
Malachi Schram
Zhenyu Dai
Kishansingh Rajput
Xingfu Wu
Nobuo Sato
J. T. Childers
201
1
0
11 Jun 2024
Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
Ray Cao
Sherry Luo
Steve Gan
Sujeeth Jinesh
205
2
0
08 Jun 2024
Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers
Thomas Bouvier
Bogdan Nicolae
Hugo Chaugier
Alexandru Costan
Ian Foster
Gabriel Antoniu
237
2
0
05 Jun 2024
Full-Stack Allreduce on Multi-Rail Networks
Enda Yu
Dezun Dong
Xiangke Liao
GNN
253
1
0
28 May 2024
Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Shengyuan Ye
Jiangsu Du
Liekang Zeng
Wenzhong Ou
Xiaowen Chu
Yutong Lu
Xu Chen
245
47
0
27 May 2024
HetHub: A Heterogeneous distributed hybrid training system for large-scale models
Si Xu
Zixiao Huang
Yan Zeng
Shengen Yan
Xuefei Ning
...
Zhezheng Lin
Hao Zhang
Sheng Wang
Guohao Dai
Yu Wang
GNN
103
0
0
25 May 2024
Apply Distributed CNN on Genomics to accelerate Transcription-Factor TAL1 Motif Prediction
Tasnim Assali
Zayneb Trabelsi Ayoub
Sofiane Ouni
GNNAI4CE
73
2
0
25 May 2024
Worldwide Federated Training of Language Models
Alexandru Iacob
Lorenzo Sani
Bill Marino
Preslav Aleksandrov
William F. Shen
Nicholas D. Lane
FedML
458
7
0
23 May 2024
Page 1 of 10