ResearchTrend.AI
Horovod: fast and easy distributed deep learning in TensorFlow
v3 (latest)

15 February 2018
Alexander Sergeev
Mike Del Balso
ArXiv (abs) · PDF · HTML · GitHub (14,494★)

Papers citing "Horovod: fast and easy distributed deep learning in TensorFlow"

50 / 473 papers shown
Dark Energy Survey Year 3 results: Simulation-based $w$CDM inference from weak lensing and galaxy clustering maps with deep learning: Analysis design
A. Thomsen
J. Bucko
T. Kacprzak
V. Ajani
J. Fluri
...
Miles W. E. Smith
E. Suchyta
M. E. C. Swanson
D. Thomas
C. To
305
0
0
06 Nov 2025
Dynamic SBI: Round-free Sequential Simulation-Based Inference with Adaptive Datasets
Huifang Lyu
James Alvey
Noemi Anau Montel
Mauro Pieroni
Christoph Weniger
45
1
0
15 Oct 2025
A Unified Framework for Lifted Training and Inversion Approaches
Xiaoyu Wang
Alexandra Valavanis
Azhir Mahmood
Andreas Mang
Martin Benning
Audrey Repetti
195
0
0
10 Oct 2025
DYNAMIX: RL-based Adaptive Batch Size Optimization in Distributed Machine Learning Systems
Yuanjun Dai
Keqiang He
An Wang
161
0
0
09 Oct 2025
MT-DAO: Multi-Timescale Distributed Adaptive Optimizers with Local Updates
Alex Iacob
Andrej Jovanovic
M. Safaryan
Meghdad Kurmanji
Lorenzo Sani
Samuel Horváth
William F. Shen
Xinchi Qiu
Nicholas D. Lane
AI4CE
181
1
0
06 Oct 2025
AdaPtis: Reducing Pipeline Bubbles with Adaptive Pipeline Parallelism on Heterogeneous Models
Jihu Guo
Tenghui Ma
Wei Gao
Peng Sun
Jiaxing Li
Xun Chen
Yuyang Jin
Dahua Lin
159
1
0
28 Sep 2025
InfiniPipe: Elastic Pipeline Parallelism for Efficient Variable-Length Long-Context LLM Training
Shiju Wang
Yujie Wang
Ao Sun
Fangcheng Fu
Z. Zhu
Huang Leng
Xu Han
Kaisheng Ma
MoE
285
0
0
25 Sep 2025
RollPacker: Mitigating Long-Tail Rollouts for Fast, Synchronous RL Post-Training
Wei Gao
Yuheng Zhao
Dakai An
Tianyuan Wu
Lunxi Cao
...
Yuchi Xu
Jiamang Wang
Lin Qu
B. Zheng
Wei Wang
OffRLVLM
312
19
0
25 Sep 2025
OmniFed: A Modular Framework for Configurable Federated Learning from Edge to HPC
Sahil Tyagi
Andrei Cozma
Olivera Kotevska
Feiyi Wang
FedML
266
3
0
23 Sep 2025
A Flow-rate-conserving CNN-based Domain Decomposition Method for Blood Flow Simulations
Simon Klaes
A. Klawonn
Natalie Kubicki
M. Lanser
Kengo Nakajima
Takashi Shimokawabe
J. Weber
196
0
0
19 Sep 2025
TPLA: Tensor Parallel Latent Attention for Efficient Disaggregated Prefill and Decode Inference
Xiaojuan Tang
Fanxu Meng
Pingzhi Tang
Yuxuan Wang
Di Yin
Xing Sun
M. Zhang
289
1
0
21 Aug 2025
WeChat-YATT: A Scalable, Simple, Efficient, and Production Ready Training Library
Junyu Wu
Weiming Chang
Xiaotao Liu
Guanyou He
Tingfeng Xian
...
Tao Yang
Yunsheng Shi
Feng Lin
Ting Yao
Jiatao Xu
OffRL
248
0
0
11 Aug 2025
Tesserae: Scalable Placement Policies for Deep Learning Workloads
S. Bian
Saurabh Agarwal
Md. Tareq Mahmood
Shivaram Venkataraman
234
0
0
07 Aug 2025
G-Core: A Simple, Scalable and Balanced RLHF Trainer
Junyu Wu
Weiming Chang
Xiaotao Liu
Guanyou He
Haoqiang Hong
...
Hongtao Tian
Tao Yang
Yunsheng Shi
Feng Lin
Ting Yao
OffRLALM
262
2
0
30 Jul 2025
LeMix: Unified Scheduling for LLM Training and Inference on Multi-GPU Systems
Yufei Li
Zexin Li
Yinglun Zhu
Cong Liu
185
2
0
28 Jul 2025
Pixel-Resolved Long-Context Learning for Turbulence at Exascale: Resolving Small-scale Eddies Toward the Viscous Limit
Junqi Yin
Mijanur Palash
M. Paul Laiu
Muralikrishnan Gopalakrishnan Meena
Ravi Tandon
S. D. B. Kops
Feiyi Wang
Ramanan Sankaran
Pei Zhang
234
1
0
22 Jul 2025
On the Surprising Effectiveness of a Single Global Merging in Decentralized Learning
Tongtian Zhu
Tianyu Zhang
Mingze Wang
Zhanpeng Zhou
Can Wang
FedML
399
0
0
09 Jul 2025
DES-LOC: Desynced Low Communication Adaptive Optimizers for Training Foundation Models
Alex Iacob
Lorenzo Sani
M. Safaryan
Paris Giampouras
Samuel Horváth
...
Meghdad Kurmanji
Preslav Aleksandrov
William F. Shen
Xinchi Qiu
Nicholas D. Lane
OffRL
508
2
0
28 May 2025
OVERLORD: Ultimate Scaling of DataLoader for Multi-Source Large Foundation Model Training
Juntao Zhao
Qi Lu
Wei Jia
Borui Wan
Lei Zuo
...
Size Zheng
Yanghua Peng
H. Lin
Xin Liu
Chuan Wu
AI4CE
413
1
0
14 Apr 2025
Ferret: An Efficient Online Continual Learning Framework under Varying Memory Constraints
Computer Vision and Pattern Recognition (CVPR), 2025
Yuhao Zhou
Yuxin Tian
Jindi Lv
Mingjia Shi
Yuanxi Li
Qing Ye
Shuhao Zhang
Jiancheng Lv
CLL
338
3
0
15 Mar 2025
Weak Supervision for Improved Precision in Search Systems
Sriram Vasudevan
NoLa
242
0
0
10 Mar 2025
ByteScale: Efficient Scaling of LLM Training with a 2048K Context Length on More Than 12,000 GPUs
Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication (SIGCOMM), 2025
Hao Ge
Junda Feng
Qi Huang
Fangcheng Fu
Xiaonan Nie
Lei Zuo
Yanghua Peng
Tengjiao Wang
Xin Liu
358
7
0
28 Feb 2025
Scalable Higher Resolution Polar Sea Ice Classification and Freeboard Calculation from ICESat-2 ATL03 Data
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPS), 2025
Jurdana Masuma Iqrah
YoungHyun Koo
Wei Wang
H. Xie
Sushil Prasad
AI4Cl
443
2
0
04 Feb 2025
Prediction-Assisted Online Distributed Deep Learning Workload Scheduling in GPU Clusters
IEEE Conference on Computer Communications (IEEE INFOCOM), 2025
Ziyue Luo
Jia-Wei Liu
Myungjin Lee
Ness B. Shroff
231
2
0
09 Jan 2025
Hiding Communication Cost in Distributed LLM Training via Micro-batch Co-execution
Haiquan Wang
Chaoyi Ruan
Jia He
Jiaqi Ruan
Chengjie Tang
Xiaosong Ma
Cheng-rong Li
438
2
0
24 Nov 2024
Photon: Federated LLM Pre-Training
Lorenzo Sani
Alex Iacob
Zeyu Cao
Royson Lee
Bill Marino
...
Dongqi Cai
Zexi Li
Wanru Zhao
Xinchi Qiu
Nicholas D. Lane
AI4CE
359
22
0
05 Nov 2024
Cephalo: Harnessing Heterogeneous GPU Clusters for Training Transformer Models
International Conference on Supercomputing (ICS), 2024
Runsheng Benson Guo
Utkarsh Anand
Arthur Chen
Khuzaima Daudjee
348
5
0
01 Nov 2024
A Novel Breast Ultrasound Image Augmentation Method Using Advanced Neural Style Transfer: An Efficient and Explainable Approach
Lipismita Panigrahi
Prianka Rani Saha
Jurdana Masuma Iqrah
Sushil Prasad
MedIm
233
0
0
31 Oct 2024
Deep Optimizer States: Towards Scalable Training of Transformer Models Using Interleaved Offloading
International Middleware Conference (Middleware), 2024
Avinash Maurya
Jie Ye
M. Rafique
Franck Cappello
Bogdan Nicolae
MoE
217
7
0
26 Oct 2024
Malleus: Straggler-Resilient Hybrid Parallel Training of Large-scale Models via Malleable Data and Model Parallelization
Haoyang Li
Fangcheng Fu
Hao Ge
Sheng Lin
Xuanyu Wang
Jiawen Niu
Yijiao Wang
Hailin Zhang
Xiaonan Nie
Tengjiao Wang
MoMe
425
11
0
17 Oct 2024
From promise to practice: realizing high-performance decentralized training
International Conference on Learning Representations (ICLR), 2024
Zesen Wang
Jiaojiao Zhang
Xuyang Wu
M. Johansson
370
4
0
15 Oct 2024
Breaking the mold: The challenge of large scale MARL specialization
Stefan Juang
Hugh Cao
Arielle Zhou
Ruochen Liu
Nevin L. Zhang
Elvis Liu
224
1
0
03 Oct 2024
HybridFlow: A Flexible and Efficient RLHF Framework
European Conference on Computer Systems (EuroSys), 2024
Guangming Sheng
Chi Zhang
Zilingfeng Ye
Xibin Wu
Wang Zhang
Ru Zhang
Size Zheng
Haibin Lin
Chuan Wu
AI4CE
853
1,451
0
28 Sep 2024
Domino: Eliminating Communication in LLM Training via Generic Tensor Slicing and Overlapping
Guanhua Wang
Chengming Zhang
Sihan Chen
Ang Li
Olatunji Ruwase
216
18
0
23 Sep 2024
Performance and Power: Systematic Evaluation of AI Workloads on Accelerators with CARAML
Chelsea Maria John
Stepan Nassyr
Carolin Penke
A. Herten
281
3
0
19 Sep 2024
Revisiting the Time Cost Model of AllReduce
Dian Xiong
Li Chen
Youhe Jiang
Dan Li
Shuai Wang
Songtao Wang
143
5
0
06 Sep 2024
Asteroid: Resource-Efficient Hybrid Pipeline Parallelism for Collaborative DNN Training on Heterogeneous Edge Devices
ACM/IEEE International Conference on Mobile Computing and Networking (MobiCom), 2024
Shengyuan Ye
Liekang Zeng
Xiaowen Chu
Guoliang Xing
Xu Chen
340
36
0
15 Aug 2024
Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan
Shuo Zhang
Zerui Wang
Lijuan Jiang
Wenwen Qu
...
Dahua Lin
Yonggang Wen
Xin Jin
Tianwei Zhang
Yang Liu
401
48
0
29 Jul 2024
On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu
Fangyu Wang
Zhiwei Xu
Fei Yang
Tao Li
261
4
0
02 Jul 2024
Hybrid Approach to Parallel Stochastic Gradient Descent
Aakash Sudhirbhai Vora
Dhrumil Chetankumar Joshi
Aksh Kantibhai Patel
111
0
0
27 Jun 2024
Scalable Artificial Intelligence for Science: Perspectives, Methods and Exemplars
Wesley Brewer
Aditya Kashi
Sajal Dash
A. Tsaris
Junqi Yin
Mallikarjun Shankar
Feiyi Wang
210
1
0
24 Jun 2024
AI-coupled HPC Workflow Applications, Middleware and Performance
Wes Brewer
Ana Gainaru
Frédéric Suter
Feiyi Wang
M. Emani
S. Jha
405
28
0
20 Jun 2024
SAGIPS: A Scalable Asynchronous Generative Inverse Problem Solver
Daniel Lersch
Malachi Schram
Zhenyu Dai
Kishansingh Rajput
Xingfu Wu
Nobuo Sato
J. T. Childers
201
1
0
11 Jun 2024
Training Through Failure: Effects of Data Consistency in Parallel Machine Learning Training
Ray Cao
Sherry Luo
Steve Gan
Sujeeth Jinesh
205
2
0
08 Jun 2024
Efficient Data-Parallel Continual Learning with Asynchronous Distributed Rehearsal Buffers
Thomas Bouvier
Bogdan Nicolae
Hugo Chaugier
Alexandru Costan
Ian Foster
Gabriel Antoniu
237
2
0
05 Jun 2024
Full-Stack Allreduce on Multi-Rail Networks
Enda Yu
Dezun Dong
Xiangke Liao
GNN
253
1
0
28 May 2024
Galaxy: A Resource-Efficient Collaborative Edge AI System for In-situ Transformer Inference
Shengyuan Ye
Jiangsu Du
Liekang Zeng
Wenzhong Ou
Xiaowen Chu
Yutong Lu
Xu Chen
245
47
0
27 May 2024
HetHub: A Heterogeneous distributed hybrid training system for large-scale models
Si Xu
Zixiao Huang
Yan Zeng
Shengen Yan
Xuefei Ning
...
Zhezheng Lin
Hao Zhang
Sheng Wang
Guohao Dai
Yu Wang
GNN
103
0
0
25 May 2024
Apply Distributed CNN on Genomics to accelerate Transcription-Factor TAL1 Motif Prediction
Tasnim Assali
Zayneb Trabelsi Ayoub
Sofiane Ouni
GNNAI4CE
73
2
0
25 May 2024
Worldwide Federated Training of Language Models
Alexandru Iacob
Lorenzo Sani
Bill Marino
Preslav Aleksandrov
William F. Shen
Nicholas D. Lane
FedML
458
7
0
23 May 2024
Page 1 of 10