HammingMesh: A Network Topology for Large-Scale Deep Learning

3 September 2022
Torsten Hoefler, Tommaso Bonato, Daniele De Sensi, Salvatore Di Girolamo, Shigang Li, Marco Heddes, Jon Belk, Deepak Goel, Miguel Castro, Steve Scott
3DH · GNN · AI4CE

Papers citing "HammingMesh: A Network Topology for Large-Scale Deep Learning"

13 of 13 papers shown

Mixtera: A Data Plane for Foundation Model Training
Maximilian Böther, Xiaozhe Yao, Tolga Kerimoglu, Viktor Gsteiger, Ana Klimovic
MoE · 27 Feb 2025

Efficient Training of Large Language Models on Distributed Infrastructures: A Survey
Jiangfei Duan, Shuo Zhang, Zerui Wang, Lijuan Jiang, Wenwen Qu, ..., Dahua Lin, Yonggang Wen, Xin Jin, Tianwei Zhang, Peng Sun
29 Jul 2024

Near-Optimal Wafer-Scale Reduce
Piotr Luczynski, Lukas Gianinazzi, Patrick Iff, Leighton Wilson, Daniele De Sensi, Torsten Hoefler
24 Apr 2024

LLAMP: Assessing Network Latency Tolerance of HPC Applications with Linear Programming
Siyuan Shen, Langwen Huang, Marcin Chrapek, Timo Schneider, Jai Dayal, Manisha Gajbe, Robert Wisniewski, Torsten Hoefler
22 Apr 2024

Communication-Efficient Large-Scale Distributed Deep Learning: A Comprehensive Survey
Feng Liang, Zhen Zhang, Haifeng Lu, Victor C. M. Leung, Yanyi Guo, Xiping Hu
GNN · 09 Apr 2024

Swing: Short-cutting Rings for Higher Bandwidth Allreduce
Daniele De Sensi, Tommaso Bonato, D. Saam, Torsten Hoefler
17 Jan 2024

Datacenter Ethernet and RDMA: Issues at Hyperscale
Torsten Hoefler, Duncan Roweth, K. Underwood, Bob Alverson, Mark Griswold, ..., Surendra Anubolu, Siyan Shen, A. Kabbani, M. McLaren, Steve Scott
07 Feb 2023

ATP: Adaptive Tensor Parallelism for Foundation Models
Shenggan Cheng, Ziming Liu, Jiangsu Du, Yang You
20 Jan 2023

AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Jinfan Chen, Shigang Li, Ran Guo, Jinhui Yuan, Torsten Hoefler
17 Jan 2023

Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li, Torsten Hoefler
GNN · AI4CE · LRM · 14 Jul 2021

Sparsity in Deep Learning: Pruning and growth for efficient inference and training in neural networks
Torsten Hoefler, Dan Alistarh, Tal Ben-Nun, Nikoli Dryden, Alexandra Peste
MQ · 31 Jan 2021

Scaling Laws for Neural Language Models
Jared Kaplan, Sam McCandlish, T. Henighan, Tom B. Brown, B. Chess, R. Child, Scott Gray, Alec Radford, Jeff Wu, Dario Amodei
23 Jan 2020

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
MoE · 17 Sep 2019