AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning

25 October 2021
Siddharth Singh, A. Bhatele

Papers citing "AxoNN: An asynchronous, message-driven parallel framework for extreme-scale deep learning"

10 of 10 citing papers shown:

Democratizing AI: Open-source Scalable LLM Training on GPU-based Supercomputers (12 Feb 2025)
Siddharth Singh, Prajwal Singhania, Aditya K. Ranjan, John Kirchenbauer, Jonas Geiping, ..., Abhimanyu Hans, Manli Shu, Aditya Tomar, Tom Goldstein, A. Bhatele

Be like a Goldfish, Don't Memorize! Mitigating Memorization in Generative LLMs (14 Jun 2024)
Abhimanyu Hans, Yuxin Wen, Neel Jain, John Kirchenbauer, Hamid Kazemi, ..., Siddharth Singh, Gowthami Somepalli, Jonas Geiping, A. Bhatele, Tom Goldstein

Loki: Low-Rank Keys for Efficient Sparse Attention (04 Jun 2024)
Prajwal Singhania, Siddharth Singh, Shwai He, S. Feizi, A. Bhatele

FAST: Factorizable Attention for Speeding up Transformers (12 Feb 2024)
Armin Gerami, Monte Hoover, P. S. Dulepet, R. Duraiswami

Pipit: Scripting the analysis of parallel execution traces (19 Jun 2023)
A. Bhatele, Rakrish Dhakal, Alex Movsesyan, A. Ranjan, Onur Cankur

A Hybrid Tensor-Expert-Data Parallelism Approach to Optimize Mixture-of-Experts Training (11 Mar 2023)
Siddharth Singh, Olatunji Ruwase, A. A. Awan, Samyam Rajbhandari, Yuxiong He, A. Bhatele

Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training (10 Feb 2023)
Siddharth Singh, A. Bhatele

A Survey and Empirical Evaluation of Parallel Deep Learning Frameworks (09 Nov 2021)
Daniel Nichols, Siddharth Singh, Shuqing Lin, A. Bhatele

ZeRO-Offload: Democratizing Billion-Scale Model Training (18 Jan 2021)
Jie Ren, Samyam Rajbhandari, Reza Yazdani Aminabadi, Olatunji Ruwase, Shuangyang Yang, Minjia Zhang, Dong Li, Yuxiong He

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro