Accelerating Transformer Inference and Training with 2:4 Activation Sparsity

20 March 2025
Daniel Haziza
Timothy Chou
Dhruv Choudhary
Luca Wehrstedt
Francisco Massa
Jiecao Yu
Geonhwa Jeong
Supriya Rao
Patrick Labatut
Jesse Cai
Abstract

In this paper, we demonstrate how to apply 2:4 sparsity, a popular hardware-accelerated GPU sparsity pattern, to activations in order to accelerate large language model training and inference. Crucially, we exploit the intrinsic sparsity of Squared-ReLU activations to provide this acceleration with no accuracy loss. Our approach achieves Feed-Forward Network (FFN) speedups of up to 1.3x in both the forward and backward passes. This work highlights the potential for sparsity to play a key role in accelerating large language model training and inference.
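The core idea in the abstract is that Squared-ReLU activations are already mostly zero, so enforcing the hardware-friendly 2:4 pattern (at most 2 nonzeros in every group of 4 values) discards little or no information. The snippet below is a minimal PyTorch sketch of that idea only; it is not the authors' fused kernels, and the function names (squared_relu, prune_2_to_4) and toy dimensions are illustrative assumptions. A real speedup would additionally require a 2:4 sparse matmul kernel on supporting hardware; here a dense matmul stands in for it.

import torch

def squared_relu(x: torch.Tensor) -> torch.Tensor:
    # Squared-ReLU: ReLU followed by squaring, which yields many exact zeros.
    return torch.relu(x) ** 2

def prune_2_to_4(x: torch.Tensor) -> torch.Tensor:
    """Keep the 2 largest-magnitude values in every contiguous group of 4."""
    orig_shape = x.shape
    groups = x.reshape(-1, 4)                      # (num_groups, 4)
    idx = groups.abs().topk(2, dim=-1).indices     # positions of the 2 kept values
    mask = torch.zeros_like(groups, dtype=torch.bool)
    mask.scatter_(-1, idx, True)
    return (groups * mask).reshape(orig_shape)

# Toy FFN forward pass (hypothetical sizes): the pruned activation is what a
# 2:4 sparse matmul kernel would consume on hardware with sparse tensor cores.
d_model, d_ff, batch = 16, 64, 2
w1 = torch.randn(d_model, d_ff)
w2 = torch.randn(d_ff, d_model)
x = torch.randn(batch, d_model)

act = squared_relu(x @ w1)      # intrinsically sparse activations
act_24 = prune_2_to_4(act)      # enforce the 2:4 pattern
out = act_24 @ w2               # dense matmul stands in for the sparse kernel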

@article{haziza2025_2503.16672,
  title={Accelerating Transformer Inference and Training with 2:4 Activation Sparsity},
  author={Daniel Haziza and Timothy Chou and Dhruv Choudhary and Luca Wehrstedt and Francisco Massa and Jiecao Yu and Geonhwa Jeong and Supriya Rao and Patrick Labatut and Jesse Cai},
  journal={arXiv preprint arXiv:2503.16672},
  year={2025}
}