Sparse GPU Kernels for Deep Learning

18 June 2020
Trevor Gale
Matei A. Zaharia
C. Young
Erich Elsen

Papers citing "Sparse GPU Kernels for Deep Learning"

50 / 120 papers shown
Efficient Mixed Precision Quantization in Graph Neural Networks
Samir Moustafa
Nils M. Kriege
Wilfried Gansterer
GNN
MQ
35
0
0
14 May 2025
Fused3S: Fast Sparse Attention on Tensor Cores
Zitong Li
Aparna Chandramowlishwaran
GNN
45
0
0
12 May 2025
Samoyeds: Accelerating MoE Models with Structured Sparsity Leveraging Sparse Tensor Cores
Chenpeng Wu
Qiqi Gu
Heng Shi
Jianguo Yao
Haibing Guan
MoE
48
0
0
13 Mar 2025
Exploiting Unstructured Sparsity in Fully Homomorphic Encrypted DNNs
Aidan Ferguson
Perry Gibson
Lara D'Agata
Parker McLeod
Ferhat Yaman
Amitabh Das
Ian Colbert
José Cano
58
0
0
12 Mar 2025
An Efficient Row-Based Sparse Fine-Tuning
Cen-Jhih Li
Aditya Bhaskara
56
0
0
17 Feb 2025
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
Jinliang Shi
Shigang Li
Youxuan Xu
Rongtian Fu
Xueying Wang
Tong Wu
75
3
0
15 Dec 2024
HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
Zhonggen Li
Xiangyu Ke
Yifan Zhu
Yunjun Gao
Yaofeng Tu
69
0
0
12 Dec 2024
SuperGCN: General and Scalable Framework for GCN Training on CPU-powered Supercomputers
Chen Zhuang
Peng Chen
Xin Liu
Rio Yokota
Nikoli Dryden
Toshio Endo
Satoshi Matsuoka
M. Wahib
GNN
67
0
0
25 Nov 2024
Navigating Extremes: Dynamic Sparsity in Large Output Spaces
Nasib Ullah
Erik Schultheis
Mike Lasby
Yani Andrew Ioannou
Rohit Babbar
35
0
0
05 Nov 2024
Ripple: Accelerating LLM Inference on Smartphones with Correlation-Aware Neuron Management
Tuowei Wang
Ruwen Fan
Minxing Huang
Zixu Hao
Kun Li
Ting Cao
Youyou Lu
Yaoxue Zhang
Ju Ren
45
2
0
25 Oct 2024
AlphaPruning: Using Heavy-Tailed Self Regularization Theory for Improved Layer-wise Pruning of Large Language Models
Haiquan Lu
Yefan Zhou
Shiwei Liu
Zhangyang Wang
Michael W. Mahoney
Yaoqing Yang
29
0
0
14 Oct 2024
Input-Dependent Power Usage in GPUs
Theo Gregersen
Pratyush Patel
Esha Choukse
30
2
0
26 Sep 2024
High Performance Unstructured SpMM Computation Using Tensor Cores
Patrik Okanovic
Grzegorz Kwaśniewski
P. S. Labini
Maciej Besta
Flavio Vella
Torsten Hoefler
28
4
0
21 Aug 2024
Nerva: a Truly Sparse Implementation of Neural Networks
Wieger Wesselink
Bram Grooten
Qiao Xiao
Cássio Machado de Campos
Mykola Pechenizkiy
30
0
0
24 Jul 2024
Scorch: A Library for Sparse Deep Learning
Bobby Yan
Alexander J. Root
Trevor Gale
David Broman
Fredrik Kjolstad
30
0
0
27 May 2024
Enhancing Fast Feed Forward Networks with Load Balancing and a Master Leaf Node
Andreas Charalampopoulos
Nikolas Chatzis
Foivos Ntoulas-Panagiotopoulos
Charilaos Papaioannou
Alexandros Potamianos
33
0
0
27 May 2024
Random Masking Finds Winning Tickets for Parameter Efficient Fine-tuning
Jing Xu
Jingzhao Zhang
39
7
0
04 May 2024
MULTIFLOW: Shifting Towards Task-Agnostic Vision-Language Pruning
Matteo Farina
Massimiliano Mancini
Elia Cunegatti
Gaowen Liu
Giovanni Iacca
Elisa Ricci
VLM
42
2
0
08 Apr 2024
GNNBENCH: Fair and Productive Benchmarking for Single-GPU GNN System
Yidong Gong
Pradeep Kumar
GNN
35
3
0
05 Apr 2024
GeoT: Tensor Centric Library for Graph Neural Network via Efficient Segment Reduction on GPU
Zhongming Yu
Genghan Zhang
Hanxian Huang
Xin Chen
Jishen Zhao
GNN
29
0
0
03 Apr 2024
LSK3DNet: Towards Effective and Efficient 3D Perception with Large Sparse Kernels
Tuo Feng
Wenguan Wang
Fan Ma
Yi Yang
3DV
34
6
0
22 Mar 2024
Maxwell's Demon at Work: Efficient Pruning by Leveraging Saturation of Neurons
Simon Dufort-Labbé
P. D'Oro
Evgenii Nikishin
Razvan Pascanu
Pierre-Luc Bacon
A. Baratin
34
1
0
12 Mar 2024
HiRE: High Recall Approximate Top-$k$ Estimation for Efficient LLM Inference
Yashas Samaga
Varun Yerram
Chong You
Srinadh Bhojanapalli
Sanjiv Kumar
Prateek Jain
Praneeth Netrapalli
51
4
0
14 Feb 2024
A2Q+: Improving Accumulator-Aware Weight Quantization
Ian Colbert
Alessandro Pappalardo
Jakoba Petri-Koenig
Yaman Umuroglu
MQ
21
4
0
19 Jan 2024
GNNShap: Scalable and Accurate GNN Explanation using Shapley Values
Selahattin Akkas
Ariful Azad
FAtt
37
3
0
09 Jan 2024
RoSA: Accurate Parameter-Efficient Fine-Tuning via Robust Adaptation
Mahdi Nikdan
Soroush Tabesh
Elvir Crnčević
Dan Alistarh
8
27
0
09 Jan 2024
LLM in a flash: Efficient Large Language Model Inference with Limited Memory
Keivan Alizadeh-Vahid
Iman Mirzadeh
Dmitry Belenko
Karen Khatamifard
Minsik Cho
C. C. D. Mundo
Mohammad Rastegari
Mehrdad Farajtabar
72
111
0
12 Dec 2023
Dimension Mixer: A Generalized Method for Structured Sparsity in Deep Neural Networks
Suman Sapkota
Binod Bhattarai
34
0
0
30 Nov 2023
A Survey on Design Methodologies for Accelerating Deep Learning on Heterogeneous Architectures
Fabrizio Ferrandi
S. Curzel
Leandro Fiorin
Daniele Ielmini
Cristina Silvano
...
Salvatore Filippone
F. L. Presti
Francesco Silvestri
P. Palazzari
Stefania Perri
19
4
0
29 Nov 2023
Harnessing Manycore Processors with Distributed Memory for Accelerated Training of Sparse and Recurrent Models
Jan Finkbeiner
Thomas Gmeinder
M. Pupilli
A. Titterton
Emre Neftci
19
3
0
07 Nov 2023
Performance Optimization of Deep Learning Sparse Matrix Kernels on Intel Max Series GPU
Mohammad Zubair
Christoph Bauinger
14
0
0
01 Nov 2023
Dynamic Sparse No Training: Training-Free Fine-tuning for Sparse LLMs
Yu-xin Zhang
Lirui Zhao
Mingbao Lin
Yunyun Sun
Yiwu Yao
Xingjia Han
Jared Tanner
Shiwei Liu
Rongrong Ji
SyDa
37
40
0
13 Oct 2023
Sparse Fine-tuning for Inference Acceleration of Large Language Models
Eldar Kurtic
Denis Kuznedelev
Elias Frantar
Michael Goin
Dan Alistarh
27
8
0
10 Oct 2023
Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity
Lu Yin
You Wu
Zhenyu (Allen) Zhang
Cheng-Yu Hsieh
Yaqing Wang
...
Mykola Pechenizkiy
Yi Liang
Michael Bendersky
Zhangyang Wang
Shiwei Liu
28
78
0
08 Oct 2023
VENOM: A Vectorized N:M Format for Unleashing the Power of Sparse Tensor Cores
Roberto L. Castro
Andrei Ivanov
Diego Andrade
Tal Ben-Nun
B. Fraguela
Torsten Hoefler
19
15
0
03 Oct 2023
The Sparsity Roofline: Understanding the Hardware Limits of Sparse Neural Networks
Cameron Shinn
Collin McCarthy
Saurav Muralidharan
Muhammad Osama
John Douglas Owens
16
2
0
30 Sep 2023
Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity
Haojun Xia
Zhen Zheng
Yuchao Li
Donglin Zhuang
Zhongzhu Zhou
Xiafei Qiu
Yong Li
Wei Lin
S. Song
59
11
0
19 Sep 2023
A Generalization of Continuous Relaxation in Structured Pruning
Brad Larson
Bishal Upadhyaya
Luke McDermott
Siddha Ganju
16
0
0
28 Aug 2023
A2Q: Accumulator-Aware Quantization with Guaranteed Overflow Avoidance
Ian Colbert
Alessandro Pappalardo
Jakoba Petri-Koenig
MQ
16
9
0
25 Aug 2023
Cached Operator Reordering: A Unified View for Fast GNN Training
Julia Bazinska
Andrei Ivanov
Tal Ben-Nun
Nikoli Dryden
Maciej Besta
Siyuan Shen
Torsten Hoefler
GNN
22
3
0
23 Aug 2023
Mitigating Memory Wall Effects in CNN Engines with On-the-Fly Weights Generation
Stylianos I. Venieris
Javier Fernandez-Marques
Nicholas D. Lane
MQ
16
3
0
25 Jul 2023
Rosko: Row Skipping Outer Products for Sparse Matrix Multiplication Kernels
Vikas Natesh
Andrew Sabot
H. T. Kung
Mark Ting
22
0
0
08 Jul 2023
SparseOptimizer: Sparsify Language Models through Moreau-Yosida Regularization and Accelerate via Compiler Co-design
Fu-Ming Guo
MoE
13
0
0
27 Jun 2023
Sparse Modular Activation for Efficient Sequence Modeling
Liliang Ren
Yang Liu
Shuohang Wang
Yichong Xu
Chenguang Zhu
Chengxiang Zhai
43
13
0
19 Jun 2023
Breaking On-device Training Memory Wall: A Systematic Survey
Shitian Li
Chunlin Tian
Kahou Tam
Ruirui Ma
Li Li
21
2
0
17 Jun 2023
Dynamic Sparsity Is Channel-Level Sparsity Learner
Lu Yin
Gen Li
Meng Fang
Lijuan Shen
Tianjin Huang
Zhangyang Wang
Vlado Menkovski
Xiaolong Ma
Mykola Pechenizkiy
Shiwei Liu
25
20
0
30 May 2023
Reparo: Loss-Resilient Generative Codec for Video Conferencing
Tianhong Li
Vibhaalakshmi Sivaraman
Pantea Karimi
Lijie Fan
M. Alizadeh
Dina Katabi
19
7
0
23 May 2023
Dynamic Sparse Training with Structured Sparsity
Mike Lasby
A. Golubeva
Utku Evci
Mihai Nica
Yani Andrew Ioannou
29
19
0
03 May 2023
JaxPruner: A concise library for sparsity research
Jooyoung Lee
Wonpyo Park
Nicole Mitchell
Jonathan Pilault
J. Obando-Ceron
...
Hong-Seok Kim
Yann N. Dauphin
Karolina Dziugaite
P. S. Castro
Utku Evci
36
14
0
27 Apr 2023
STen: Productive and Efficient Sparsity in PyTorch
Andrei Ivanov
Nikoli Dryden
Tal Ben-Nun
Saleh Ashkboos
Torsten Hoefler
32
4
0
15 Apr 2023