ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores

29 September 2020
Orestis Zachariadis
Nitin Satpute
Juan Gómez Luna
J. Olivares
arXiv: 2009.14600

Papers citing "Accelerating Sparse Matrix-Matrix Multiplication with GPU Tensor Cores"

17 papers
FlashSparse: Minimizing Computation Redundancy for Fast Sparse Matrix Multiplications on Tensor Cores
Jinliang Shi
Shigang Li
Youxuan Xu
Rongtian Fu
Xueying Wang
Tong Wu
70
3
0
15 Dec 2024
HC-SpMM: Accelerating Sparse Matrix-Matrix Multiplication for Graphs with Hybrid GPU Cores
Zhonggen Li
Xiangyu Ke
Yifan Zhu
Yunjun Gao
Yaofeng Tu
69
0
0
12 Dec 2024
cuFastTuckerPlus: A Stochastic Parallel Sparse FastTucker Decomposition Using GPU Tensor Cores
Zixuan Li
Mingxing Duan
Huizhang Luo
Wangdong Yang
KenLi Li
Keqin Li
29
0
0
15 Apr 2024
Multi-GPU aggregation-based AMG preconditioner for iterative linear solvers
M. Bernaschi
Alessandro Celestini
P. D'Ambra
Flavio Vella
14
9
0
04 Mar 2023
ZeroFL: Efficient On-Device Training for Federated Learning with Local Sparsity
Xinchi Qiu
Javier Fernandez-Marques
Pedro Gusmão
Yan Gao
Titouan Parcollet
Nicholas D. Lane
FedML
37
66
0
04 Aug 2022
Recovering single precision accuracy from Tensor Cores while surpassing the FP32 theoretical peak performance
Hiroyuki Ootomo
Rio Yokota
11
32
0
07 Mar 2022
Blocking Techniques for Sparse Matrix Multiplication on Tensor Accelerators
P. S. Labini
M. Bernaschi
Francesco Silvestri
Flavio Vella
9
3
0
11 Feb 2022
Learning from distinctive candidates to optimize reduced-precision convolution program on tensor cores
Junkyeong Choi
Hyucksung Kwon
W. Lee
Jungwook Choi
Jieun Lim
14
0
0
11 Feb 2022
Bit-GraphBLAS: Bit-Level Optimizations of Matrix-Centric Graph Processing on GPU
Jou-An Chen
Hsin-Hsuan Sung
Xipeng Shen
Nathan R. Tallent
Kevin J. Barker
Ang Li
GNN
16
5
0
21 Jan 2022
Squeeze: Efficient Compact Fractals for Tensor Core GPUs
Felipe A. Quezada
C. Navarro
N. Hitschfeld-Kahler
B. Bustos
26
8
0
03 Jan 2022
TC-GNN: Bridging Sparse GNN Computation and Dense Tensor Cores on GPUs
Yuke Wang
Boyuan Feng
Zheng Wang
Guyue Huang
Yufei Ding
GNN
16
26
0
03 Dec 2021
BitTrain: Sparse Bitmap Compression for Memory-Efficient Training on the Edge
Abdelrahman I. Hosny
Marina Neseem
Sherief Reda
MQ
25
4
0
29 Oct 2021
Accelerating Framework of Transformer by Hardware Design and Model Compression Co-Optimization
Panjie Qi
E. Sha
Qingfeng Zhuge
Hongwu Peng
Shaoyi Huang
Zhenglun Kong
Yuhong Song
Bingbing Li
11
49
0
19 Oct 2021
GPU Semiring Primitives for Sparse Neighborhood Methods
Corey J. Nolet
Divye Gala
Edward Raff
Joe Eaton
Brad Rees
John Zedlewski
Tim Oates
11
4
0
13 Apr 2021
Toward Performance-Portable PETSc for GPU-based Exascale Systems
R. Mills
M. Adams
S. Balay
Jed Brown
A. Dener
...
K. Rupp
Barry F. Smith
Stefano Zampini
Hong Zhang
Junchao Zhang
11
49
0
02 Nov 2020
Matrix Engines for High Performance Computing: A Paragon of Performance or Grasping at Straws?
Jens Domke
Emil Vatai
Aleksandr Drozd
Peng Chen
Yosuke Oyama
...
Shweta Salaria
Daichi Mukunoki
Artur Podobas
M. Wahib
Satoshi Matsuoka
24
23
0
27 Oct 2020
A Systematic Survey of General Sparse Matrix-Matrix Multiplication
Jianhua Gao
Weixing Ji
Fangli Chang
Zhaonian Tan
Bingxin Wei
Zeming Liu
Yueyan Zhao
18
57
0
26 Feb 2020