Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication

3 October 2015
A. Azad
Grey Ballard
A. Buluç
J. Demmel
L. Grigori
O. Schwartz
Sivan Toledo
Samuel Williams

Papers citing "Exploiting Multiple Levels of Parallelism in Sparse Matrix-Matrix Multiplication"

38 / 38 papers shown
Slicing Is All You Need: Towards A Universal One-Sided Algorithm for Distributed Matrix Multiplication
Benjamin Brock
Renato Golin
91
1
0
10 Oct 2025
Sparsity-Aware Communication for Distributed Graph Neural Network Training
International Conference on Parallel Processing (ICPP), 2024
Ujjaini Mukhopadhyay
Alok Tripathy
Oguz Selvitopi
Katherine Yelick
A. Buluç
382
6
0
07 Apr 2025
A sparsity-aware distributed-memory algorithm for sparse-sparse matrix multiplication
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024
Yuxi Hong
A. Buluç
169
6
0
26 Aug 2024
Distributed-Memory Parallel Algorithms for Sparse Matrix and Sparse Tall-and-Skinny Matrix Multiplication
International Conference for High Performance Computing, Networking, Storage and Analysis (SC), 2024
Isuru Ranawaka
Md Taufique Hussain
Charles Block
Gerasimos Gerogiannis
Josep Torrellas
Ariful Azad
188
3
0
21 Aug 2024
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
Nabil Abubaker
Torsten Hoefler
271
1
0
30 Apr 2024
NeuraChip: Accelerating GNN Computations with a Hash-based Decoupled Spatial Accelerator
Kaustubh Shivdikar
Nicolas Bohm Agostini
Malith Jayaweera
Gilbert Jonatan
José L. Abellán
Ajay Joshi
John Kim
David Kaeli
GNN
343
6
0
23 Apr 2024
RDMA-Based Algorithms for Sparse Matrix Multiplication on GPUs
International Conference on Supercomputing (ICS), 2023
Benjamin Brock
A. Buluç
Katherine Yelick
221
6
0
29 Nov 2023
Optimization of SpGEMM with Risc-V vector instructions
Valentin Le Fèvre
Marc Casas
143
0
0
04 Mar 2023
A Distributed Block Chebyshev-Davidson Algorithm for Parallel Spectral Clustering
Journal of Scientific Computing (J. Sci. Comput.), 2022
Qiyuan Pang
Haizhao Yang
191
5
0
08 Dec 2022
Parallel, Portable Algorithms for Distance-2 Maximal Independent Set and Graph Coarsening
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2022
Brian Kelley
S. Rajamanickam
141
3
0
06 Apr 2022
pylspack: Parallel algorithms and data structures for sketching, column subset selection, regression and leverage scores
ACM Transactions on Mathematical Software (TOMS), 2022
Aleksandros Sobczyk
Efstratios Gallopoulos
222
9
0
05 Mar 2022
Fast Dynamic Updates and Dynamic SpGEMM on MPI-Distributed Graphs
IEEE International Conference on Cluster Computing (Cluster), 2022
Alexander van der Grinten
G. Custers
Duy Le Thanh
Henning Meyerhenke
133
2
0
17 Feb 2022
Parallel Algorithms for Adding a Collection of Sparse Matrices
IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPS), 2021
Md Taufique Hussain
Guttu Sai Abhishek
A. Buluç
A. Azad
173
5
0
19 Dec 2021
Parallel Algorithms for Masked Sparse Matrix-Matrix Products
Srđan Milaković
Oguz Selvitopi
Israt Nisa
Zoran Budimlic
A. Buluç
129
10
0
18 Nov 2021
Combinatorial BLAS 2.0: Scaling combinatorial algorithms on distributed-memory systems
IEEE Transactions on Parallel and Distributed Systems (TPDS), 2021
A. Azad
Oguz Selvitopi
Md Taufique Hussain
J. Gilbert
A. Buluç
118
32
0
28 Jun 2021
The Chunks and Tasks Matrix Library 2.0
Emanuel H. Rubensson
Elias Rudberg
Anastasia Kruchinina
Anton G. Artemov
104
1
0
23 Nov 2020
Communication-Avoiding and Memory-Constrained Sparse Matrix-Matrix Multiplication at Extreme Scale
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Md Taufique Hussain
Oguz Selvitopi
A. Buluç
A. Azad
MoE
157
16
0
16 Oct 2020
Reducing Communication in Graph Neural Network Training
Alok Tripathy
Katherine Yelick
A. Buluç
GNN
280
116
0
07 May 2020
Bandwidth-Optimized Parallel Algorithms for Sparse Matrix-Matrix Multiplication using Propagation Blocking
ACM Symposium on Parallelism in Algorithms and Architectures (SPAA), 2020
Zhixiang Gu
José Moreira
D. Edelsohn
A. Azad
124
30
0
26 Feb 2020
A Systematic Survey of General Sparse Matrix-Matrix Multiplication
ACM Computing Surveys (ACM CSUR), 2020
Jianhua Gao
Weixing Ji
Fangli Chang
Zhaonian Tan
Bingxin Wei
Zeming Liu
Yueyan Zhao
226
79
0
26 Feb 2020
Optimizing High Performance Markov Clustering for Pre-Exascale Architectures
IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2020
Oguz Selvitopi
Md Taufique Hussain
A. Azad
A. Buluç
114
25
0
24 Feb 2020
SpArch: Efficient Architecture for Sparse Matrix Multiplication
International Symposium on High-Performance Computer Architecture (HPCA), 2020
Zhekai Zhang
Hanrui Wang
Song Han
W. Dally
187
271
0
20 Feb 2020
The Parallelism Motifs of Genomic Data Analysis
Katherine Yelick
A. Buluç
M. Awan
A. Azad
Benjamin Brock
...
Giulia Guidi
S. Hofmeyr
Oguz Selvitopi
Cristina Teodoropol
L. Oliker
179
17
0
20 Jan 2020
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Maciej Besta
Raghavendra Kanakagiri
Harun Mustafa
Mikhail Karasikov
Gunnar Rätsch
Torsten Hoefler
Edgar Solomonik
393
78
0
11 Nov 2019
Efficient computation of the density matrix with error control on distributed computer systems
Anastasia Kruchinina
Elias Rudberg
Emanuel H. Rubensson
104
5
0
27 Sep 2019
Prior-preconditioned conjugate gradient method for accelerated Gibbs sampling in "large $n$ & large $p$" Bayesian sparse regression
A. Nishimura
M. Suchard
372
23
0
29 Oct 2018
Implementing Push-Pull Efficiently in GraphBLAS
Carl Yang
A. Buluç
John Douglas Owens
217
43
0
10 Apr 2018
High-performance sparse matrix-matrix products on Intel KNL and multicore architectures
Yusuke Nagasaka
Satoshi Matsuoka
A. Azad
A. Buluç
129
49
0
05 Apr 2018
Sparse Matrix Multiplication and Triangle Listing in the Congested Clique Model
K. Censor-Hillel
Dean Leitersdorf
Elia Turner
176
35
0
13 Feb 2018
Multi-threaded Sparse Matrix-Matrix Multiplication for Many-Core and GPU Architectures
Mehmet Deveci
C. Trott
S. Rajamanickam
144
56
0
09 Jan 2018
Communication-Avoiding Optimization Methods for Distributed Massive-Scale Sparse Inverse Covariance Estimation
International Conference on Artificial Intelligence and Statistics (AISTATS), 2017
Penporn Koanantakool
Alnur Ali
A. Azad
A. Buluç
Dmitriy Morozov
L. Oliker
Katherine Yelick
Sang-Yun Oh
123
12
0
30 Oct 2017
Distributed Triangle Counting in the Graphulo Matrix Math Library
D. Hutchison
120
8
0
20 Aug 2017
Increasing the Efficiency of Sparse Matrix-Matrix Multiplication with a 2.5D Algorithm and One-Sided MPI
A. Lazzaro
J. VandeVondele
J. Hutter
O. Schütt
124
27
0
29 May 2017
Scaling betweenness centrality using communication-efficient sparse matrix multiplication
Edgar Solomonik
Maciej Besta
Flavio Vella
Torsten Hoefler
227
80
0
22 Sep 2016
Novel Graph Processor Architecture, Prototype System, and Results
William S. Song
Vitaliy Gleyzer
A. Lomakin
J. Kepner
GNN
92
24
0
22 Jul 2016
Mathematical Foundations of the GraphBLAS
J. Kepner
Peter Aaltonen
David A. Bader
A. Buluç
F. Franchetti
...
José Moreira
John Douglas Owens
Carl Yang
Marcin Zalewski
Tim Mattson
226
241
0
18 Jun 2016
Hypergraph Partitioning for Sparse Matrix-Matrix Multiplication
Grey Ballard
Alex Druinsky
Nicholas Knight
O. Schwartz
117
58
0
17 Mar 2016
Locality-aware parallel block-sparse matrix-matrix multiplication using the Chunks and Tasks programming model
Emanuel H. Rubensson
Elias Rudberg
193
23
0
30 Jan 2015