arXiv:1908.09606
Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication
26 August 2019
Grzegorz Kwaśniewski
Marko Kabić
Maciej Besta
J. VandeVondele
R. Solcà
Torsten Hoefler
LRM
Papers citing "Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication"
41 / 41 papers shown
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Anjiang Wei
Allen Nie
Diyi Yang
Rohan Yadav
Wonchan Lee
Ke Wang
Alex Aiken
61
3
0
21 Oct 2024
Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations
Hussam Al Daas
Grey Ballard
L. Grigori
Suraj Kumar
Kathryn Rouse
Mathieu Verite
13
1
0
17 Sep 2024
Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs
Toni Böhnlein
Pál András Papp
A. N. Yzelman
24
1
0
05 Sep 2024
High Performance Unstructured SpMM Computation Using Tensor Cores
Patrik Okanovic
Grzegorz Kwaśniewski
P. S. Labini
Maciej Besta
Flavio Vella
Torsten Hoefler
116
6
0
21 Aug 2024
Demystifying Higher-Order Graph Neural Networks
Maciej Besta
Florian Scheidl
Lukas Gianinazzi
S. Klaiman
Jürgen Müller
Torsten Hoefler
101
3
0
18 Jun 2024
Supercomputers as a Continous Medium
Martin Karp
Niclas Jansson
P. Schlatter
Stefano Markidis
41
0
0
09 May 2024
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
Nabil Abubaker
Torsten Hoefler
87
0
0
30 Apr 2024
Tightening I/O Lower Bounds through the Hourglass Dependency Pattern
Lionel Eyraud-Dubois
Guillaume Iooss
Julien Langou
Fabrice Rastello
20
0
0
25 Apr 2024
Fast Kronecker Matrix-Matrix Multiplication on GPUs
Abhinav Jangda
Mohit Yadav
65
2
0
18 Jan 2024
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Xicheng Lu
152
1
0
01 Dec 2023
Optimizing Distributed Tensor Contractions using Node-Aware Processor Grids
Andreas Irmler
Raghavendra Kanakagiri
S. Ohlmann
Edgar Solomonik
A. Grüneis
51
2
0
17 Jul 2023
AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Jinfan Chen
Shigang Li
Ran Guo
Jinhui Yuan
Torsten Hoefler
57
2
0
17 Jan 2023
Sparse Hamming Graph: A Customizable Network-on-Chip Topology
Patrick Iff
Maciej Besta
Matheus A. Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
84
7
0
25 Nov 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH
GNN
AI4CE
79
23
0
03 Sep 2022
Deinsum: Practically I/O Optimal Multilinear Algebra
A. Ziogas
Grzegorz Kwaśniewski
Tal Ben-Nun
Timo Schneider
Torsten Hoefler
96
5
0
16 Jun 2022
Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds
Hussam Al Daas
Grey Ballard
L. Grigori
Suraj Kumar
Kathryn Rouse
38
5
0
26 May 2022
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
Maciej Besta
Torsten Hoefler
GNN
111
57
0
19 May 2022
Communication Bounds for Convolutional Neural Networks
An Chen
J. Demmel
Grace Dinh
Mason Haberle
Olga Holtz
49
4
0
18 Apr 2022
DISTAL: The Distributed Tensor Algebra Compiler
Rohan Yadav
A. Aiken
Fredrik Kjolstad
55
30
0
15 Mar 2022
Distributed-Memory Sparse Kernels for Machine Learning
V. Bharadwaj
A. Buluç
J. Demmel
FedML
53
11
0
15 Mar 2022
pylspack: Parallel algorithms and data structures for sketching, column subset selection, regression and leverage scores
Aleksandros Sobczyk
Efstratios Gallopoulos
52
7
0
05 Mar 2022
TAMM: Tensor Algebra for Many-body Methods
Erdal Mutlu
Ajay Panyala
Nitin Gawande
Abhishek Bagusetty
Jinsung Kim
Karol Kowalski
Nicholas P. Bauman
B. Peng
J. Brabec
S. Krishnamoorthy
52
13
0
04 Jan 2022
Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose
Viviana Arrigoni
Filippo Maggioli
A. Massini
Emanuele Rodolà
29
5
0
25 Oct 2021
Performance Analysis of CP2K Code for Ab Initio Molecular Dynamics
Dewi Yokelson
N. Tkachenko
R. Robey
Ying Wai Li
Pavel A. Dub
51
9
0
09 Sep 2021
A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays
Martin Karp
Artur Podobas
Tobias Kenter
Niclas Jansson
Christian Plessl
P. Schlatter
Stefano Markidis
75
8
0
27 Aug 2021
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Grzegorz Kwaśniewski
Marko Kabić
Tal Ben-Nun
A. Ziogas
Jens Eirik Saethre
...
Timo Schneider
Maciej Besta
Anton Kozhevnikov
J. VandeVondele
Torsten Hoefler
70
15
0
20 Aug 2021
Accelerating XOR-based Erasure Coding using Program Optimization Techniques
Yuya Uezato
32
11
0
05 Aug 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li
Torsten Hoefler
GNN
AI4CE
LRM
130
137
0
14 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
165
101
0
01 Jul 2021
COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling
Marko Kabić
S. Pintarelli
Anton Kozhevnikov
J. VandeVondele
57
3
0
11 Jun 2021
Towards Million-Server Network Simulations on Just a Laptop
Maciej Besta
Marcel Schneider
Salvatore Di Girolamo
Ankit Singla
Torsten Hoefler
49
3
0
26 May 2021
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
Maciej Besta
Zur Vonarburg-Shmaria
Yannick Schaffner
Leonardo Schwarz
Grzegorz Kwaśniewski
...
Philipp Lindenberger
Pavel Kalvoda
Marek Konieczny
O. Mutlu
Torsten Hoefler
82
26
0
05 Mar 2021
I/O Lower Bounds for Auto-tuning of Convolutions in CNNs
Xiaoyang Zhang
Junmin Xiao
Guangming Tan
55
9
0
31 Dec 2020
Substream-Centric Maximum Matchings on FPGA
Maciej Besta
Marc Fischer
Tal Ben-Nun
Dimitri Stanojevic
Johannes de Fine Licht
Torsten Hoefler
91
29
0
28 Oct 2020
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Grzegorz Kwaśniewski
Tal Ben-Nun
A. Ziogas
Timo Schneider
Maciej Besta
Torsten Hoefler
68
7
0
12 Oct 2020
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
Maciej Besta
Armon Carigiet
Zur Vonarburg-Shmaria
Kacper Janda
Lukas Gianinazzi
Torsten Hoefler
97
25
0
26 Aug 2020
A parallel structured divide-and-conquer algorithm for symmetric tridiagonal eigenvalue problems
Xiangke Liao
Shengguo Li
Yutong Lu
J. Román
34
10
0
05 Aug 2020
High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
Maciej Besta
Jens Domke
Marcel Schneider
Marek Konieczny
Salvatore Di Girolamo
Timo Schneider
Ankit Singla
Torsten Hoefler
70
5
0
07 Jul 2020
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems
Maciej Besta
Marc Fischer
Vasiliki Kalavri
Michael Kapralov
Torsten Hoefler
GNN
113
56
0
29 Dec 2019
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Maciej Besta
Raghavendra Kanakagiri
Harun Mustafa
Mikhail Karasikov
Gunnar Rätsch
Torsten Hoefler
Edgar Solomonik
67
67
0
11 Nov 2019
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Maciej Besta
Robert Gerstenberger
E. Peter
Marc Fischer
Michal Podstawski
Claude Barthels
Gustavo Alonso
Torsten Hoefler
GNN
100
97
0
20 Oct 2019