arXiv:1908.09606

Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication

26 August 2019
Grzegorz Kwaśniewski
Marko Kabić
Maciej Besta
J. VandeVondele
R. Solcà
Torsten Hoefler
    LRM

Papers citing "Red-blue pebbling revisited: near optimal parallel matrix-matrix multiplication"

41 / 41 papers shown
Improving Parallel Program Performance with LLM Optimizers via Agent-System Interfaces
Anjiang Wei
Allen Nie
Diyi Yang
Rohan Yadav
Wonchan Lee
Ke Wang
Alex Aiken
61
3
0
21 Oct 2024
Communication Lower Bounds and Optimal Algorithms for Symmetric Matrix Computations
Hussam Al Daas
Grey Ballard
L. Grigori
Suraj Kumar
Kathryn Rouse
Mathieu Verite
13
1
0
17 Sep 2024
Red-Blue Pebbling with Multiple Processors: Time, Communication and Memory Trade-offs
Toni Böhnlein
Pál András Papp
A. N. Yzelman
24
1
0
05 Sep 2024
High Performance Unstructured SpMM Computation Using Tensor Cores
Patrik Okanovic
Grzegorz Kwaśniewski
P. S. Labini
Maciej Besta
Flavio Vella
Torsten Hoefler
116
6
0
21 Aug 2024
Demystifying Higher-Order Graph Neural Networks
Maciej Besta
Florian Scheidl
Lukas Gianinazzi
S. Klaiman
Jürgen Müller
Torsten Hoefler
101
3
0
18 Jun 2024
Supercomputers as a Continous Medium
Martin Karp
Niclas Jansson
P. Schlatter
Stefano Markidis
41
0
0
09 May 2024
SpComm3D: A Framework for Enabling Sparse Communication in 3D Sparse Kernels
Nabil Abubaker
Torsten Hoefler
87
0
0
30 Apr 2024
Tightening I/O Lower Bounds through the Hourglass Dependency Pattern
Lionel Eyraud-Dubois
Guillaume Iooss
Julien Langou
Fabrice Rastello
20
0
0
25 Apr 2024
Fast Kronecker Matrix-Matrix Multiplication on GPUs
Abhinav Jangda
Mohit Yadav
65
2
0
18 Jan 2024
PipeOptim: Ensuring Effective 1F1B Schedule with Optimizer-Dependent Weight Prediction
Lei Guan
Dongsheng Li
Jiye Liang
Wenjian Wang
Xicheng Lu
152
1
0
01 Dec 2023
Optimizing Distributed Tensor Contractions using Node-Aware Processor Grids
Andreas Irmler
Raghavendra Kanakagiri
S. Ohlmann
Edgar Solomonik
A. Grüneis
51
2
0
17 Jul 2023
AutoDDL: Automatic Distributed Deep Learning with Near-Optimal Bandwidth Cost
Jinfan Chen
Shigang Li
Ran Guo
Jinhui Yuan
Torsten Hoefler
57
2
0
17 Jan 2023
Sparse Hamming Graph: A Customizable Network-on-Chip Topology
Patrick Iff
Maciej Besta
Matheus A. Cavalcante
Tim Fischer
Luca Benini
Torsten Hoefler
84
7
0
25 Nov 2022
HammingMesh: A Network Topology for Large-Scale Deep Learning
Torsten Hoefler
Tommaso Bonato
Daniele De Sensi
Salvatore Di Girolamo
Shigang Li
Marco Heddes
Jon Belk
Deepak Goel
Miguel Castro
Steve Scott
3DH GNN AI4CE
79
23
0
03 Sep 2022
Deinsum: Practically I/O Optimal Multilinear Algebra
A. Ziogas
Grzegorz Kwaśniewski
Tal Ben-Nun
Timo Schneider
Torsten Hoefler
96
5
0
16 Jun 2022
Tight Memory-Independent Parallel Matrix Multiplication Communication Lower Bounds
Hussam Al Daas
Grey Ballard
L. Grigori
Suraj Kumar
Kathryn Rouse
38
5
0
26 May 2022
Parallel and Distributed Graph Neural Networks: An In-Depth Concurrency Analysis
Maciej Besta
Torsten Hoefler
GNN
111
57
0
19 May 2022
Communication Bounds for Convolutional Neural Networks
An Chen
J. Demmel
Grace Dinh
Mason Haberle
Olga Holtz
49
4
0
18 Apr 2022
DISTAL: The Distributed Tensor Algebra Compiler
Rohan Yadav
A. Aiken
Fredrik Kjolstad
55
30
0
15 Mar 2022
Distributed-Memory Sparse Kernels for Machine Learning
V. Bharadwaj
A. Buluç
J. Demmel
FedML
53
11
0
15 Mar 2022
pylspack: Parallel algorithms and data structures for sketching, column subset selection, regression and leverage scores
Aleksandros Sobczyk
Efstratios Gallopoulos
52
7
0
05 Mar 2022
TAMM: Tensor Algebra for Many-body Methods
Erdal Mutlu
Ajay Panyala
Nitin Gawande
Abhishek Bagusetty
Jinsung Kim
Karol Kowalski
Nicholas P. Bauman
B. Peng
J. Brabec
S. Krishnamoorthy
52
13
0
04 Jan 2022
Efficiently Parallelizable Strassen-Based Multiplication of a Matrix by its Transpose
Viviana Arrigoni
Filippo Maggioli
A. Massini
Emanuele Rodolà
29
5
0
25 Oct 2021
Performance Analysis of CP2K Code for Ab Initio Molecular Dynamics
Dewi Yokelson
N. Tkachenko
R. Robey
Ying Wai Li
Pavel A. Dub
51
9
0
09 Sep 2021
A High-Fidelity Flow Solver for Unstructured Meshes on Field-Programmable Gate Arrays
Martin Karp
Artur Podobas
Tobias Kenter
Niclas Jansson
Christian Plessl
P. Schlatter
Stefano Markidis
75
8
0
27 Aug 2021
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal Matrix Factorizations
Grzegorz Kwaśniewski
Marko Kabić
Tal Ben-Nun
A. Ziogas
Jens Eirik Saethre
...
Timo Schneider
Maciej Besta
Anton Kozhevnikov
J. VandeVondele
Torsten Hoefler
70
15
0
20 Aug 2021
Accelerating XOR-based Erasure Coding using Program Optimization Techniques
Yuya Uezato
32
11
0
05 Aug 2021
Chimera: Efficiently Training Large-Scale Neural Networks with Bidirectional Pipelines
Shigang Li
Torsten Hoefler
GNN AI4CE LRM
130
137
0
14 Jul 2021
Productivity, Portability, Performance: Data-Centric Python
Yiheng Wang
Yao Zhang
Yanzhang Wang
Yan Wan
Jiao Wang
Zhongyuan Wu
Yuhao Yang
Bowen She
165
101
0
01 Jul 2021
COSTA: Communication-Optimal Shuffle and Transpose Algorithm with Process Relabeling
Marko Kabić
S. Pintarelli
Anton Kozhevnikov
J. VandeVondele
57
3
0
11 Jun 2021
Towards Million-Server Network Simulations on Just a Laptop
Maciej Besta
Marcel Schneider
Salvatore Di Girolamo
Ankit Singla
Torsten Hoefler
49
3
0
26 May 2021
GraphMineSuite: Enabling High-Performance and Programmable Graph Mining Algorithms with Set Algebra
Maciej Besta
Zur Vonarburg-Shmaria
Yannick Schaffner
Leonardo Schwarz
Grzegorz Kwaśniewski
...
Philipp Lindenberger
Pavel Kalvoda
Marek Konieczny
O. Mutlu
Torsten Hoefler
82
26
0
05 Mar 2021
I/O Lower Bounds for Auto-tuning of Convolutions in CNNs
Xiaoyang Zhang
Junmin Xiao
Guangming Tan
55
9
0
31 Dec 2020
Substream-Centric Maximum Matchings on FPGA
Maciej Besta
Marc Fischer
Tal Ben-Nun
Dimitri Stanojevic
Johannes de Fine Licht
Torsten Hoefler
91
29
0
28 Oct 2020
On the Parallel I/O Optimality of Linear Algebra Kernels: Near-Optimal LU Factorization
Grzegorz Kwaśniewski
Tal Ben-Nun
A. Ziogas
Timo Schneider
Maciej Besta
Torsten Hoefler
68
7
0
12 Oct 2020
High-Performance Parallel Graph Coloring with Strong Guarantees on Work, Depth, and Quality
Maciej Besta
Armon Carigiet
Zur Vonarburg-Shmaria
Kacper Janda
Lukas Gianinazzi
Torsten Hoefler
97
25
0
26 Aug 2020
A parallel structured divide-and-conquer algorithm for symmetric tridiagonal eigenvalue problems
Xiangke Liao
Shengguo Li
Yutong Lu
J. Román
34
10
0
05 Aug 2020
High-Performance Routing with Multipathing and Path Diversity in Ethernet and HPC Networks
Maciej Besta
Jens Domke
Marcel Schneider
Marek Konieczny
Salvatore Di Girolamo
Timo Schneider
Ankit Singla
Torsten Hoefler
70
5
0
07 Jul 2020
Practice of Streaming Processing of Dynamic Graphs: Concepts, Models, and Systems
Maciej Besta
Marc Fischer
Vasiliki Kalavri
Michael Kapralov
Torsten Hoefler
GNN
113
56
0
29 Dec 2019
Communication-Efficient Jaccard Similarity for High-Performance Distributed Genome Comparisons
Maciej Besta
Raghavendra Kanakagiri
Harun Mustafa
Mikhail Karasikov
Gunnar Rätsch
Torsten Hoefler
Edgar Solomonik
67
67
0
11 Nov 2019
Demystifying Graph Databases: Analysis and Taxonomy of Data Organization, System Designs, and Graph Queries
Maciej Besta
Robert Gerstenberger
E. Peter
Marc Fischer
Michal Podstawski
Claude Barthels
Gustavo Alonso
Torsten Hoefler
GNN
100
97
0
20 Oct 2019