Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors

9 February 2020

Johannes Hofmann

Papers citing "Understanding HPC Benchmark Performance on Intel Broadwell and Cascade Lake Processors"

Title; Authors; Counts; Date
Cache Blocking of Distributed-Memory Parallel Matrix Power Kernels; Dane C. Lacey, C. Alappat, F. Lange, G. Hager, Holger Fehske, G. Wellein; 28 0 0; 21 May 2024
Algebraic Temporal Blocking for Sparse Iterative Solvers on Multi-Core CPUs; C. Alappat, J. Thies, G. Hager, Holger Fehske, G. Wellein; 24 2 0; 05 Sep 2023
Fast Matrix Multiplication via Compiler-only Layered Data Reorganization and Intrinsic Lowering; Braedy Kuzma, Ivan Korostelev, J. P. L. Carvalho, José Moreira, Christopher Barton, Guido Araujo, J. N. Amaral; 20 3 0; 15 May 2023
Level-based Blocking for Sparse Matrices: Sparse Matrix-Power-Vector Multiplication; C. Alappat, G. Hager, Olaf Schenk, G. Wellein; 25 9 0; 03 May 2022
Routing brain traffic through the von Neumann bottleneck: Efficient cache usage in spiking neural network simulation code on general purpose computers; Jari Pronold, Jakob Jordan, B. Wylie, Itaru Kitayama, M. Diesmann, Susanne Kunkel; 65 15 0; 27 Sep 2021
An analytic performance model for overlapping execution of memory-bound loop kernels on multicore CPUs; Ayesha Afzal, G. Hager, G. Wellein; 24 2 0; 31 Oct 2020
Performance Modeling of Streaming Kernels and Sparse Matrix-Vector Multiplication on A64FX; C. Alappat, Jan Laukemann, Thomas Gruber, G. Hager, G. Wellein, N. Meyer, T. Wettig; 16 16 0; 29 Sep 2020