arXiv:1708.02030
Cited By
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance
7 August 2017
Faisal Shahzad
J. Thies
Moritz Kreutzer
T. Zeiser
G. Hager
G. Wellein
Papers citing
"CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance"
9 / 9 papers shown
Title
Designing an Adaptive Application-Level Checkpoint Management System for Malleable MPI Applications
Jophin John
Michael Gerndt
18
0
0
08 Nov 2022
Fault-Aware Non-Collective Communication Creation and Reparation in MPI
Roberto Rocco
G. Palermo
22
3
0
05 Sep 2022
ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms
Lukas Hübner
Demian Hespe
Peter Sanders
A. Stamatakis
40
1
0
02 Mar 2022
Checkpoint-Restart Libraries Must Become More Fault Tolerant
Anthony Skjellum
Derek Schafer
32
0
0
20 Dec 2021
MANA-2.0: A Future-Proof Design for Transparent Checkpointing of MPI at Scale
Yao Xu
Zhengji Zhao
Rohan Garg
Harsh Khetawat
R. Hartman-Baker
Gene Cooperman
36
5
0
10 Dec 2021
Legio: Fault Resiliency for Embarrassingly Parallel MPI Applications
Roberto Rocco
Davide Gadioli
G. Palermo
23
12
0
29 Apr 2021
Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance
Giorgis Georgakoudis
Luanzheng Guo
Ignacio Laguna
21
6
0
13 Feb 2021
MATCH: An MPI Fault Tolerance Benchmark Suite
Luanzheng Guo
Giorgis Georgakoudis
K. Parasyris
Ignacio Laguna
Dong Li
25
7
0
13 Feb 2021
Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
E. Agullo
Mirco Altenbernd
H. Anzt
L. Bautista-Gomez
Tommaso Benacchio
...
K. Teranishi
Samuel Thibault
Dominik Thoennes
Andreas Wagner
B. Wohlmuth
44
6
0
26 Oct 2020