ResearchTrend.AI
CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance

7 August 2017
Faisal Shahzad
J. Thies
Moritz Kreutzer
T. Zeiser
G. Hager
G. Wellein
arXiv: 1708.02030

Papers citing "CRAFT: A library for easier application-level Checkpoint/Restart and Automatic Fault Tolerance"

9 / 9 papers shown
Designing an Adaptive Application-Level Checkpoint Management System for Malleable MPI Applications
Jophin John
Michael Gerndt
18
0
0
08 Nov 2022
Fault-Aware Non-Collective Communication Creation and Reparation in MPI
Roberto Rocco
G. Palermo
22
3
0
05 Sep 2022
ReStore: In-Memory REplicated STORagE for Rapid Recovery in Fault-Tolerant Algorithms
Lukas Hübner
Demian Hespe
Peter Sanders
A. Stamatakis
40
1
0
02 Mar 2022
Checkpoint-Restart Libraries Must Become More Fault Tolerant
Anthony Skjellum
Derek Schafer
32
0
0
20 Dec 2021
MANA-2.0: A Future-Proof Design for Transparent Checkpointing of MPI at Scale
Yao Xu
Zhengji Zhao
Rohan Garg
Harsh Khetawat
R. Hartman-Baker
Gene Cooperman
36
5
0
10 Dec 2021
Legio: Fault Resiliency for Embarrassingly Parallel MPI Applications
Roberto Rocco
Davide Gadioli
G. Palermo
23
12
0
29 Apr 2021
Reinit++: Evaluating the Performance of Global-Restart Recovery Methods For MPI Fault Tolerance
Giorgis Georgakoudis
Luanzheng Guo
Ignacio Laguna
LRM
21
6
0
13 Feb 2021
MATCH: An MPI Fault Tolerance Benchmark Suite
Luanzheng Guo
Giorgis Georgakoudis
K. Parasyris
Ignacio Laguna
Dong Li
25
7
0
13 Feb 2021
Resiliency in Numerical Algorithm Design for Extreme Scale Simulations
E. Agullo
Mirco Altenbernd
H. Anzt
L. Bautista-Gomez
Tommaso Benacchio
...
K. Teranishi
Samuel Thibault
Dominik Thoennes
Andreas Wagner
B. Wohlmuth
44
6
0
26 Oct 2020