Toward Resilient Algorithms and Applications
Fault Tolerance for HPC at eXtreme Scales Workshop (FTXS), 2013
Abstract
Over the past decade, the high-performance computing community has become increasingly concerned that preserving the reliable, digital machine model will become too costly or infeasible. In this paper, we discuss four approaches for developing new algorithms that are resilient to hard and soft failures.
