Toward Resilient Algorithms and Applications

Fault Tolerance for HPC at eXtreme Scales Workshop (FTXS), 2013
Abstract

Over the past decade, the high-performance computing community has become increasingly concerned that preserving the reliable, digital machine model will become too costly or infeasible. In this paper, we discuss four approaches for developing new algorithms that are resilient to hard and soft failures.
