What does fault tolerant Deep Learning need from MPI?
