Fault-tolerant algorithms

Supervisor contact

Related areas

As we move into exascale computing and beyond, the probability that a fault occurs somewhere in the system during a computation increases. Traditional approaches to building resilience into the system, such as checkpointing, may become too expensive. An alternative approach is to exploit any structure within the algorithms themselves that may allow us to recover from a fault. This project will require the student to combine skills from both mathematics and computer science.
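
As an illustration of what exploiting algorithmic structure can look like, the sketch below protects a matrix-vector product with two checksums, in the spirit of checksum-based (algorithm-based) fault tolerance. It is a minimal example and not the method this project will necessarily use. It assumes exact integer arithmetic and at most one corrupted entry in the result; all function names are hypothetical and chosen for this example only.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(A, x):
    return [dot(row, x) for row in A]

def checksum_rows(A):
    """Return the plain and index-weighted column sums of A.

    By linearity, plain . x equals sum(y) and weighted . x equals
    sum((i + 1) * y[i]) for the fault-free product y = A x.
    """
    n_cols = len(A[0])
    plain = [sum(row[j] for row in A) for j in range(n_cols)]
    weighted = [sum((i + 1) * row[j] for i, row in enumerate(A))
                for j in range(n_cols)]
    return plain, weighted

def detect_and_correct(y, plain_check, weighted_check):
    """Repair at most one corrupted entry of y in place.

    Returns the index that was repaired, or None if no fault was detected.
    Assumes integer data; floating-point data would need a tolerance.
    """
    d1 = sum(y) - plain_check            # equals the error size
    if d1 == 0:
        return None
    d2 = sum((i + 1) * yi for i, yi in enumerate(y)) - weighted_check
    k = d2 // d1 - 1                     # d2 = (k + 1) * d1 for a single error
    y[k] -= d1
    return k

A = [[2, 0, 1], [1, 3, 0], [0, 1, 4]]
x = [1, 2, 3]
plain, weighted = checksum_rows(A)
y = matvec(A, x)
y[1] += 10                               # simulate a fault in one entry
repaired = detect_and_correct(y, dot(plain, x), dot(weighted, x))
```

Here the lost information is recovered from two extra dot products rather than from a stored copy of the state, which is the kind of trade-off that makes such schemes attractive when checkpointing is too costly.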

