Partial differential equations (PDEs) are prevalent in the mathematical analysis and modelling of natural and human systems. The solution of PDEs on the current generation of petascale supercomputers--machines capable of performing 10^15 floating point operations per second--poses two key challenges: scalability and resilience. The scalability problem is twofold: scalability to large numbers of processing elements (PEs), and scalability to higher-dimensional systems. Resilience is the ability of an algorithm to continue functioning in the event of hardware component failures. This is of increasing interest as the high-performance computing community approaches the exascale era (machines capable of performing 10^18 floating point operations per second). Exascale platforms will comprise millions of hardware components, so the machine as a whole will have a shorter mean time to failure than previously seen. As a result, an application running on an exascale machine is likely to experience a hardware failure at some point during its execution. Our focus is the formulation of numerical schemes and algorithms that implement algorithm-based fault tolerance (ABFT). Our ABFT schemes employ multigrid strategies such as the sparse grid combination technique. These schemes are capable of producing solutions under conditions of hardware failure, with known error bounds.
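To illustrate the idea, the sketch below shows the classical two-dimensional sparse grid combination formula, u_c = sum over |l| = n of u_l minus sum over |l| = n - 1 of u_l, applied to interpolants of a known test function. This is a minimal, illustrative example only: the helper names are hypothetical, and in practice each component value u_l would come from solving the PDE on that component grid (possibly on a separate group of PEs), with the combination coefficients adjusted if a component solution is lost to a hardware failure.

```python
import numpy as np

def grid_sample(f, lx, ly):
    """Sample f on a regular grid of (2^lx + 1) x (2^ly + 1) points on [0,1]^2."""
    x = np.linspace(0.0, 1.0, 2**lx + 1)
    y = np.linspace(0.0, 1.0, 2**ly + 1)
    return f(x[:, None], y[None, :])

def refine_axis(u, axis, coarse_level, fine_level):
    """Linearly interpolate u from a level-coarse_level grid to a
    level-fine_level grid along one axis."""
    xc = np.linspace(0.0, 1.0, 2**coarse_level + 1)
    xf = np.linspace(0.0, 1.0, 2**fine_level + 1)
    return np.apply_along_axis(lambda v: np.interp(xf, xc, v), axis, u)

def prolongate(u, lx, ly, n):
    """Bilinearly interpolate a component-grid solution to the full level-n grid."""
    u = refine_axis(u, 0, lx, n)
    u = refine_axis(u, 1, ly, n)
    return u

def combine(f, n):
    """Classical 2D combination technique:
    add component grids with lx + ly = n, subtract those with lx + ly = n - 1."""
    uc = np.zeros((2**n + 1, 2**n + 1))
    for i in range(n + 1):
        uc += prolongate(grid_sample(f, i, n - i), i, n - i, n)
    for i in range(n):
        uc -= prolongate(grid_sample(f, i, n - 1 - i), i, n - 1 - i, n)
    return uc

# Example: combine anisotropic component grids into a sparse grid approximation.
f = lambda x, y: np.sin(np.pi * x) * np.sin(np.pi * y)
n = 5
approx = combine(f, n)
exact = grid_sample(f, n, n)
error = np.max(np.abs(approx - exact))
```

Each component grid is cheap (far fewer points than the full level-n grid), and the components are independent, which is what makes the technique attractive for fault tolerance: a failed component can be dropped and the combination coefficients recomputed.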
For people involved in the Mathematical Modelling and Computation program, research opportunities exist in the formulation of fault-tolerant numerical schemes, the implementation of fault-tolerant schemes on supercomputers, the simulation of hardware failure events on ultrascale supercomputers, and the application of these techniques to scientific computing.
This research is funded by Fujitsu Laboratories Europe and the Australian Research Council.