Main innovation: We develop extensions to MPI (and beyond MPI) that enable dynamic resource utilization. Adaptive PinT methods, as well as methods beyond PinT, can then exploit these extensions. Our approach avoids the dead ends of existing approaches and therefore has a chance to pave the way for successful dynamic resource utilization with MPI at future HPC centers.
We have made significant progress towards establishing MPI extensions, based on MPI Sessions, that support dynamic resource utilization efficiently. TIME-X is currently the main driver of this progress in the MPI Sessions working group, where we aim for interfaces that are generic enough to be useful beyond TIME-X. This has already resulted in two publications: an emulator of these extensions as a proof of concept, and a concrete implementation based on Open MPI and PMIx. We continue to work on a more robust and flexible version of these interfaces to ensure that they are future-proof.
Existing approaches to dynamic resource utilization (a.k.a. malleability when applied to entire jobs) for traditional HPC applications have so far been stuck in a dead end because they targeted either toy problems (e.g., standard SPEC benchmarks) or highly specialized problems (e.g., only dynamic adaptive mesh refinement and coarsening). Drawing on the highly interdisciplinary experience in TIME-X (in-depth application knowledge, algorithmic patterns, and an understanding of applied mathematics), we designed the MPI extensions to support all kinds of dynamic resource changes, including those beyond parallel-in-time: coupled weather-ocean simulations, pre-/postprocessing, and even machine learning frameworks. Put simply, one major new concept is the definition of application-driven set operations on process sets, which express what should happen with added or removed resources. This paves the way for a variety of parallel dynamic resource patterns. A particular example, including the different phases of such a transition, is shown in the accompanying figure.
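The set-operation idea can be illustrated with a minimal Python sketch. It is a conceptual model only and not the MPI interface itself: the `ProcessSet` class, the `app://...` names, and the rank numbers are all hypothetical and chosen purely for illustration. The sketch shows how a union can express growing onto added resources and how a difference can express releasing resources.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProcessSet:
    """Hypothetical stand-in for a named process set (not a real MPI type)."""
    name: str
    ranks: frozenset

    def union(self, other, name):
        # Grow: the new set contains the processes of both sets.
        return ProcessSet(name, self.ranks | other.ranks)

    def difference(self, other, name):
        # Shrink: the new set drops every process listed in `other`.
        return ProcessSet(name, self.ranks - other.ranks)


# Phase 1: the application runs on four processes.
world = ProcessSet("app://world", frozenset({0, 1, 2, 3}))

# Phase 2: two additional processes become available (assumed "delta" set).
added = ProcessSet("app://delta_add", frozenset({4, 5}))
grown = world.union(added, "app://world_grown")

# Phase 3: later, the application decides which two processes to release.
removed = ProcessSet("app://delta_sub", frozenset({1, 5}))
shrunk = grown.difference(removed, "app://world_shrunk")

print(sorted(grown.ranks))   # [0, 1, 2, 3, 4, 5]
print(sorted(shrunk.ranks))  # [0, 2, 3, 4]
```

In this model the application, rather than the runtime, chooses which operation to apply and which processes end up in the resulting set, which is the sense in which the operations are application-driven.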
While this design is motivated by the diversity of dynamic resource requirements of HPC applications, we also considered the embedding of these dynamic MPI extensions into the HPC software stack. To this end, we are actively working on standardized interactions with the Resource Management Software of HPC systems using PMIx. An example of such a software stack is shown in the figure below.
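To make the interaction with the resource manager more concrete, the following toy Python model sketches one possible negotiation: the application asks for more processes, the resource manager grants what it has free, and released processes return to the free pool. Everything here is an assumption made for illustration. `ToyResourceManager`, `request_grow`, and `release` are hypothetical names and do not correspond to PMIx or to any real scheduler API.

```python
class ToyResourceManager:
    """Hypothetical resource manager; not PMIx and not a real scheduler API."""

    def __init__(self, free_slots):
        self.free_slots = set(free_slots)

    def request_grow(self, count):
        """Grant up to `count` free slots (lowest IDs first) and return them."""
        granted = set(sorted(self.free_slots)[:count])
        self.free_slots -= granted
        return granted

    def release(self, slots):
        """Return slots given up by a job to the free pool."""
        self.free_slots |= set(slots)


rm = ToyResourceManager(free_slots={4, 5, 6})
job = {0, 1, 2, 3}            # slots currently used by the application

delta = rm.request_grow(2)    # the request may be granted only partially
job |= delta                  # the application grows onto the granted slots

rm.release({0})               # the application gives one slot back
job -= {0}

print(sorted(job))            # [1, 2, 3, 4, 5]
print(sorted(rm.free_slots))  # [0, 6]
```

The point of the sketch is the division of roles: the resource manager decides how many resources are available, while the application decides how to use or release them.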
We have developed a prototype implementation of the dynamic MPI extensions based on Open MPI, OpenPMIx and PRRTE, which is available in a public GitLab repository. We are currently using this prototype to build an adaptive implementation of a PinT method (libPFASST) and are working on further improvements to the prototype.
Our work has already attracted strong interest from other EuroHPC projects (ADMIRE, DEEP-SEA, REGALE), with which several collaborations are ongoing and further ones are planned around this and other topics. It has also led to shared staffing between TIME-X and DEEP-SEA to ensure that developments match between the application/system-driven perspective in TIME-X and the system-driven perspective in DEEP-SEA.
- J. Fecht, M. Schreiber, M. Schulz, H. Pritchard, D.J. Holmes (2022). “An Emulation Layer for Dynamic Resources with MPI Sessions”, HPCMALL workshop at ISC 2022
- D. Huber, M. Streubel, I. Comprés, M. Schulz, M. Schreiber, H. Pritchard (2022). “Towards Dynamic Resource Management with MPI Sessions and PMIx”, ACM