Categories
highlight job stage

[JOB] M2 Internship 👩‍💻🧑‍💻- Elastic load balancing between exascale simulation and in situ analysis

This internship is offered within the Exa-DoST project of the PEPR NumPEx program, which brings together the French research community on HPC data management from CEA, CNRS, Inria, universities, and engineering schools. Exa-DoST aims to design the methods and tools that will make it possible, in particular, to take advantage of the first French exascale supercomputer, to be installed at CEA/TGCC by 2025.
The internship will take place at Maison de la Simulation (MdlS), a joint HPC laboratory of CEA, CNRS, UVSQ, and Université Paris-Saclay. It will be co-supervised by teams from MdlS and CEA.

Supervisors

Benoît Martin, MdlS, bmartin@cea.fr, tel: 01 69 08 87 71
Julien Bigot, MdlS, julien.bigot@cea.fr, tel: 01 69 08 01 75
Laurent Colombet, DAM Ile de France, laurent.colombet@cea.fr, tel: 01 69 26 43 19

Context

The first three exascale supercomputers have just been deployed. The explosion of computing power that comes with this new era promises simulations producing data at scales never envisioned before. Analyzing these masses of data requires increasingly advanced statistical or AI methods.
Historically, these analyses were executed post hoc: the raw data were stored on disk during the simulation and the analyses were run afterwards. For several years now, however, storage performance (access speed and capacity) has not kept pace with the exponential growth of compute; the gap keeps widening, and the disk has become the new performance bottleneck. To work around this limitation, a new approach consists of analyzing the data in situ, while they are being produced, and storing only the results of these analyses.

This method makes simulations at extreme scales feasible, but it exacerbates load-balancing problems. Historically, the issue was to adapt the simulation speed to the available compute resources, and hence to vary the rate of data generation. With in situ analysis, it also becomes necessary to adapt to amounts of data to analyze that vary over time, with an analysis cost that can itself vary. Without a dedicated solution, the hardware resources to provision for peak analysis demand risk growing needlessly, leaving those resources idle outside the peaks. It is therefore imperative to design and implement innovative adaptation and load-balancing approaches to make the in situ approach viable and to effectively exploit exascale supercomputers.

In cloud computing, the notion of elasticity answers this need: hardware resources are provisioned dynamically during execution, according to demand. Existing work has tried to adapt this concept to HPC, but it ran into conceptual problems tied to the lower flexibility of HPC platforms and to the tighter coupling to hardware required to extract maximum performance.

The arrival of new forms of ephemeral storage on supercomputers, the flexibility brought by disaggregation, and the use of middleware from cloud computing for in situ analysis are reshuffling the cards and opening new possibilities.

Topic

The goal of this internship is to design and implement a solution for dynamically triggering advanced in situ analyses based on the results of preliminary analyses.
The project aims to simulate and manage a complex workflow. You will explore:

  1. Dynamically conditioned analyses: for example, a first level of analysis detects a critical event, triggering a deeper analysis.
  2. Dynamic resource adaptation: adding or resizing the resources dedicated to the analysis as needs emerge.
  3. Interaction between simulation and analysis: integrating a flow-control mechanism that can, for example, slow down or pause the simulation when the analysis cannot keep up with the rate of data production.
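As a rough illustration of point 1 (a sketch only: `detect_event`, `deep_analysis`, and the threshold are invented placeholders, not part of Deisa), the trigger logic could look like this with plain Dask delayed tasks:

```python
# Hypothetical sketch of conditional in situ analysis: a cheap first-level
# diagnostic runs on every step; an expensive second-level analysis is only
# scheduled when the diagnostic crosses a threshold. Not the Deisa API.
import numpy as np
from dask import delayed

def detect_event(field):
    # First level: a cheap global statistic.
    return float(np.abs(field).max())

def deep_analysis(field):
    # Second level: stand-in for a costly analysis.
    return np.fft.rfft2(field)

THRESHOLD = 0.9  # illustrative trigger condition

rng = np.random.default_rng(0)
triggered = 0
for step in range(5):
    field = rng.random((64, 64))                     # one "simulation" step
    level1 = delayed(detect_event)(field).compute()  # preliminary analysis
    if level1 > THRESHOLD:                           # event detected: go deeper
        spectrum = delayed(deep_analysis)(field).compute()
        triggered += 1
print(triggered)  # → 5 (a 64×64 uniform sample all but surely exceeds 0.9)
```

In Deisa the two levels would be tasks in the same Dask graph, so the scheduler, rather than the simulation, decides where the deeper analysis runs.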

The solution will build on the Deisa [2] in situ analysis approach (designed at Maison de la Simulation), which uses the cloud tool Dask [3,4], and will be integrated into HPC environments. It will be validated on world-class supercomputers with simulation applications such as Gysela or Coddex [5].
This work will produce a complex workload requiring advanced load-balancing and resource-management mechanisms, laying the groundwork for a broader investigation.

Work plan

At the start of the internship, you will carry out an analysis phase on the Deisa approach, which leverages Dask to offer great flexibility in data analysis.
You will then design a mechanism that automatically triggers advanced analyses based on the results of preliminary analyses. You will integrate these features into Deisa, relying on Dask to manage task dependencies and orchestrate their execution.
Finally, you will add features to dynamically adjust the resources assigned to the analysis according to the load (for example, using adaptive deployment [7]), along with a mechanism to throttle the simulation when the resources dedicated to the analysis saturate, via flow control (backpressure).
The internship may also lead to a 3-year PhD thesis. The thesis would deepen the concepts addressed during the internship and explore new approaches to improve the elasticity of in situ analysis systems. In particular, it would focus on dynamically integrating Dask nodes during a running simulation, a capability currently not supported by the Deisa approach. This advance would make it possible to respond even more effectively to variations in resource needs, strengthening the flexibility and performance of supercomputers for simulation and analysis at very large scale.

Expected skills

Good knowledge of distributed systems
Good programming skills (Python, C/C++)
Very good communication skills in oral and written English
Open-mindedness, strong integration skills, and team spirit

Bibliography

[1] https://www.reactivemanifesto.org
[2] A. Gueroudji, Distributed Task-Based In Situ Data Analytics for High-Performance Simulations, PhD Thesis, https://www.theses.fr/2023GRALM019
[3] dask-ml 0.1 documentation – dask_ml.decomposition.IncrementalPCA. URL modules/generated/dask_ml.decomposition.IncrementalPCA.html.
[4] Dask.distributed 2022.10.2 documentation. URL https://distributed.dask.org/en/stable/.
[5] P. Lafourcade, Modélisation Multiéchelle du Comportement Mécanique d’un Matériau Energétique : Le TATB, PhD Thesis, https://www.theses.fr/fr/2018ENAM0030
[6] E. Dirand, L. Colombet, B. Raffin, “TINS: A Task-Based Dynamic Helper Core Strategy for In Situ Analytics”, in Proceedings of Asian Conference on Supercomputing Frontiers, Singapore 2018.
[7] https://docs.dask.org/en/latest/adaptive.html

Categories
job thesis

[JOB] PhD – Python Data Processing on Supercomputers for Large Parallel Numerical Simulations

Contact

Yushan WANG (yushan.wang@cea.fr)

Bruno RAFFIN (bruno.raffin@inria.fr)

Context

The field of high-performance computing has reached a new milestone, with the world’s most powerful supercomputers exceeding the exaflop threshold. These machines will make it possible to process unprecedented quantities of data, which can be used to simulate complex phenomena with superior precision in a wide range of application fields: astrophysics, particle physics, healthcare, genomics, etc. In France, the installation of the first exaflop-scale supercomputer is scheduled for 2025. Leading members of the French scientific community in the field of high-performance computing (HPC) have joined forces within the PEPR NumPEx program (https://numpex.irisa.fr) to carry out research aimed at contributing to the design and implementation of the machine’s software infrastructure. As part of this program, the Exa-DoST project focuses on data management challenges. This thesis will take place within this framework.


Without a significant change in practices, the increased computing capacity of the next generation of computers will lead to an explosion in the volume of data produced by numerical simulations. Managing this data, from production to analysis, is a major challenge.


The use of simulation results is based on a well-established calculation-storage-calculation protocol. The difference in capacity between computers and file systems makes it inevitable that the latter will be clogged. For instance, the Gysela code in production mode can produce up to 5TB of data per iteration. It is obvious that storing 5TB of data is not feasible at high frequency. What’s more, loading this quantity of data for later analysis and visualization is also a difficult task. To bypass this difficulty, we choose to rely on the in-situ data analysis approach.


In situ processing consists of coupling the parallel simulation code (Gysela, for instance) with a data analytics code that processes the data online, as soon as they are produced. This reduces the amount of data written to disk, limiting the pressure on the file system, and is a mandatory approach for running massive simulations like Gysela on the latest exascale supercomputers.

We developed an in situ data processing approach called Deisa, relying on Dask, a Python environment for distributed tasks. Dask defines tasks that are executed asynchronously on workers once their input data are available. The user defines a graph of tasks to be executed. This graph is then forwarded to the Dask scheduler. The scheduler is in charge of (1) optimizing the task graph and (2) distributing the tasks for execution to the different workers according to a scheduling algorithm aiming at minimizing the graph execution time.
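This task-graph model can be illustrated in a few lines of plain Dask (independent of Deisa):

```python
# Minimal illustration of the Dask model described above: the user builds a
# graph of tasks; the scheduler resolves dependencies and runs ready tasks
# asynchronously on workers.
from dask import delayed

@delayed
def inc(i):
    return i + 1

@delayed
def add(a, b):
    return a + b

parts = [inc(i) for i in range(4)]       # four independent leaf tasks
total = add(add(parts[0], parts[1]),
            add(parts[2], parts[3]))     # a small reduction tree
print(total.compute())                   # → 10 (graph handed to the scheduler)
```

Nothing runs until `compute()` is called; until then the graph is just data, which is what lets the scheduler reorder and place tasks.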


Deisa extends Dask, so it becomes possible to couple an MPI-based parallel simulation code with Dask. Deisa enables the simulation code to directly send newly produced data into the worker memories, notify the Dask scheduler that these data are available for analysis, and that associated tasks can then be scheduled for execution.

Compared to previous in situ approaches that are mainly MPI-based, our approach relying on Python tasks makes for a good tradeoff between programming ease and runtime performance.


The goal of this PhD work is to investigate solutions to:

  • Improve task placement, and thus performance, by enabling tasks to be scheduled in process (inside the simulation processes), in situ (on external processes but on the same compute nodes that also run the simulation code), and in transit (on dedicated nodes distinct from the simulation nodes). Running closer to the simulation reduces the need for data movement but can steal resources (CPU, GPU, network, memory, cache) from the simulation and slow it down. Dask task graph optimization is a good starting point for developing such approaches.

  • Enable more diverse and flexible data processing patterns for Dask in situ:
  1. triggering data processing tasks when specific events are detected in the data;
  2. changing some simulation internal parameters at runtime as a result of certain analytics tasks;
  3. enabling task graphs that combine classical analytics with deep neural network-based analyses.

Problem statement

When discussing in-situ data analysis, two primary techniques are often highlighted: in-transit analysis and in-process analysis.


In-transit analysis involves examining data while it is being transferred between systems or across various components of a distributed architecture. For instance, in large-scale simulations or scientific experiments, data is typically generated on one system (such as a supercomputer) and needs to be sent to another system for storage or further analysis.
Rather than waiting for the data to reach its final destination, in-transit analysis allows for computations to be performed on the data as it moves. This approach significantly reduces overall processing time.
In contrast, in-process analysis entails analyzing data during its generation or processing by the application. Instead of waiting for an entire simulation or data generation task to finish, this technique enables concurrent processing of data throughout the ongoing task, such as during simulation steps in a scientific application. By doing so, the burden of post-processing is alleviated, as computational tasks are distributed over time.


To illustrate these techniques, consider the Gysela code. Our goal is to integrate both in-transit and in-process analyses to enhance data analytics while minimizing data transfer between systems. A common diagnostic performed on Gysela data is the global aggregation of certain fields across the entire domain. This global operation can be divided into a subdomain reduction followed by a reduced global reduction. By executing the initial reduction directly on the process where the data is generated, we can significantly decrease the volume of data transferred. This, in turn, alleviates the load on the parallel file system.
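The two-stage reduction described above can be sketched with dask.array (the 8×8 field and 4×4 chunks are illustrative stand-ins for Gysela data):

```python
# Sketch of the two-stage reduction: a per-subdomain (per-chunk) partial
# reduction followed by a small global reduction. The 8x8 field and 4x4
# chunks are illustrative stand-ins for a domain-decomposed Gysela field.
import numpy as np
import dask.array as da

field = da.from_array(np.arange(64.0).reshape(8, 8), chunks=(4, 4))

# Stage 1: reduce each subdomain where it lives; every 4x4 block shrinks
# to a single value, so only 4 numbers (not 64) would cross the network.
partial = field.map_blocks(lambda b: np.array([[b.sum()]]), chunks=(1, 1))

# Stage 2: small global reduction over the 2x2 array of partial sums.
total = partial.sum().compute()
print(total)  # → 2016.0, i.e. 0 + 1 + ... + 63
```

In the thesis setting, stage 1 would run in-process with the simulation and only stage 2 in transit, which is exactly the placement split discussed above.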


However, determining which reductions should be performed on specific resources presents a challenge, especially since we often lack prior knowledge about the types of diagnostics that will be required. This highlights the concept of co-scheduling. In this context, co-scheduling refers to the coordinated execution of in-transit and in-process data analysis tasks to optimize resource efficiency and minimize data movement latency. By aligning the scheduling of these two processes, the system can ensure more effective utilization of resources, such as network bandwidth, CPU, and memory. This approach is particularly vital for large-scale applications, where traditional methods of moving and analyzing massive datasets can lead to significant bottlenecks.

Mission

The candidate will begin the thesis by conducting a comprehensive study of the state-of-the-art in relevant areas, focusing on in-situ, in-transit, and in-process data analysis.
Early on, they will gain proficiency in using PDI Deisa and familiarize themselves with the Gysela code.

To dive into the core subject of the thesis, the candidate will examine how to separate local data reduction from the overall workflow. They will analyze the task graph generated by Dask, the underlying library of Deisa, and conduct a static analysis to determine which tasks should be executed in-process. Applying graph theory will be crucial in this stage to identify the appropriate tasks.

Once the local tasks are defined, the candidate will implement routines within the PDI Deisa plugin to handle these local operations in the same process as the simulation. In the final phase, they will expose the locally reduced data to Deisa’s dedicated I/O processes using remote procedure calls, facilitating the aggregation of data for final reduction.

Additionally, the candidate will investigate solutions to automate the above processes, ensuring that compute resources are efficiently scheduled based on the workload. The ultimate goal will be to optimize the entire workflow, improving performance and resource management.

Main activities

The candidate will undertake the thesis work by thoroughly studying existing research and developing innovative solutions to efficiently manage the integration of in-transit and in-process data analysis. This research will address critical challenges in optimizing data workflows, and the novel solutions devised are expected to significantly enhance performance in large-scale computing environments. The outcomes of this work will be submitted for publication in top-tier journals and presented at prestigious conferences within the high-performance computing (HPC) community.

As part of the Exa-DoST project within the NumPEx PEPR initiative, the candidate will have privileged access to cutting-edge, large-scale computing infrastructure, enabling robust experimentation and testing. The framework developed will be rigorously evaluated on large-scale applications in close collaboration with renowned research entities such as CEA/DAM and/or CEA/DES. These collaborations will provide the candidate with opportunities to work on real-world HPC challenges at the frontier of scientific research.

The candidate will be based at Maison de la Simulation, working closely with interdisciplinary teams of experts in HPC and advanced simulation from Inria Grenoble, ensuring a dynamic and supportive research environment. This collaborative setting will foster innovation and ensure that the research contributes to the state of the art in both academic and industrial contexts.

Technical skills

  • An excellent Master’s degree in computer science or equivalent
  • Strong knowledge of distributed systems
  • Knowledge of storage and (distributed) file systems
  • Ability and motivation to conduct high-quality research, including publishing the results in relevant journals and conferences
  • Strong programming skills (Python, C/C++)
  • Working experience in the areas of HPC and Big Data management is an advantage
  • Very good communication skills in oral and written English
  • Open-mindedness, strong integration skills, and team spirit

Benefits

  • Subsidized meals
  • Up to 75% reimbursed public transport
  • Possibility of teleworking and flexible working hours
  • Professional equipment available (videoconferencing, loan of computer equipment, etc.)
  • Social, cultural, and sports benefits
  • Access to professional training
  • Social security
  • Up to 9 weeks of paid leave

References

  1. Dask – https://www.dask.org/
  2. Deisa Paper: Dask-enabled in situ analytics. Amal Gueroudji, Julien Bigot, Bruno Raffin. HiPC 2021. https://hal.inria.fr/hal-03509198v1
  3. Deisa Paper: Dask-Extended External Tasks for HPC/ML In Transit Workflows. Amal Gueroudji, Julien Bigot, Bruno Raffin, Robert Ross. WORKS workshop at Supercomputing 2023. https://hal.science/hal-04409157v1
  4. Deisa Code: https://github.com/pdidev/deisa
  5. Ray – https://github.com/ray-project/ray
  6. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O. Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh Orf. IEEE Cluster 2012. https://inria.hal.science/hal-00715252

Appendix

Categories
highlight job stage

[JOB] Internship 👩‍💻🧑‍💻- Python Data Processing on Supercomputers for Large Parallel Numerical Simulations

Contact

Yushan WANG (yushan.wang@cea.fr)

Bruno RAFFIN (bruno.raffin@inria.fr)

Context

The field of high-performance computing has reached a new milestone, with the world’s most powerful supercomputers exceeding the exaflop threshold. These machines will make it possible to process unprecedented quantities of data, which can be used to simulate complex phenomena with superior precision in a wide range of application fields: astrophysics, particle physics, healthcare, genomics, etc. In France, the installation of the first exaflop-scale supercomputer is scheduled for 2025. Leading members of the French scientific community in the field of high-performance computing (HPC) have joined forces within the PEPR NumPEx program (https://numpex.irisa.fr) to carry out research aimed at contributing to the design and implementation of the machine’s software infrastructure. As part of this program, the Exa-DoST project focuses on data management challenges. This internship will take place within this framework.

Without a significant change in practices, the increased computing capacity of the next generation of computers will lead to an explosion in the volume of data produced by numerical simulations. Managing this data, from production to analysis, is a major challenge.

The use of simulation results is based on a well-established calculation-storage-calculation protocol. The difference in capacity between computers and file systems makes it inevitable that the latter will be clogged. For instance, the Gysela code in production mode can produce up to 5TB of data per iteration. It is obvious that storing 5TB of data is not feasible at high frequency. What’s more, loading this quantity of data for later analysis and visualization is also a difficult task. To bypass this difficulty, we choose to rely on the in-situ data analysis approach.

In situ processing consists of coupling the parallel simulation code (Gysela, for instance) with a data analytics code that processes the data online, as soon as they are produced. This reduces the amount of data written to disk, limiting the pressure on the file system, and is a mandatory approach for running massive simulations like Gysela on the latest exascale supercomputers.

We developed an in situ data processing approach called Deisa, relying on Dask, a Python environment for distributed tasks. Dask defines tasks that are executed asynchronously on workers once their input data are available. The user defines a graph of tasks to be executed. This graph is then forwarded to the Dask scheduler. The scheduler is in charge of (1) optimizing the task graph and (2) distributing the tasks for execution to the different workers according to a scheduling algorithm aiming at minimizing the graph execution time.

Deisa extends Dask, so it becomes possible to couple an MPI-based parallel simulation code with Dask. Deisa enables the simulation code to directly send newly produced data into the worker memories, notify the Dask scheduler that these data are available for analysis, and that associated tasks can then be scheduled for execution.

Compared to previous in situ approaches that are mainly MPI-based, our approach relying on Python tasks makes for a good tradeoff between programming ease and runtime performance.

Problem statement

When discussing in-situ data analysis, two primary techniques are often highlighted: in-transit analysis and in-process analysis.

In-transit analysis involves examining data while it is being transferred between systems or across various components of a distributed architecture. For instance, in large-scale simulations or scientific experiments, data is typically generated on one system (such as a supercomputer) and needs to be sent to another system for storage or further analysis.
Rather than waiting for the data to reach its final destination, in-transit analysis allows for computations to be performed on the data as it moves. This approach significantly reduces overall processing time.

In contrast, in-process analysis entails analyzing data during its generation or processing by the application. Instead of waiting for an entire simulation or data generation task to finish, this technique enables concurrent processing of data throughout the ongoing task, such as during simulation steps in a scientific application. By doing so, the burden of post-processing is alleviated, as computational tasks are distributed over time.

To illustrate these techniques, consider the Gysela code. Our goal is to integrate both in-transit and in-process analyses to enhance data analytics while minimizing data transfer between systems. A common diagnostic performed on Gysela data is the global aggregation of certain fields across the entire domain. This global operation can be divided into a subdomain reduction followed by a reduced global reduction. By executing the initial reduction directly on the process where the data is generated, we can significantly decrease the volume of data transferred. This, in turn, alleviates the load on the parallel file system.

However, determining which reductions should be performed on specific resources presents a challenge, especially since we often lack prior knowledge about the types of diagnostics that will be required. This highlights the concept of co-scheduling. In this context, co-scheduling refers to the coordinated execution of in-transit and in-process data analysis tasks to optimize resource efficiency and minimize data movement latency. By aligning the scheduling of these two processes, the system can ensure more effective utilization of resources, such as network bandwidth, CPU, and memory. This approach is particularly vital for large-scale applications, where traditional methods of moving and analyzing massive datasets can lead to significant bottlenecks.

Mission

Before putting in place a solution that automatically manages the separation of local reductions from the workflow, we need to check whether overall performance can actually be improved by executing the local reductions in-process. The candidate will consider an artificially generated workflow in which local operations can be isolated from the global task graph. Next, they will manually assign those local operations to run on the same process as the application. The local results will then be aggregated on dedicated resources to produce the final result. The candidate will be in charge of the performance evaluation of the whole workflow.

The successful completion of this internship could lead to a 3-year thesis, during which you will further explore the concepts already covered and conduct research work, notably on automating the co-scheduling of in situ and in-process tasks.

Main activities

After studying the state of the art, getting to grips with the architecture of PDI and Deisa, and getting familiar with the Dask environment, the candidate will study, propose, and develop innovative solutions, which they will publish in the best journals and conferences in the field. Within the Exa-DoST project of the NumPEx PEPR, the candidate will have privileged access to very large-scale computers for experiments. The framework developed will be tested on large-scale applications in close collaboration with CEA/DAM and/or CEA/DES.
The candidate will be based at Maison de la Simulation, in close collaboration with the teams of specialists in high-performance computing and simulation at Inria Grenoble.

Technical skills

  • An excellent Master’s degree in computer science or equivalent
  • Strong knowledge of distributed systems
  • Knowledge of storage and (distributed) file systems
  • Ability and motivation to conduct high-quality research, including publishing the results in relevant journals and conferences
  • Strong programming skills (Python, C/C++)
  • Working experience in the areas of HPC and Big Data management is an advantage
  • Very good communication skills in oral and written English
  • Open-mindedness, strong integration skills, and team spirit

References

  1. Dask – https://www.dask.org/
  2. Deisa Paper: Dask-enabled in situ analytics. Amal Gueroudji, Julien Bigot, Bruno Raffin. HiPC 2021. https://hal.inria.fr/hal-03509198v1
  3. Deisa Paper: Dask-Extended External Tasks for HPC/ML In Transit Workflows. Amal Gueroudji, Julien Bigot, Bruno Raffin, Robert Ross. WORKS workshop at Supercomputing 2023. https://hal.science/hal-04409157v1
  4. Deisa Code: https://github.com/pdidev/deisa
  5. Ray – https://github.com/ray-project/ray
  6. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O. Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh Orf. IEEE Cluster 2012. https://inria.hal.science/hal-00715252