
[JOB] M2 Internship 👩‍💻🧑‍💻- Elastic load balancing between exascale simulation and in situ analysis

This internship is offered in the framework of the Exa-DoST project of the NumPEx PEPR, which gathers the French research community from CEA, CNRS, Inria, universities and engineering schools around HPC data management. Exa-DoST aims to design the methods and tools that will make it possible, in particular, to take advantage of the first French exascale supercomputer, to be installed at CEA/TGCC around 2025.
The internship will take place at Maison de la Simulation (MdlS), an HPC laboratory jointly run by CEA, CNRS, UVSQ and Université Paris-Saclay. It will be co-supervised by a team from MdlS and CEA.

Supervisors

Benoît Martin, MdlS, bmartin@cea.fr, tel: 01 69 08 87 71
Julien Bigot, MdlS, julien.bigot@cea.fr, tel: 01 69 08 01 75
Laurent Colombet, DAM Ile de France, laurent.colombet@cea.fr, tel: 01 69 26 43 19

Context

The first three exascale supercomputers have just been deployed. The explosion of computing power that comes with this new era promises simulations producing data at scales never envisioned before. Analyzing these masses of data requires increasingly advanced statistical or AI methods.
Historically, these analyses were run post hoc: the raw data were stored on disk during the simulation and the analyses executed afterwards. For several years now, storage performance (access speed and volume) has not kept up with the exponential growth of compute power; the gap keeps widening and the disk has become the new performance bottleneck. To work around this limitation, a new approach is to analyze the data in situ, while they are being produced, and to store only the results of these analyses.

This method makes it feasible to run simulations at extreme scales, but it exacerbates load-balancing problems. Historically, the question was to adapt the simulation speed to the available computing resources, and therefore to vary the data-generation rate. With in situ analysis it also becomes necessary to adapt to amounts of data to analyze that vary over time, with an analysis cost that can itself vary. Without a dedicated solution, the hardware resources to be provisioned to handle the peaks of the analysis workload risk growing needlessly, while those resources sit idle outside the peaks. It is therefore essential to design and implement innovative adaptation and load-balancing approaches to make the approach viable and to actually take advantage of exascale supercomputers.

In cloud computing, the notion of elasticity answers this need: hardware resources are provisioned dynamically during execution, according to demand. Existing work has tried to adapt this concept to high-performance computing, but it has run into conceptual problems related to the lower flexibility of HPC platforms and to the tighter coupling to the hardware required to extract maximum performance.

The arrival of new forms of ephemeral storage on supercomputers, the flexibility brought by disaggregation, and the use of middleware originating from cloud computing for in situ analysis are reshuffling the cards and opening new possibilities.

Topic

The goal of this internship is to design and implement a solution that dynamically triggers advanced in situ analyses based on the results of preliminary analyses.
The project aims to simulate and manage a complex workflow. You will explore:

  1. Analyses that condition one another dynamically: for example, a first level of analysis detects a critical event, which triggers a deeper analysis (see the sketch after this list).
  2. Dynamic adaptation of resources: adding or resizing the resources dedicated to analysis as new needs emerge.
  3. Interaction between the simulation and the analysis: integrating a flow-control mechanism that makes it possible, for example, to slow down or pause the simulation when the analysis cannot keep up with the rate at which data are produced.
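As a purely illustrative sketch of point 1 (this is not the Deisa API; the functions, threshold and local cluster below are hypothetical), a cheap preliminary analysis can be submitted as a Dask task and a heavier diagnostic triggered only when it flags a critical event:

```python
# Minimal sketch (hypothetical functions and threshold) using plain dask.distributed:
# a cheap preliminary analysis screens every chunk, and a heavier diagnostic is
# submitted only when the preliminary result exceeds a critical threshold.
import numpy as np
from dask.distributed import Client, LocalCluster

def preliminary_analysis(chunk):
    # cheap screening metric, e.g. maximum amplitude in the chunk
    return float(np.max(np.abs(chunk)))

def advanced_analysis(chunk):
    # expensive diagnostic, only run when something interesting was detected
    return np.fft.rfft(chunk).real

if __name__ == "__main__":
    client = Client(LocalCluster(n_workers=2))
    CRITICAL = 0.99  # hypothetical detection threshold

    for step in range(10):
        chunk = np.random.rand(1_000_000)        # stand-in for simulation output
        screen = client.submit(preliminary_analysis, chunk)
        if screen.result() > CRITICAL:           # event detected: trigger deeper analysis
            deep = client.submit(advanced_analysis, chunk)
            print(step, deep.result()[:3])
```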

The solution will build on the Deisa in situ analysis approach [2] (designed at Maison de la Simulation), which uses the cloud tool Dask [3,4], and it will be integrated into high-performance computing environments. It will be validated on world-class supercomputers with simulation applications such as Gysela or Coddex [5].
This work will create a complex workload requiring advanced load-balancing and resource-management mechanisms, laying the groundwork for a broader line of research.

Planned work

At the beginning of the internship, you will carry out an analysis phase of the Deisa approach, which relies on the Dask tool to offer great flexibility in data analysis.
You will then design a mechanism that automatically triggers advanced analyses based on the results of preliminary analyses. You will integrate these features into Deisa, relying on Dask to manage dependencies between tasks and to orchestrate their execution.
Finally, you will add features to dynamically adjust the resources allocated to the analysis according to the load (for example, using adaptive deployment [7]), as well as a mechanism to regulate the simulation when the resources dedicated to the analysis are saturated, via flow control (backpressure).
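For the resource-adaptation part, Dask's adaptive deployment [7] already provides the basic mechanism: a cluster object can be asked to scale between a minimum and a maximum number of workers according to its load. A minimal sketch, using a LocalCluster as a stand-in for an HPC job-queue cluster:

```python
# Minimal sketch of Dask adaptive scaling [7]: a LocalCluster stands in for an
# HPC cluster object (for example one provided by dask-jobqueue on a supercomputer).
from dask.distributed import Client, LocalCluster

def work(x):
    return sum(i * i for i in range(x))

if __name__ == "__main__":
    cluster = LocalCluster(n_workers=1)
    cluster.adapt(minimum=1, maximum=8)   # let Dask add/remove workers with the load
    client = Client(cluster)

    futures = client.map(work, range(10_000, 10_100))
    print(sum(client.gather(futures)))
```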
The internship may also lead to a 3-year PhD thesis. The thesis would deepen the concepts addressed during the internship and explore new approaches to improve the elasticity of in situ analysis systems. In particular, it would focus on dynamically adding Dask nodes during the simulation, a capability currently not supported by the Deisa approach. This advance would make it possible to respond even more effectively to variations in resource needs, thereby strengthening the flexibility and performance of supercomputers for very large-scale simulation and analysis.

Expected skills

Good knowledge of distributed systems
Good programming skills (Python, C/C++)
Very good communication skills in oral and written English
Open-mindedness, strong integration skills and team spirit.

Bibliography

[1] https://www.reactivemanifesto.org
[2] A. Gueroudji, Distributed Task-Based In Situ Data Analytics for High-Performance Simulations, PhD Thesis, https://www.theses.fr/2023GRALM019
[3] dask-ml 0.1 documentation – dask_ml.decomposition.IncrementalPCA. URL modules/generated/dask_ml.decomposition.IncrementalPCA.html.
[4] Dask.distributed — Dask.distributed 2022.10.2 documentation. URL https://distributed.dask.org/en/stable/.
[5] P. Lafourcade, Modélisation Multiéchelle du Comportement Mécanique d’un Matériau Energétique : Le TATB, PhD Thesis, https://www.theses.fr/fr/2018ENAM0030
[6] E. Dirand, L. Colombet, B. Raffin, “TINS: A Task-Based Dynamic Helper Core Strategy for In Situ Analytics”, in Proceedings of Asian Conference on Supercomputing Frontiers, Singapore 2018.
[7] https://docs.dask.org/en/latest/adaptive.html


[JOB] Internship 👩‍💻🧑‍💻- Python Data Processing on Supercomputers for Large Parallel Numerical Simulations

Contact

Yushan WANG (yushan.wang@cea.fr)

Bruno RAFFIN (bruno.raffin@inria.fr)

Context

The field of high-performance computing has reached a new milestone, with the world’s most powerful supercomputers exceeding the exaflop threshold. These machines will make it possible to process unprecedented quantities of data, which can be used to simulate complex phenomena with superior precision in a wide range of application fields: astrophysics, particle physics, healthcare, genomics, etc. In France, the installation of the first exaflop-scale supercomputer is scheduled for 2025. Leading members of the French scientific community in the field of high-performance computing (HPC) have joined forces within the PEPR NumPEx program (https://numpex.irisa.fr) to carry out research aimed at contributing to the design and implementation of the machine’s software infrastructure. As part of this program, the Exa-DoST project focuses on data management challenges. This internship will take place within this framework.

Without a significant change in practices, the increased computing capacity of the next generation of computers will lead to an explosion in the volume of data produced by numerical simulations. Managing this data, from production to analysis, is a major challenge.

The exploitation of simulation results is based on a well-established compute-store-compute protocol. The widening capacity gap between computers and file systems makes it inevitable that the latter become clogged. For instance, the Gysela code in production mode can produce up to 5 TB of data per iteration. Storing 5 TB of data at high frequency is clearly not feasible, and loading this quantity of data for later analysis and visualization is also a difficult task. To bypass this difficulty, we rely on the in situ data analysis approach.

In situ processing consists of coupling the parallel simulation code, Gysela for instance, with a data analytics code that processes the data online, as soon as they are produced. It reduces the amount of data written to disk, limiting the pressure on the file system. This is a mandatory approach for running massive simulations like Gysela on the latest exascale supercomputers.

We developed an in situ data processing approach called Deisa, relying on Dask, a Python environment for distributed tasks. Dask defines tasks that are executed asynchronously on workers once their input data are available. The user defines a graph of tasks to be executed. This graph is then forwarded to the Dask scheduler. The scheduler is in charge of (1) optimizing the task graph and (2) distributing the tasks for execution to the different workers according to a scheduling algorithm aiming at minimizing the graph execution time.
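For illustration, the kind of task graph described above can be expressed with dask.delayed; the scheduler then decides where and when each task runs. The functions below are placeholders, not Deisa code:

```python
# Minimal sketch of a Dask task graph: the user composes delayed calls, Dask builds
# the graph, and the scheduler assigns tasks to workers once their inputs are ready.
import dask
from dask.distributed import Client

@dask.delayed
def load(i):
    return list(range(i * 10, (i + 1) * 10))

@dask.delayed
def analyze(block):
    return sum(block)

@dask.delayed
def combine(parts):
    return sum(parts)

if __name__ == "__main__":
    client = Client()                      # starts a local scheduler and workers
    graph = combine([analyze(load(i)) for i in range(8)])
    print(graph.compute())                 # the scheduler executes the graph
```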

Deisa extends Dask, so it becomes possible to couple an MPI-based parallel simulation code with Dask. Deisa enables the simulation code to directly send newly produced data into the worker memories, notify the Dask scheduler that these data are available for analysis, and that associated tasks can then be scheduled for execution.

Compared to previous in situ approaches that are mainly MPI-based, our approach relying on Python tasks makes for a good tradeoff between programming ease and runtime performance.

Problem statement

When discussing in-situ data analysis, two primary techniques are often highlighted: in-transit analysis and in-process analysis.

In-transit analysis involves examining data while it is being transferred between systems or across various components of a distributed architecture. For instance, in large-scale simulations or scientific experiments, data is typically generated on one system (such as a supercomputer) and needs to be sent to another system for storage or further analysis.
Rather than waiting for the data to reach its final destination, in-transit analysis allows for computations to be performed on the data as it moves. This approach significantly reduces overall processing time.

In contrast, in-process analysis entails analyzing data during its generation or processing by the application. Instead of waiting for an entire simulation or data generation task to finish, this technique enables concurrent processing of data throughout the ongoing task, such as during simulation steps in a scientific application. By doing so, the burden of post-processing is alleviated, as computational tasks are distributed over time.

To illustrate these techniques, consider the Gysela code. Our goal is to integrate both in-transit and in-process analyses to enhance data analytics while minimizing data transfer between systems. A common diagnostic performed on Gysela data is the global aggregation of certain fields across the entire domain. This global operation can be divided into a subdomain reduction followed by a reduced global reduction. By executing the initial reduction directly on the process where the data is generated, we can significantly decrease the volume of data transferred. This, in turn, alleviates the load on the parallel file system.
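As a toy illustration of this decomposition (independent of Gysela or Deisa), a global sum over a dask.array is already executed as per-chunk partial reductions followed by a small global reduction, so only the reduced values travel between workers:

```python
# Toy sketch: a global sum expressed as per-chunk (per-subdomain) partial sums
# followed by a small global reduction, so only reduced data moves between workers.
import dask.array as da

field = da.random.random((8192, 8192), chunks=(1024, 1024))  # stand-in for a simulated field

# dask.array already builds this two-stage reduction tree for us:
print(field.sum().compute())

# the same structure written explicitly: per-chunk sums, then a final reduction
partial = field.map_blocks(lambda b: b.sum(keepdims=True), chunks=(1, 1))
print(partial.sum().compute())
```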

However, determining which reductions should be performed on specific resources presents a challenge, especially since we often lack prior knowledge about the types of diagnostics that will be required. This highlights the concept of co-scheduling. In this context, co-scheduling refers to the coordinated execution of in-transit and in-process data analysis tasks to optimize resource efficiency and minimize data movement latency. By aligning the scheduling of these two processes, the system can ensure more effective utilization of resources, such as network bandwidth, CPU, and memory. This approach is particularly vital for large-scale applications, where traditional methods of moving and analyzing massive datasets can lead to significant bottlenecks.

Mission

Before putting in place a solution to automatically manage the separation of local reductions from the workflow, we need to check whether or not the overall performance can be improved by executing the local reductions in-process. The candidate will consider an artificially generated workflow in which local operations can be isolated from the global task graph. They will then manually assign those local operations to run on the same process as the application. The local results will then be aggregated on dedicated resources to produce the final results. The candidate will be in charge of the performance evaluation of the whole workflow.
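One way to manually pin such local operations, assuming a plain dask.distributed setup rather than the Deisa internals, is the workers= argument of submit; the worker addresses below are simply whatever the cluster reports:

```python
# Sketch of manual task placement with dask.distributed: local reductions are pinned
# to the worker that holds the data, and only the final aggregation runs on a
# dedicated "analysis" worker.  Worker addresses come from the cluster itself.
import numpy as np
from dask.distributed import Client, LocalCluster

if __name__ == "__main__":
    client = Client(LocalCluster(n_workers=3))
    workers = list(client.scheduler_info()["workers"])        # worker addresses

    local_parts = []
    for w in workers[:-1]:                                     # "simulation-side" workers
        data = client.submit(np.random.rand, 1_000_000, workers=[w], pure=False)  # data lives on w
        local_parts.append(client.submit(np.sum, data, workers=[w]))              # local reduction on w

    # final aggregation on a dedicated worker, operating only on the reduced values
    total = client.submit(sum, local_parts, workers=[workers[-1]])
    print(total.result())
```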

The successful completion of this internship could lead to a 3-year thesis, during which you will further explore the concepts already covered and conduct research work, notably the automation of co-scheduling in-situ and in-process tasks.

Main activities

After studying the state of the art, getting to grips with the architecture of PDI and Deisa, and getting familiar with the Dask environment, the candidate will study, propose, and develop innovative solutions, which he or she will publish in the best journals and conferences in the field. Within the Exa-DoST project of the NumPEx PEPR, the candidate will have privileged access to very large-scale computers for experiments. The framework developed will be tested on large-scale applications in close collaboration with CEA/DAM and/or CEA/DES.
The candidate will be based at Maison de la Simulation, in close collaboration with the teams of specialists in high-performance computing and simulation at Inria Grenoble.

Technical skills

  • An excellent Master’s degree in computer science or equivalent
  • Strong knowledge of distributed systems
  • Knowledge of storage and (distributed) file systems
  • Ability and motivation to conduct high-quality research, including publishing the results in relevant journals and conferences
  • Strong programming skills (Python, C/C++)
  • Working experience in the areas of HPC and Big Data management is an advantage
  • Very good communication skills in oral and written English
  • Open-mindedness, strong integration skills and team spirit

References

  1. Dask – https://www.dask.org/
  2. Deisa Paper: Dask-enabled in situ analytics. Amal Gueroudji, Julien Bigot, Bruno Raffin. HiPC 2021. https://hal.inria.fr/hal-03509198v1
  3. Deisa Paper: Dask-Extended External Tasks for HPC/ML In Transit Workflows, Amal Gueroudji, Julien Bigot, Bruno Raffin, Robert Ross. Work workshop at Supercomputing 23. https://hal.science/hal-04409157v1
  4. Deisa Code: https://github.com/pdidev/deisa
  5. Ray – https://github.com/ray-project/ray
  6. Damaris: How to Efficiently Leverage Multicore Parallelism to Achieve Scalable, Jitter-free I/O. Matthieu Dorier, Gabriel Antoniu, Franck Cappello, Marc Snir, Leigh Orf. IEEE Cluster 2012. https://inria.hal.science/hal-00715252

[JOB] 2-year HPC Engineer Position at CEA 👩‍💻🧑‍💻- Methods and tools for optimizing Kokkos kernels in large GPU codes

CEA is recruiting a C++/HPC expert for two years to join the CExA “Moonshot” project team to develop new methods and tools to optimize the performance of portable kernels implemented using the Kokkos library. As a use-case, these tools and techniques will be demonstrated in the Dyablo code, a novel HPC code for simulating astrophysical fluids, from stellar interiors to cosmological simulations.

To apply, please send your application (CV and cover letter) to contact@cexa-project.org. If you have any questions about the position, please use the same address. Applications will be assessed from now until the position is filled.

Context

Europe is investing to build Exaflop supercomputers in the coming years, including the Alice Recoque one in France, at CEA in 2025. These machines will be heterogeneous, and based on GPUs of various brands and architectures. Ensuring performance and portability under these conditions is certainly one of the most significant challenges for Exascale. To address this, CEA is investing heavily in an ambitious “Moonshot” project: CExA. Part of CExA is contributing to the Kokkos C++ GPU programming model to add new features required by European codes and ensure that it is compatible with European supercomputers so researchers can exploit these architectures for their scientific applications. One of these scientific applications is the Dyablo code developed since 2020 at CEA for simulating astrophysical fluids with adaptive mesh refinement. This code was written from the start using Kokkos and is thus already facing optimization challenges that many other codes will face in the coming years.

In this context, CEA opens a two-year engineering position to develop new methods and tools to optimize large applications based on Kokkos. This project will use Dyablo as a use case to test and validate the optimization methods, and will involve the development team of Dyablo as well as key players of the CExA Moonshot:

  • The DRF’s software and engineering department of the Institute of research into the fundamental laws of the Universe (IRFU) is the main developer of the Dyablo code.
  • Maison de la Simulation (https://www.mdls.fr) of the DRF is a joint research and engineering laboratory of CEA, CNRS, Univ. Paris-Saclay and UVSQ specialized in high-performance computing and numerical simulation.
  • The DES’s software engineering department for simulation brings together three laboratories that address the issues of the simulation environment, AI and data science, high-performance computing, and numerical analysis.
  • The DSSI of the DAM manages activities in computer science, applied mathematics, and information systems, covering a broad spectrum from definition and design to user services.

Mission

As part of both the Dyablo and CExA teams, you will develop tools and methods to optimize the performance of Kokkos applications and apply them to Dyablo.

Your mission will include:

  • Development of new methods for the optimization of large applications using Kokkos:
    • Design and develop a tool to extract kernels from a large code while capturing inputs and outputs to generate a self-consistent mini-app that can be easily profiled and optimized separately from the rest of the code.
    • Develop a tool to profile and analyze the performance of the mini-apps extracted by the previous tool. This profiler should provide insights similar to those of profiling software such as VTune.
    • Design an auto-tuning method to fine-tune any free parameter of the mini-app to achieve optimal performance on a given target architecture (a toy sketch of this idea follows this list).
  • Application of this new set of tools and methods on the Dyablo code:
    • Profile the hotspots of the code and generate a set of self-consistent mini-apps for different types of simulations and architectures.
    • Investigate the optimization potential of the code on different architectures such as NVIDIA GPUs, AMD CPUs and GPUs, or Intel CPUs.
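As a language-agnostic illustration of the auto-tuning step (in Python rather than C++/Kokkos, with a made-up tile-size parameter), an exhaustive tuner simply times the kernel for each candidate value and keeps the fastest on the target machine:

```python
# Language-agnostic sketch of exhaustive auto-tuning: time a kernel for each candidate
# value of a free parameter (here a hypothetical tile size) and keep the fastest.
# A real tool would apply this to the extracted Kokkos mini-app on each architecture.
import time
import numpy as np

def kernel(a, tile):
    # toy blocked reduction whose performance depends on the tile size
    return sum(a[i:i + tile].sum() for i in range(0, a.size, tile))

a = np.random.rand(1 << 22)
candidates = [256, 1024, 4096, 16384, 65536]

timings = {}
for tile in candidates:
    t0 = time.perf_counter()
    kernel(a, tile)
    timings[tile] = time.perf_counter() - t0

best = min(timings, key=timings.get)
print(f"best tile size: {best} ({timings[best] * 1e3:.2f} ms)")
```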

    Salary and benefits

    Job description sheet: PTC – SN – MOONK

    The CEA offers salaries based on your degrees and experience. This position offers several advantages:

    • The possibility to join collaborations with other European laboratories, the United States and Japan,
    • Numerous opportunities to travel internationally (exchanges, conferences, workshops and more),
    • Up to 3 days of telecommuting per week,
    • Reimbursement of up to 75% of public transport cards and a free transport network throughout the Ile-de-France region,
    • An interesting complementary health insurance and several company savings plans,
    • 5 weeks of paid vacation and 4 weeks of RTT per year.

    [JOB] HPC Engineer Position at CEA 👩‍💻🧑‍💻- Contribution to the development of a genomic code on GPU for the GenEx project

    Join the CEA’s ambitious GenEx project and contribute to the development of a genomic computing library adapted to GPU and exascale systems. We are recruiting an HPC engineer for a period of 1 year (renewable for a second year) to work at our CEA Saclay site near Paris.

    To apply, please send your application (resume and cover letter) to edouard.audit@cea.fr. You can use this same address for any questions concerning the offer. Applications will be evaluated from mid-September until the position is filled.

    Context

    The objective of GenEx is to develop, through a collaboration between the Joliot Institute and the Maison de la Simulation (MdlS), an innovative, versatile, and highly efficient code to interpret genomics experiments, initially focusing on DNA repair mechanisms that are the subject of many experiments at Joliot. This requires very intensive computations, making it essential to have a code capable of fully utilizing the computational power of exascale architectures.

    The collaboration between Joliot and MdS will combine top-level expertise in computer sciences and genomics. Thus, GenEx will implement a tool that will represent a significant advancement over existing tools. Traditional algorithms handling this type of problem are based on Monte-Carlo methods involving conditional events for each agent involved in the process. This approach is particularly unsuitable for GPU-type architectures. Therefore, a new algorithmic approach has been conceived. This approach handles agent interactions in batches, enabling highly efficient GPU code.

    The recruited engineer will be co-supervised by MdS and Joliot to implement this new approach. To achieve a portable, efficient code that fully exploits GPUs, the core of the code will be developed using the Kokkos library, for which there is strong expertise at MdS and within the CExA project. Subsequently, a Python interface will be established to make the code easily usable by the community of biologists/bioinformaticians/biophysicists, knowing that ultimately GenEx could interest many teams in various fields of application.

    The initial applications will focus on yeast, whose genome contains 6k genes each with 300 to 3k base pairs. To interpret experimental results, it is necessary to simulate several million experiments for each gene with a model having about ten free parameters (>100Mh with current methods). One of the long-term objectives will be to conduct a similar study on the human genome (20k genes with 300 to 3M base pairs) on the Exascale machine soon to arrive at CEA. PTC funding is essential to initiate the implementation of the code, and the significant results thus obtained will have a substantial leveraging effect for further development.

    Mission

    As part of the GenEx team, you will be responsible for developing and testing the new software in collaboration with experts in genomics at Joliot and in computer science at MdlS.

    Your mission will include:

    • Discussion and set-up of the physical models used to model DNA repair
    • Testing of various algorithms to solve these models
    • Development of a highly efficient code, based on Kokkos, to implement the above

    Skills

    You have a Master’s degree and/or an engineering degree in computer science and:

    • You have a good knowledge of C++.
    • You have skills in software engineering. You are familiar with common development environments and associated tools.
    • Knowledge of parallel programming (GPU, multi-threaded, etc.) is a plus, especially with the Kokkos library or equivalent.
    • You are autonomous and you wish to be part of an interdisciplinary work team.

    Salary and benefits

    The CEA offers salaries based on your degrees and experience. This position offers several advantages:

    • Numerous opportunities to travel internationally (exchanges, conferences, workshops and more).
    • Up to 3 days of telecommuting per week.
    • Reimbursement of up to 75% of public transport cards and a free transport network throughout the Ile-de-France region.
    • An interesting complementary health insurance and several company savings plans.
    • 9 weeks of paid vacation per year.

    [JOB] Engineer Position 👩‍💻🧑‍💻- HPC packaging Expert

    Overview

    We are looking for a candidate with a Master’s degree, Engineer’s degree or PhD in computer science, junior or senior, to join a team responsible for the packaging, deployment, and testing of HPC libraries on supercomputers.

    The position is located at Maison de la Simulation team (https://mdls.fr), in Saclay (near Paris), but our team is distributed in the following other locations:

    • Inria Datamove team (https://team.inria.fr/datamove), located near Grenoble, in the French Alps
    • Inria SED team (https://sed-bso.gitlabpages.inria.fr), located near Bordeaux, close to the Atlantic Ocean

    This work is part of the NumPEx project (http://www.numpex.fr), which is endowed with more than 40 million euros over 6 years, starting from 2023. This project aims to build a software stack for exascale supercomputers, in preparation for the arrival in Europe of the first exascale machines; the French supercomputer is expected in 2025. These machines will be among the most powerful in the world (https://top500.org), used for traditional scientific applications and artificial intelligence workloads. Our role in NumPEx is to design and implement an innovative packaging, deployment and testing strategy. Commonly used solutions show their limits when faced with the complexity of supercomputers and applications, as well as the need for reproducibility for open science. Our goal is to build a solution based on a new generation of promising packaging tools: Guix, Nix, Spack, …

    • Contact: Benoît Martin (bmartin@cea.fr) & Bruno Raffin (bruno.raffin@inria.fr)
    • Duration: 3 years
    • Start date: ASAP

    Assignment

    You will contribute to the design and implementation of the packaging and continuous integration strategy. You will participate in the deployment and testing of the infrastructure. Furthermore, you will also take part in user support and training activities around all these aspects. Our packaging strategy is centered on the open-source tools Guix (https://hpc.guix.info), Nix (https://nixos.org) and Spack (https://spack.io). In direct contact with the development teams of these tools, with the supercomputer administration teams, and with our foreign counterparts (European, Japanese, American, etc.), you will participate in:

    1. the design of the packaging strategy of the NumPEx project
    2. the effort of packaging these libraries with the proposed tools (a minimal recipe sketch follows this list)
    3. the design of a package test and validation solution taking into account the specificities of supercomputers
    4. the development of a solution allowing non-administrator users to deploy NumPEx libraries on supercomputers
    5. training around all of these aspects for researchers and engineers
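As an illustration of what the packaging effort looks like in practice, a Spack recipe is a small Python class. The sketch below is for an imaginary CMake-based library: the name, URL, versions and variants are made up, and real NumPEx recipes would of course differ:

```python
# Hypothetical Spack recipe (package.py) for an imaginary CMake-based HPC library.
# It only shows the general shape of a Spack package, not an actual NumPEx recipe.
from spack.package import *


class Examplelib(CMakePackage):
    """Imaginary exascale library used to illustrate a Spack recipe."""

    homepage = "https://example.org/examplelib"
    url = "https://example.org/examplelib/examplelib-1.0.0.tar.gz"

    version("1.0.0", sha256="0000000000000000000000000000000000000000000000000000000000000000")

    variant("cuda", default=False, description="Build with CUDA support")

    depends_on("cmake@3.22:", type="build")
    depends_on("mpi")
    depends_on("kokkos +cuda", when="+cuda")

    def cmake_args(self):
        # forward the variant to the library's own CMake option
        return [self.define_from_variant("EXAMPLELIB_ENABLE_CUDA", "cuda")]
```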

    Skills

    Master’s degree, Engineer’s degree or PhD in computer science, junior or senior (salary adjusted according to experience).

    The essential expected skills are:

    • Good practice of Unix/Linux system and system administration
    • Good programming experience (C/C++, Python)
    • Experience with software compilation and installation chains, version management tools, testing and continuous integration (CMake, Git, GitHub, GitLab, …)
    • Since the work is performed in an international context, a good command of technical English (written and oral) is expected (proficiency in French is not compulsory), as is a taste for teamwork.

    Any additional skill related to package managers (Guix, Nix, Spack, apt, rpm, pip, …), containers (Singularity/Apptainer, Docker, …) or open-source development is a plus. Initial training will be provided to fill in any missing skills. You will join an academic research environment which will give you, throughout your contract, the opportunity to further your training on cutting-edge technologies.

    Application

    To apply, please send the following elements to Benoît Martin and Bruno Raffin:

    • a curriculum vitae
    • a motivation letter
    • references from people we can contact to certify your qualities
    • a recent internship or thesis report
    • links to software contributions

    Salary and benefits

    The CEA offers salaries based on your degrees and experience. This position provides several advantages:

    • The possibility of joining collaborations with other European laboratories, the United States, and Japan
    • Numerous opportunities to travel internationally (exchanges, conferences, workshops and more)
    • 5 weeks of paid vacation and 4 weeks of RTT per year, and up to 2 days of remote work per week.
    • Reimbursement of up to 75% of public transport cards and a free transport network throughout the Ile-de-France region
    • Complementary health insurance and several company savings plans

    [JOB] PhD. 👩‍💻🧑‍💻 – Multigrid algorithms tuning using AI for the solution of linear systems

    Institution: Université Paris-Saclay, Graduate School Informatique et sciences du numérique
    Doctoral school: Sciences et Technologies de l’Information et de la Communication
    Speciality: Informatique mathématique (mathematical computer science)
    Research unit: MdlS – Maison de la Simulation

    Thesis supervisor: Nahid EMAD PETITON
    Co-supervisor: Thomas DUFAUD

    Project description

    Simulator performance has a direct impact both on the quality of simulation results, with the desired accuracy, and on the ability to explore a wide variety of scientific hypotheses. In a large number of numerical simulators, the resolution of linear systems, which are very often poorly conditioned, is the most time-consuming stage of the process (up to 80% of the simulation).
    Multigrid methods are among the most efficient and powerful solvers and preconditioners for solving large, ill-conditioned linear systems. However, the optimal parameterization of these methods, such as the smoothing algorithm, the correction schemes and the choice of restriction operators, depends on the problem to be solved and strongly influences the numerical efficiency of this family of methods. In addition, there are several multigrid algorithms, including different smoothing algorithms such as Jacobi and Gauss-Seidel, different correction schemes such as V-cycle and W-cycle, and different prolongation/restriction operators. The choice among these algorithms and the fine-tuning of their parameters improve the convergence rate of this family of methods and influence the computation time on the target CPU, GPU or TPU architecture. User expertise in parameter selection is critical for optimal use of these methods.
    With this in mind, the aim of the proposed research work is to design and use learning techniques to find better parameterizations for the algebraic multigrid method, for both linear and non-linear 2- and 3-dimensional problems. The targeted studies will be drawn from two broad categories of simulations: fluid dynamics and porous media flow. Selected applications include fluid-particle modeling, geomechanics and CO2 sequestration, each with different systems to solve.

    Context

    Numerical simulation is a tool that complements experimental studies, making it possible to understand complex physical phenomena in detail. It is also an important aid for developing and evaluating innovative, forward-looking technical solutions. It is thus at the heart of many fields such as fluid mechanics, materials design or geosciences.

    In a large number of numerical simulators, solving linear systems, which are very often ill-conditioned, is the most time-consuming step (up to 80% of the simulation). The performance of the preconditioners employed varies with the chosen parameters. User expertise in parameter selection is therefore necessary for an optimal use of these methods.

    In this project, we are interested in the linear systems arising from the discretization of a Poisson problem with variable coefficients. This problem is at the heart of many numerical methods for the simulation of Darcy or Navier-Stokes flows. Algebraic multigrid (AMG) methods are among the most efficient for solving the linear systems associated with the discretization of this equation on unstructured meshes. These methods use smoothing operators to reduce the high-frequency components of the error, restriction operators to project the problem onto coarser grids, and prolongation operators to return to the initial fine grid sizes. Each of these steps, as well as their sequencing into cycles, has many parameters to select (type of smoother and number of iterations, choice of the restriction and prolongation operators to account for the variability of the equation’s coefficients, or the arrangement of the multigrid cycles) (Briggs et al.). The optimal parameterization of these methods depends strongly on the physical problem being modeled and represents a significant obstacle to their adoption.
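To make the tunable ingredients concrete (smoother type and sweep count, transfer operators, cycle shape), here is a minimal two-grid cycle for the 1D constant-coefficient Poisson problem in NumPy; it is only a toy illustration of the parameters discussed above, not the AMG setting targeted by the thesis:

```python
# Toy two-grid cycle for the 1D Poisson problem -u'' = f with homogeneous Dirichlet
# boundary conditions.  It exposes the parameters discussed above: smoother type and
# sweep count, restriction/prolongation operators, and the cycle structure.
import numpy as np

def jacobi(u, f, h, sweeps, omega=2/3):
    """Weighted-Jacobi smoother; 'sweeps' and 'omega' are tunable parameters."""
    for _ in range(sweeps):
        u[1:-1] = (1 - omega) * u[1:-1] + omega * 0.5 * (u[:-2] + u[2:] + h * h * f[1:-1])
    return u

def residual(u, f, h):
    r = np.zeros_like(u)
    r[1:-1] = f[1:-1] - (2 * u[1:-1] - u[:-2] - u[2:]) / (h * h)
    return r

def restrict(r):
    """Full-weighting restriction (fine grid -> coarse grid)."""
    rc = r[::2].copy()
    rc[1:-1] = 0.25 * (r[1:-2:2] + 2 * r[2:-1:2] + r[3::2])
    return rc

def prolong(ec, n):
    """Linear-interpolation prolongation (coarse grid -> fine grid)."""
    e = np.zeros(n)
    e[::2] = ec
    e[1::2] = 0.5 * (ec[:-1] + ec[1:])
    return e

def two_grid(u, f, h, pre=2, post=2):
    """One cycle: pre-smooth, coarse-grid correction (exact coarse solve), post-smooth."""
    u = jacobi(u, f, h, pre)
    rc = restrict(residual(u, f, h))
    m = rc.size - 2                                         # interior coarse unknowns
    Ac = (2 * np.eye(m) - np.eye(m, k=1) - np.eye(m, k=-1)) / (2 * h) ** 2
    ec = np.zeros_like(rc)
    ec[1:-1] = np.linalg.solve(Ac, rc[1:-1])
    u += prolong(ec, u.size)
    return jacobi(u, f, h, post)

n = 129
h = 1.0 / (n - 1)
x = np.linspace(0.0, 1.0, n)
f = np.sin(np.pi * x)
u = np.zeros(n)
for k in range(10):
    u = two_grid(u, f, h)
    print(k, np.linalg.norm(residual(u, f, h)))   # residual norm should drop each cycle
```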

    It is therefore necessary to develop learning methods that accelerate the convergence of multigrid methods, either by finding an optimal parameterization of existing methods or by directly imitating them. Much recent work has followed these two paths:
    – Projection operators: (Katrutsa et al.) and (Greenfeld et al.) propose unsupervised learning methods to optimize the parameterization of the prolongation and restriction operators of two-level geometric multigrid (GMG) methods, with a cost function that minimizes the spectral radius of the iteration matrix. (Luz et al.) extend this work to AMG methods, using Graph Neural Networks (GNN) to handle unstructured meshes.

    – Smoother: (Hsieh et al.) propose a supervised learning method based on Convolutional Neural Networks (CNN) and the U-Net architecture to learn a correction to the Jacobi algorithm while preserving its convergence guarantees. (Huang et al.) extend this work to the optimization of the Jacobi smoother in GMG methods.
    This work notably builds on the parallel drawn by (He et al.) between multigrid methods and convolutional networks: moving from a coarse grid to a fine grid (and vice versa) can be seen as a filtering step of a CNN.

    The reported results show that the convergence of existing algorithms is accelerated by a factor of 2 to 10 while excellent generalization properties are retained. However, this work is limited to specific steps of multigrid methods and mainly targets GMG methods, restricting its use to structured meshes on simplified problems.
    The main objective of this thesis is to develop learning methods for solving the preconditioned linear systems arising from the discretization on unstructured meshes of the Poisson problem with variable coefficients. These methods must accelerate the convergence of multigrid algorithms on heterogeneous hardware architectures.

    References:

    (Briggs et al.) William Briggs, Van Henson and Steve McCormick. A Multigrid Tutorial, 2nd Edition, SIAM (2000). ISBN: 9780898714623.
    (Huang et al.) Ru Huang, Ruipeng Li and Yuanzhe Xi. Learning optimal multigrid smoothers via neural networks (2021). arXiv:2102.1207v1.
    (Katrutsa et al.) Alexandr Katrutsa, Talgat Daulbaev and Ivan Oseledets. Deep Multigrid: learning prolongation and restriction matrices (2017). arXiv:1711.03825v1.

    (He et al.) Juncai He and Jinchao Xu. MgNet: A unified framework of multigrid and convolutional neural network. Sci. China Math. (2019). https://doi.org/10.1007/s11425-019-9547-2
    (Hsieh et al.) Jun-Ting Hsieh, Shengjia Zhao, Stephan Eismann, Lucia Mirabella and Stefano Ermon. Learning Neural PDE Solvers with Convergence Guarantees (2019). arXiv:1906.01200v1

    (Greenfeld et al.) Daniel Greenfeld, Meirav Galun, Ron Kimmel, Irad Yavneh and Ronen Basri. Learning to Optimize Multigrid PDE Solvers. ICML (2019). arXiv:1902.10248
    (Luz et al.) Ilay Luz, Meirav Galun, Haggai Maron, Ronen Basri and Irad Yavneh. Learning Algebraic Multigrid Using Graph Neural Networks (2020). arXiv:2003.05744

    Objectives

    The main objective of this thesis is to develop learning methods for solving the preconditioned linear systems arising from the discretization on unstructured meshes of the Poisson problem with variable coefficients. These methods must accelerate the convergence of multigrid algorithms on heterogeneous hardware architectures.

    Method

    To meet this objective, we propose a work plan in four steps:
    – State of the art of AI-enhanced multigrid methods: the first step is to make an inventory of the acceleration methods for each stage of multigrid algorithms. Their impact will be assessed by analyzing the convergence rates and the spectral radius of the resulting error-transfer operators. The work will first address structured meshes (CNN networks) and then unstructured meshes (GNN networks).
    – Design of a component-wise hybrid multigrid linear solver: building on the previous work, the next step is to propose a choice of learning method to accelerate each stage of the multigrid solver (smoother, projection operators, grid levels and cycle type). These methods will be trained independently and then assembled when evaluating the model.
    – Design of a global multigrid linear solver: the optimization of each parameter is not completely independent of the other components; once the block-wise approach is established, we will seek to chain the stages of the hybrid multigrid solver directly in a global training.
    – Development of a general hybrid linear solver framework: preconditioning methods built from deflation, domain decomposition or multigrid methods have known equivalences (Tang et al., DOI:10.1007/s10915-009-9272-6). The objective will be to adapt the methodologies developed previously to accelerate these various types of methods.
    The efficiency of the developed methods will be evaluated on industrial applications such as fluid-particle interaction modeling, geomechanics or CO2 sequestration, each with linear systems to solve that have different characteristics.
    We will evaluate the generalization of these methods over a set of varying simulation parameters: underlying physical problem, domain geometry, boundary conditions, and initial grid size.

    Expected results

    The first target of the thesis is the design and validation of the proposed solution. These software components will initially be integrated into a library written in dedicated frameworks (TensorFlow, PyTorch, JAX), which directly provide optimized performance on CPU, GPU and even TPU architectures.
    This solution may also be interfaced with libraries such as Trilinos and AMGX, giving us high-performance reference implementations on both CPU and GPU. Its efficiency and portability will then be evaluated in simulations of fluid dynamics and porous media flow.


    [JOB] C++ expert engineer 👩‍💻🧑‍💻- Contribution to the development of the Kokkos GPU computing library within the CExA “Moonshot” project

    Join the CEA’s ambitious “Moonshot” project, CExA, and contribute to the development of the Kokkos GPU computing library. We are recruiting six talented and enthusiastic C++ development engineers for a period of 2 years to work at our CEA Saclay site near Paris.

    To apply, please send your application (resume and cover letter) to contact@cexa-project.org. You can use this same address for any questions concerning the offer. Applications will be evaluated from mid-May until the position is filled.

    Context

    Europe is investing to build exaflop supercomputers in the coming years, including one in France, at the CEA, in 2025. These machines will be heterogeneous, accelerated by GPUs of various brands and architectures. Ensuring performance and portability under these conditions is certainly one of the greatest challenges of the Exascale. To address it, CEA is investing heavily in an ambitious “Moonshot” project: CExA. In this project, we will set up libraries to fully exploit this computing power in the scientific applications of the CEA by contributing to, extending and adapting the open-source library Kokkos. Within CExA, we represent teams with expertise in numerical computation from the four divisions of the CEA.

    • Maison de la Simulation of the DRF is a joint research and engineering laboratory of CEA, CNRS, Univ. Paris-Saclay and UVSQ specialized in high performance computing and numerical simulation.
    • The DES’s software engineering department for simulation brings together three laboratories that address the issues of simulation environment, AI and data science, high performance computing and numerical analysis.
    • The DSCIN at DRT/LIST is responsible for the research and development of digital integrated circuits and processors for AI, as well as the design of complex digital architectures. It also works on solutions for embedded systems and develops design tools for embedded AI, embedded systems and trusted circuits.
    • The DSSI of the DAM manages activities in the fields of computer science, applied mathematics and information systems, covering a wide spectrum from definition and design to user services.

    Mission

    As part of a new agile team being set up to carry out the CExA project, you will work in collaboration with the European HPC ecosystem and the teams in charge of the development of Kokkos in the United States (Sandia and Oak Ridge National Laboratories). You will enrich the library to fit the needs of the CEA applications and the technologies developed by Europe for the Exascale (EPI, SiPearl, RISC-V). You will work with cutting-edge and experimental hardware technologies from major vendors (NVIDIA, AMD, Intel, ARM) that will equip forthcoming supercomputers.

    Your mission will include:

    • Agile development in C++ of the CExA middleware to address the following areas of improvement:
      • Adaptation to “distributed memory” architectures
      • Support for heterogeneous architectures for European exaflop supercomputers
      • Interfacing with external libraries and data processing tools
      • Simplification of deployment
    • Porting via Kokkos and integration of new functionalities in selected application demonstrators (hydrodynamics, fusion energy, AI-assisted medicine)
    • Providing support and leadership on parallel programming models within the laboratory and across European and global collaborations.

    Skills

    You have a Master’s degree and/or an engineering degree in computer science and:

    • You have a solid knowledge of advanced C++ and the latest standards.
    • You know how to fit into an agile development process (SCRUM) and you master the basic tools associated with collaborative development (git, github, etc.).
    • You have skills in software engineering. You are familiar with common development environments and associated tools (cmake, docker, spack, gtest, ctest, etc.).
    • Knowledge of parallel programming (GPU, multi-threaded, etc.) is a plus, especially with the Kokkos library or equivalent.
    • You are autonomous and you wish to be part of an international work team. You master technical English (written and oral). You are interested in the world of high-performance computing and its challenges and follow the evolution of technologies.

    Salary and benefits

    The CEA offers salaries based on your degrees and experience.

    This position offers several advantages:

    • the possibility to join collaborations with other European laboratories, the United States and Japan,
    • Numerous opportunities to travel internationally (exchanges, conferences, workshops and more)
    • Up to 3 days of telecommuting per week
    • Reimbursement of up to 75% of public transport cards and a free transport network throughout the Ile-de-France region,
    • An interesting complementary health insurance and several company savings plans,
    • 5 weeks of paid vacation and 4 weeks of RTT per year.

    [JOB] Postdoc at CEA Cadarache 👩‍💻🧑‍💻- Autotuning for ultra-high performance computing with partitioned coupling

    Contact

    FAUCHER Vincent CEA DES/DTN/DIR (See pdf)

    Background

    Taking into account multiple and coupled physics is at the heart of many application needs in fields as varied as, but not limited to, aeronautics, defense and biology. This is also a strong area of expertise for CEA’s Energy Division, with multiple domains including fluid-structure interaction, neutronics coupled with thermal-hydraulics and/or thermal-mechanics, or severe accident modeling. The emergence of exascale architectures opens the way to promising new levels of high-fidelity simulations, but it also significantly increases the complexity of many software applications in terms of total or partial rewriting. It therefore specifically encourages coupling to limit development work: the idea is to provide each physics of interest through a necessarily reduced number of highly optimized software components, rather than making specific, possibly redundant developments in standalone applications.
    Once the coupled multiphysics problem has been written with the expected levels of accuracy and stability, the proposed work concentrates on the resolution algorithms, so that the coupling between applications, themselves assumed to be exascale-compatible, can be solved efficiently at exascale. It is also worth noting that, in general, the couplings under consideration can be highly complex, involving numerous physics with different levels of feedback between them and various communication patterns, from border exchanges to overlapping domains. The current post-doctoral position, carried out in the framework of the ExaMA collaborative project, is dedicated in particular to the identification and dynamic tuning of the relevant numerical parameters arising from the coupling algorithms that impact the computational efficiency of the global simulation. The problems considered are, in the general case, time-evolving problems with a significant number of time iterations, which allows the first iterations to be used to gather data and perform the tuning.
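As a very simplified illustration of this idea (outside any specific coupling code; the relaxation parameter and the fake coupled step below are placeholders), the first time iterations can be used to measure each candidate value of a numerical parameter, after which the best one is kept for the rest of the run:

```python
# Simplified sketch of dynamic tuning over time iterations: candidate values of a
# numerical coupling parameter are timed during the first iterations, then the
# fastest one is kept for the remaining iterations.  coupled_step() is a stand-in.
import time

def coupled_step(relaxation):
    # placeholder for one coupled time step whose cost depends on the parameter
    work = int(200_000 / relaxation)
    s = 0.0
    for i in range(work):
        s += i * 1e-9
    return s

candidates = [0.3, 0.5, 0.7, 0.9]
timings = {c: [] for c in candidates}
best = None

for step in range(40):
    if best is None and step < 2 * len(candidates):
        c = candidates[step % len(candidates)]     # exploration during the first iterations
    else:
        if best is None:                           # decide once, then stick with the winner
            best = min(timings, key=lambda k: sum(timings[k]) / len(timings[k]))
            print("tuned parameter:", best)
        c = best
    t0 = time.perf_counter()
    coupled_step(c)
    timings[c].append(time.perf_counter() - t0)
```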


    [JOB] PostDoc at LIP6 👩‍💻🧑‍💻- Precision auto-tuning and numerical validation of high performance simulations

    Background

    This PostDoc will be carried out in the framework of the PEPR (Programme et Equipement Prioritaire de Recherche) NumPEx project devoted to High Performance Numerics for the Exascale and financed by the France2030 investment program.

    Research directions

    During this PostDoc several directions will be explored to improve algorithms for precision auto-tuning and numerical validation.

    We plan to design a novel autotuning algorithm that will automatically provide arbitrary precision codes, from a required accuracy on the computed results. Because of the number of possible type configurations, particular attention will be paid to the algorithm performance. The type configuration produced will then enable one to improve storage cost, and also execution time taking into account the numerical formats available on the target architectures.
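As a toy illustration of the general idea (this is not the algorithm to be designed; the kernel and tolerance are placeholders), a tuner can try floating-point formats from cheapest to most expensive and keep the first one whose result stays within the required accuracy of a double-precision reference:

```python
# Toy precision-tuning loop: try increasingly expensive floating-point formats and keep
# the lowest-precision one whose result stays within the user-required accuracy of a
# float64 reference.  Real tools work per-variable on large codes; this is per-kernel.
import numpy as np

def kernel(x):
    # placeholder compute kernel
    return (x * x).sum()

rng = np.random.default_rng(0)
x64 = rng.random(1_000_000, dtype=np.float64)
reference = kernel(x64)
required_accuracy = 1e-6          # relative error requested by the user

chosen = np.float64
for dtype in (np.float16, np.float32, np.float64):   # cheapest format first
    result = kernel(x64.astype(dtype))
    rel_err = abs(float(result) - reference) / abs(reference)
    if rel_err <= required_accuracy:
        chosen = dtype
        break

print("selected format:", np.dtype(chosen).name, "relative error:", rel_err)
```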

    We plan to combine mixed precision algorithms and precision autotuning tools. Such automatic tools may be useful in the design of mixed precision linear algebra algorithms. Conversely the performance of precision autotuning tools may be improved thanks to mixed precision algorithms. Linear algebra kernels could be automatically identified in simulation codes, and replaced by their mixed precision version, in order to reduce the exploration space for precision tuning.

    The precision auto-tuning algorithms designed during this PostDoc will be validated on large scale programs developed by partners of the NumPEx project. Furthermore new methodologies will be proposed to perform autotuning of both numerical formats and performance parameters in collaboration with experts in coupled physics simulations.

    Location

    Sorbonne Université and its Computer Science lab LIP6 are settled on the Pierre & Marie Curie Campus in the Latin Quarter of Paris, France.

    Salary

    The gross salary per month (including national health insurance and employment insurance) varies from 2682 to 3701 euros depending on experience.

    Duration

    1 year, renewable 1 year

    Qualifications and skills

    Candidates must have a PhD in Computer Science, Applied Mathematics or other relevant fields, with good programming skills. Developments will be carried out in C++ and Python, so programming expertise in at least one of these languages is required. Good knowledge in numerical algorithms and floating-point computation is also required.


    [JOB] HPC DevOps Engineer at CEA 👩‍💻🧑‍💻- Deployment and CI on supercomputers for the C++ Kokkos library within the “Moonshot” CExA project

    CEA is recruiting DevOps engineers for a 2-year period to join the CExA “Moonshot” project team, which is setting up CEA’s GPU computing software stack around the Kokkos C++ library, to contribute to innovative packaging, deployment and continuous integration approaches for supercomputers, based in particular on Spack. A team of more than 10 people is currently being set up. The positions will be based at the CEA Saclay site near Paris.

    To apply, please send your application (CV and covering letter) to contact@cexa-project.org. If you have any questions about the position, please use the same address. Applications will be assessed from mid-November until the position is filled.

    Context

    Europe is preparing for the arrival of the first exascale supercomputers, including one in France, at the CEA, from 2025. These machines will be heterogeneous, accelerated by GPUs of various vendors and architectures. Ensuring performance and portability under these conditions is undoubtedly one of the greatest challenges of Exascale. To address this, the CEA is investing heavily in an ambitious ‘Moonshot’ project: CExA. In this project, we will be providing the libraries needed to fully exploit this computing power in CEA’s scientific applications by contributing to, extending and adapting the Kokkos open-source library. The software stack created in this way will be deployed on the supercomputers using the Spack tool, which has been specially designed for supercomputing environments. Within CExA, we represent teams with expertise in numerical computation from the CEA’s four divisions.

    • Maison de la Simulation (https://www.mdls.fr), of the Fundamental Research Division, is a joint research and engineering laboratory of the CEA, CNRS, Paris-Saclay University and Versailles Saint-Quentin University specializing in high-performance computing and numerical simulation.
    • The software engineering department for simulation of the Energy Research Division groups together three laboratories that address the issues of simulation environments, AI and data science, intensive computing and numerical analysis.
    • LIST’s DSCIN at Technological Research Department is responsible for the research and development of digital integrated circuits and processors for AI, as well as the design of complex digital architectures. It also works on solutions for embedded systems and develops design tools for embedded AI, embedded systems and trusted circuits.
    • The DSSI at Military Application Department manages and carries out activities in the fields of computing, applied mathematics and information systems, covering a broad spectrum from definition and design to user services.

    Mission

    As part of a new agile team being set up to carry out the CExA project, you will be working in collaboration with the French (in particular NumPEx) and European HPC ecosystem and with the teams in charge of developing Kokkos and Spack in the United States to adapt the tools to the needs of the applications developed by the CEA and to the technologies developed by Europe for Exascale (EPI, SiPearl, RISC-V).

    Your mission will include:

    • Supporting agile development in C++ around Kokkos by contributing to the following points:
      • Implementing a testing and performance measurement strategy.
      • Designing, automating and administering continuous integration pipelines.
      • Working with development teams to optimize packaging and deployment processes.
      • Assisting with deployment on heterogeneous architectures for European exaflop supercomputers.
    • Identifying and participating in the development of missing functionalities within the tools used for packaging, deployment and continuous integration.
    • Helping to deploy Kokkos in the software environments of selected application demonstrators (hydrodynamics, fusion energy, etc.).
    • Providing support and leadership on these themes within the organization and on the scale of European and global collaborations.

    Skills

    You have a Master’s degree and/or an engineering degree in computer science and:

    • You will be able to work within an agile development process (SCRUM) and be familiar with the basic tools associated with collaborative development (Git, GitHub, etc.).
    • You have software engineering skills. You are familiar with common development environments and associated tools (CMake, Docker, Spack, GoogleTest, CTest, etc.).
    • Scripting skills (Python, Shell, etc.)
    • Any knowledge of parallel programming (GPU, multi-threading, etc.) is a plus, particularly with the Kokkos library or equivalent.
    • You have knowledge of the C++ ecosystem.
    • You are a self-starter and are keen to join an international team. You have a good command of technical English (written and spoken). You are interested in the world of high-performance computing and its challenges and keep updated with the latest technological evolution.

    Salary and benefits

    The CEA offers competitive salaries depending on your qualifications and experience.

    There are several advantages to this position:

    • The possibility of joining existing collaborations with other laboratories in Europe, the United States and Japan,
    • Numerous opportunities for international travel (exchanges, conferences, workshops and more),
    • Up to 3 days’ teleworking per week,
    • 75% reimbursement on public transport and a free transport network throughout the Ile-de-France region,
    • An attractive supplementary pension scheme and several company savings plans,
    • 5 weeks’ paid holiday and 4 weeks’ RTT per year.