Research

Research Lines

Dynamic Management of Resources in HPC Environments: Dynamic Resource Management allows the resources assigned to a job to change during its execution. This discipline has gained considerable interest in recent years, as it could provide many benefits to HPC system providers and their users, such as improved energy efficiency and throughput. This research line is pursued in collaboration with Technical University of Munich (TUM), Forschungszentrum Jülich (FZJ), Université Grenoble Alpes (UGA), Universidad de Zaragoza (Unizar), Universitat Jaume I (UJI), and Université Toulouse III - Paul Sabatier (TLSE3).

In-Network Computation and Communication Offloading: This research line extends OpenMP, the de facto standard parallel programming model, with DPU offloading capabilities (computation and communication) that are fully integrated in the paradigm and use the original OpenMP syntax. It has garnered interest from DPU early adopters such as Georgia Tech, Texas Tech, and Sandia National Laboratories.

Current Projects

Compute in Network: Offloading Computation to the Network (Nvidia), PI: A. J. Peña, BSC. 2021/10/01-2026/09/30. Work Package 2 Leader: Development of ODOS, a framework for enabling communication and computation offloading to DPUs.

Barcelona Zettascale Lab (C15.I05.P01.SI03) (Spanish Ministry of Science), PI: M. Valero, BSC. 2022/12/01-2025/12/31. Work Package 2 Researcher: Sergio participates in the design of the resource management software stack of a RISC-V-based cluster composed of chips designed by BSC.

Former Projects

EUPilot (EuroHPC-JU), PI: C. Puchol, BSC. 2021/12/01-2025/05/31. Work Package 7 Researcher: Sergio deployed and evaluated DMR in a RISC-V-Vext-based cluster composed of European technologies. doi.org/10.3030/101034126

APPWIND (Spanish Research Agency), PI: S. Chiva, UJI. 2021/11/01-2024/12/31. Work Package 2 Leader: Sergio designed a workflow that combined CFD simulations and AI predictions. Sergio was also responsible for constructing and curating the dataset to train the predictive model.

Collaboration Agreement (Avanqua-Oceanogràfic S.L.), PI: R. Martínez-Cuenca, UJI. 2021/12/17-2024/12/17. Researcher: AI-assisted CFD simulations of the facilities to improve air quality within fish tanks in an aquarium.

Collaboration Agreement (FACSA), PI: S. Chiva, UJI. 2021/02/09-2024/11/22. Researcher: FACSA-UJI facilitates the adoption of artificial intelligence techniques in the management of the integral water cycle.

DEEP-SEA (EuroHPC-JU), PI: P. Radojkovic, BSC. 2021/04/01-2024/03/31. Work Package 3 Researcher: Sergio participated in the decision-making on the mechanisms for handling dynamic resources. doi.org/10.3030/955606

APOSTD/2020 (European Social Funds and Valencian Region Government), PI: S. Iserte, UJI. 2020/09/01-2022/12/10. Principal Investigator: Sergio initiated the AI-CFD research line in the group. He pioneered a new set of data-driven techniques to accelerate long transient simulations.

FP7 318793 - EXA2GREEN: Energy-Aware Sustainable Computing on Future Technology - Paving the Road to Exascale Computing (European Commission), PI: E. S. Quintana-Ortí, UJI. 2015/10/01-2018/09/30. Researcher: Sergio designed, developed, and evaluated the MPI malleability framework DMR for HPC clusters.

APOTIP/2016/A/033 (Spanish Ministry of Science), PI: E. S. Quintana-Ortí, UJI. 2016/09/01-2017/10/31. Researcher: Sergio implemented a solution that automated the setup and deployment of Xeon Phi coprocessors in heterogeneous clusters.

H2020-FETHPC-2014 671602 - INTERTWinE. Programming Model Interoperability Towards Exascale (European Commission), PI: E. S. Quintana-Ortí, UJI. 2012/11/01-2015/10/31. Researcher: Sergio designed and developed an extension for Slurm to support remote GPU virtualization with the rCUDA technology.

Collaboration Agreement (Mellanox Inc.), PI: E. S. Quintana-Ortí, UJI. 2012/01/01-2013/12/31. Researcher: rCUDA technology development.

INNPACTO IPT-2011-1232-43 (Spanish Ministry of Science), PI: C. Cebrián, Tissat. 2011/07/01-2013/12/31. Researcher: Sergio designed and developed a migration mechanism for virtual machines in OpenStack cloud computing infrastructure.

REALCLOUD - Real Data Center Cloud Services and Environment (), PI: R. Mayo, UJI. 2012/11/01-2012/12/31. Researcher: Development of a middleware able to consolidate the system, making decisions based on IT data gathered in real time.

MONICA - Monitoring and control system with intelligent energy efficiency management for ICT resources in ultradense data centers oriented HPC and Cloud Computing (), PI: R. Mayo, UJI. 2012/04/01-2012/10/31. Researcher: Theoretical study of the power consumption in the cluster of the FCSCL (SuperComputing Foundation of Castilla y León).

PhD Thesis - High-Throughput Computation through Efficient Resource Management: Scientific applications run on supercomputers where thousands of nodes are shared among users. When those applications start, their resources remain allocated until the job ends. We have identified two potential approaches to resource management that increase global throughput and improve utilization of the underlying resources. The Dynamic Management of Resources (DMR) framework is conceived to facilitate the programmability of malleable applications by automating resource reallocation, process handling, and data distribution. DMR is based on the Message Passing Interface (MPI) programming model, the de facto standard for developing distributed HPC applications. DMR adjusts the number of processes of each job depending on the cluster status in terms of resource availability and number of pending jobs. Performance analyses report that, combined with moldability, DMR reduces makespan by 4x compared to traditional rigid workloads. DMR has also been used in GPU-capable workloads, improving their energy efficiency by up to 2.5x. The relevance of the DMR malleability solution is such that it has been incorporated into the European projects “The European Pilot” EuroHPC-JU, DEEP-SEA, and TimeX.
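As a rough illustration of the kind of adjustment described above, the following minimal Python sketch picks a new process count for a malleable job from the number of idle nodes and pending jobs. The function name, its parameters, the one-process-per-node assumption, and the halving/doubling rule are all hypothetical; they are not taken from the DMR implementation.

```python
def decide_processes(current, min_procs, max_procs, idle_nodes, pending_jobs):
    """Return a new process count for a malleable job.

    Hypothetical policy for illustration only (not DMR's actual logic).
    Assumes one process per node.
    """
    if pending_jobs > 0 and current > min_procs:
        # Shrink: release nodes so that queued jobs can start sooner.
        return max(min_procs, current // 2)
    if pending_jobs == 0 and idle_nodes > 0 and current < max_procs:
        # Expand: take idle nodes, growing at most 2x per reconfiguration.
        return min(max_procs, current + idle_nodes, current * 2)
    # Otherwise keep the current size.
    return current
```

With this rule, a job running 8 processes would grow when nodes sit idle and the queue is empty, and would shrink when other jobs are waiting, which mirrors the two signals mentioned in the text: resource availability and pending jobs.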

MSc Thesis - A Remote GPU Manager for HPC Clusters: rCUDA is a virtualization solution that allows GPUs to be shared among the nodes in a cluster. Slurm is a workload manager able to schedule jobs and manage resources. In this project I was in charge of integrating both technologies, since rCUDA does not manage workloads and Slurm does not know how to share resources such as GPUs. Today, the rCUDA project offers this integration as a patch to Slurm.

Degree Thesis - Adjusting the Energy Consumption in Computer Facilities: This project consisted of developing a simulator to assess energy-saving strategies and policies in HPC workloads. The real Energy Saving Cluster (ESC) system, based on Sun Grid Engine (SGE), was modeled in order to simulate its behavior, taking into account the different features of the cluster components as well as the scheduling and energy-saving policies, and generating statistics and charts with the results. The simulator was written in Python and was managed through a web user interface.
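To give a sense of what assessing an energy-saving policy in simulation can look like, here is a minimal Python sketch that compares an always-on cluster with one that powers off idle nodes. It is not the thesis simulator: the function name, the hourly trace format, and the power figures (in watts) are assumptions made for illustration, and boot and shutdown costs are ignored.

```python
def energy_kwh(busy_per_hour, total_nodes,
               p_busy=300.0, p_idle=150.0, p_off=10.0,
               power_off_idle=False):
    """Estimate energy (kWh) for a trace of busy-node counts, one per hour.

    Hypothetical model: power values are illustrative assumptions, and
    the cost of booting or shutting down nodes is not modeled.
    """
    total_wh = 0.0
    for busy in busy_per_hour:
        idle = total_nodes - busy
        # Idle nodes draw standby power if the policy switches them off.
        idle_power = p_off if power_off_idle else p_idle
        total_wh += busy * p_busy + idle * idle_power
    return total_wh / 1000.0


# Usage: a 4-node cluster observed for three hours.
trace = [4, 2, 0]
always_on = energy_kwh(trace, total_nodes=4)
with_policy = energy_kwh(trace, total_nodes=4, power_off_idle=True)
```

A fuller simulator would also model the scheduler and the delay and energy needed to bring nodes back online, which is where the policy trade-offs become interesting.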