Python for High-Performance Computing

Walkthrough for module authors

Description

This training introduces participants to Python for high-performance computing, covering parallel programming, performance optimization, and HPC resource utilization. Designed for researchers and developers, the course includes hands-on sessions to enhance practical skills.

Prerequisites

  • Basic experience with Python

  • Basic experience working in a Linux-like terminal

  • Some prior experience in working with large or small datasets

Course Topics

  • Python’s role in HPC and performance optimization

  • Parallel programming techniques for efficient computing

  • How to utilize HPC resources effectively

  • Hands-on experience with lab exercises for practical skills
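As a taste of the parallel programming topic above, the following is a minimal sketch of one common pattern: split the data into chunks, process each chunk in a separate worker, and combine the partial results. It uses only the standard-library `concurrent.futures` module. The function names (`sum_of_squares`, `split_into_chunks`, `parallel_sum_of_squares`) and the toy workload are illustrative assumptions, not part of the course material.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def sum_of_squares(chunk):
    """CPU-bound toy workload: sum of squares of one chunk."""
    return sum(x * x for x in chunk)


def split_into_chunks(data, n_chunks):
    """Split a list into at most n_chunks contiguous pieces."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division, at least 1
    return [data[i:i + size] for i in range(0, len(data), size)]


def parallel_sum_of_squares(data, n_workers=4, executor_cls=ProcessPoolExecutor):
    """Map the workload over chunks with a pool of workers, then reduce.

    executor_cls is a hypothetical convenience parameter: processes suit
    CPU-bound work, while ThreadPoolExecutor suits I/O-bound work.
    """
    chunks = split_into_chunks(data, n_workers)
    with executor_cls(max_workers=n_workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))


if __name__ == "__main__":
    numbers = list(range(1_000_000))
    print(parallel_sum_of_squares(numbers))
```

Processes are the default here because pure-Python CPU-bound code generally does not speed up with threads in standard CPython. Real HPC workloads on a cluster would typically rely on dedicated tools rather than this single-node sketch.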

Target Audience

Level: 70% beginner, 30% intermediate

Prerequisites: Basic experience with programming in Python

Language: English

Technical Requirements

  • Python and its dependencies

  • Jupyter Notebook for interactive coding

  • Anaconda (optional) for managing dependencies

Instructors

Dominika Regeciova is a Lecturer at IT4Innovations, focusing on the intersection of AI and security. Previously, she worked as a Senior Researcher at Avast. She holds a master’s degree in Information Security from the Faculty of Information Technology at Brno University of Technology (FIT BUT). Her work is driven by a passion for bringing formal theory and advanced AI methods into practical, real-world applications with an emphasis on security and reliability.

Tomas Martinovic is a senior researcher at the Advanced Data Analysis and Simulation Laboratory within the IT4Innovations National Supercomputing Center. His work primarily focuses on data science, data visualization, and mathematical modeling using statistical methods and deep neural networks.

Ghaith Chaabane is a researcher at the Advanced Data Analysis and Simulation Laboratory within the IT4Innovations National Supercomputing Center.

About the Infrastructure

Participants have access to the Karolina supercomputer for hands-on sessions, utilizing both CPU and GPU resources. Karolina, operational since 2021, is the most powerful supercomputer in the Czech Republic and ranks among Europe’s top systems. It features a universal part with 720 nodes, delivering a theoretical peak of 3.8 PFlop/s for traditional HPC simulations, and an accelerated part comprising 72 servers, each equipped with 8 GPU accelerators, delivering 11.6 PFlop/s for standard HPC simulations and up to 360 PFlop/s for AI computations.

This infrastructure supports complex scientific and industrial challenges, including numerical simulations, data analysis, and artificial intelligence applications.

Learning Outcomes

This material is for all researchers and engineers who work with large or small datasets and who want to learn powerful tools and best practices for writing more performant, parallelised, robust and reproducible data analysis pipelines.

By the end of this module, learners should:

  • Have a good overview of available tools and libraries for improving performance in Python (link to leaves in skill tree)

  • Know libraries for efficiently storing, reading and writing large data (link to leaves in skill tree)

  • Be comfortable working with NumPy arrays and Pandas dataframes for data analysis using Python (link to leaves in skill tree)
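To illustrate the kind of NumPy usage the outcomes above refer to, here is a small sketch that centers a dataset around zero in two ways: with an explicit Python loop and with a vectorized NumPy expression. The function names and the example data are assumptions made for illustration. The example assumes NumPy is installed.

```python
import numpy as np


def center_loop(values):
    """Subtract the mean from each value using plain Python."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def center_vectorized(values):
    """Same operation as a single array expression.

    The subtraction is applied element-wise inside NumPy's compiled
    code, which is usually much faster than a Python loop on large inputs.
    """
    arr = np.asarray(values, dtype=np.float64)
    return arr - arr.mean()


if __name__ == "__main__":
    data = np.arange(1_000_000, dtype=np.float64)
    centered = center_vectorized(data)
    # Both approaches should agree on a small sample.
    sample = [1.0, 2.0, 3.0]
    print(center_loop(sample), center_vectorized(sample))
```

Timing the two functions on the same input, for example with the standard-library `timeit` module, is a quick way to see how much vectorization helps for a given problem size.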

Credit

Don’t forget to check out additional course materials from IT4Innovations National Supercomputing Center. Please contact us if you want to reuse these course materials in your teaching. You can also join the XXX channel to share your experience and get more help from the community.

License

Note

To module authors: For code, you may use any OSI-approved license listed at https://spdx.org/licenses/, such as Apache License 2.0, GNU GPLv3, or MIT. Please make sure to update the deed above and the LICENSE.code file accordingly.