Parallel Programming in Python
10 Oct 2022
11 Oct 2022
This workshop will be delivered in person, unless new COVID-19 restrictions are put in place. The workshop will take place at Science Park 402, 1098 XH Amsterdam. Lunch and drinks at the end of the workshop are included.
Python is one of the most widely used languages for scientific data analysis, visualization, and even modelling and simulation. Its popularity rests mainly on two pillars: a friendly syntax and the availability of many high-quality libraries. The flexibility that Python offers comes with a few downsides though: code typically doesn’t run as fast as lower-level implementations in C/C++ or Fortran, and it is not trivial to parallelize Python code to work efficiently on many-core architectures. This workshop addresses both issues, with an emphasis on running Python code efficiently (in parallel) on multiple cores.
We’ll start by learning to recognize problems that are suitable for parallel processing, looking at dependency diagrams and kitchen recipes. From then on, the workshop is highly interactive, diving straight into the first parallel programs. This workshop teaches the principles of parallel programming in Python using Dask, Numba and Snakemake. More importantly, we try to give insight into how these different methods perform and when they should be used.
The workshop is based on the teaching style of the Carpentries, and learners will follow along while the instructors write the code on screen. More information can be found on the workshop website.
Who: The workshop is free and open to all researchers in the Netherlands at PhD candidate level and higher. It is aimed at PhD candidates, other researchers and research software engineers. We do not accept registrations from Master's students.
Participants should:
- be familiar with basic Python: control flow, functions, NumPy
- be comfortable working in Jupyter
- understand how NumPy and/or Pandas work
Required software:
- A programming editor. When in doubt, we recommend Microsoft VS Code.
- Python version 3.9. We recommend Anaconda, or Miniconda if you only use the command-line interface.
- Git. If you’re on Windows, follow these instructions: Git for Windows.
Topics covered:
- Recognizing potential for parallelism
- Dependency diagrams
- Measuring performance
- Working with Dask arrays
- Working with Numba
- Parallel design patterns
- Delayed evaluation
- Dependency-based programming using Snakemake