Graphics processing units (GPUs) have emerged as a powerful platform because they offer high performance and energy eﬃciency at relatively low cost. They have been successfully used to accelerate many scientific workloads. Today, many of the top500 supercomputers are equipped with GPUs and are the driving force behind the recent surge in machine learning.
However, developing GPU applications can be challenging, in particular with regard to software engineering best practices, and the quantitative and qualitative evaluation of output results. While some of these challenges, such as managing different programming languages within a project, or having to deal with different memory spaces, are common to all software projects involving GPUs, others are more typical of scientiﬁc software projects.
In their paper, Lessons learned in a decade of research software engineering GPU applications, the research software engineers (RSEs) address the challenges that they have encountered and the lessons learned from using GPUs to accelerate research software in a wide range of scientific applications.
“Many of the GPU applications used as case studies in the paper were developed as part of eScience projects at the Netherlands eScience Center”, says Ben van Werkhoven, research software engineer at the Netherlands eScience Center. “Programming GPU applications is a specialized field and while many scientists develop their own code, GPU research software is often developed by research software engineers (RSEs) that have specialized in this field. The goal of the paper is really to share our experiences, hoping that others can learn from our mistakes as well as our insights.”
The researchers of the paper recommend to carefully select and if needed rewrite the original application to ensure the starting point is of sufficient code quality and is capable of solving the problem at the scale the GPU application is targeting. When performance comparisons of different applications are of interest to the broader scientiﬁc community it is important that RSEs can publish those results, both for the community to take notice of this result and for the RSEs to advance in their academic career.
According to van Werkhoven “The reason to move code to the GPU is often to target larger, more complex problems, which may require the development of new methods to operate at higher resolutions or unprecedented problem scales. GPU code can be implemented in many different ways, resulting in large design spaces with, for example, different ways to map computations to the GPU threads. As such, auto-tuning, with tools such as Kernel Tuner, is often necessary to achieve optimal and portable performance.
Evaluating the results of GPU applications often requires carefully constructed test cases and expert knowledge from the scientists who developed the original code. In eScience projects, these are often the project partners with whom we are collaborating.”
The software sustainability of GPU research software remains an open challenge as GPU programming remains a specialized field and RSEs are often only involved during short-lived collaborative projects.
According to Ben, they will continue to use GPUs in scientific projects and expect to continue to do so for a long time. Recently, several new supercomputers were announced and all the big machines include GPUs because of their high performance and energy efficiency. In addition, they plan to further advance and apply GPU auto-tuning technology.
Watch the short video presentation about the paper “Lessons learned in a decade of research software engineering GPU applications”, which is part of the SE4Science 2020 workshop.
Join the discussion until June 12, 2020
If you would like to participate in the discussion after watching the video, add your question here.
The authors will respond to your comment as soon as possible.
Ben van Werkhoven, Willem Jan Palenstijn, Alessio Sclocco, Lessons learned in a decade of research software engineering GPU applications, International Workshop on Software Engineering for Computational Science (SE4Science 2020), ICCS 2020, Part VII, LNCS 12143. (preprint: arXiv:2005.13227).