Graphics processing units (GPUs) have emerged as a powerful computing platform because they offer high performance and energy efficiency at relatively low cost. They have been used successfully to accelerate many scientific workloads. Today, many of the TOP500 supercomputers are equipped with GPUs, which are also the driving force behind the recent surge in machine learning.
However, developing GPU applications can be challenging, in particular with regard to software engineering best practices and the quantitative and qualitative evaluation of output results. While some of these challenges, such as managing different programming languages within a project or having to deal with different memory spaces, are common to all software projects involving GPUs, others are more typical of scientific software projects.
In their paper “Lessons learned in a decade of research software engineering GPU applications”, research software engineers (RSEs) from the Netherlands eScience Center address the challenges they have encountered and the lessons learned from using GPUs to accelerate research software in a wide range of scientific applications.
“Many of the GPU applications used as case studies in the paper were developed as part of eScience projects at the Netherlands eScience Center”, says Ben van Werkhoven, research software engineer at the Netherlands eScience Center. “Programming GPU applications is a specialized field and while many scientists develop their own code, GPU research software is often developed by research software engineers (RSEs) that have specialized in this field. The goal of the paper is really to share our experiences, hoping that others can learn from our mistakes as well as our insights.”
The authors recommend carefully selecting and, if needed, rewriting the original application to ensure that the starting point is of sufficient code quality and capable of solving the problem at the scale the GPU application is targeting. When performance comparisons of different applications are of interest to the broader scientific community, it is important that RSEs can publish those results, both for the community to take notice of them and for the RSEs to advance in their academic careers.
According to Van Werkhoven: “The reason to move code to the GPU is often to target larger, more complex problems, which may require the development of new methods to operate at higher resolutions or unprecedented problem scales. GPU code can be implemented in many different ways, resulting in large design spaces with, for example, different ways to map computations to the GPU threads. As such, auto-tuning, with tools such as Kernel Tuner, is often necessary to achieve optimal and portable performance.
“Evaluating the results of GPU applications often requires carefully constructed test cases and expert knowledge from the scientists who developed the original code. In eScience projects, these are often the project partners with whom we are collaborating.”
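The design-space exploration Van Werkhoven describes can be illustrated with a small, self-contained sketch. This is a conceptual stand-in, not Kernel Tuner’s actual API: the cost model below is hypothetical, whereas a real auto-tuner compiles and benchmarks each configuration on the GPU.

```python
import itertools

# Hypothetical tunable parameters for a GPU kernel: each combination is one
# point in the design space that an auto-tuner such as Kernel Tuner would
# compile and benchmark on the actual hardware.
tune_params = {
    "block_size_x": [32, 64, 128, 256, 512],
    "tile_size": [1, 2, 4],
}

def measure(config):
    # Stand-in for timing the kernel on a GPU (hypothetical cost model:
    # larger blocks amortize launch overhead, larger tiles add register
    # pressure). A real tuner measures actual runtimes here.
    return 1000.0 / config["block_size_x"] + 0.5 * config["tile_size"] ** 2

def brute_force_tune(tune_params):
    # Exhaustively search the Cartesian product of all parameter values
    # and return the fastest configuration found.
    names = list(tune_params)
    best = None
    for values in itertools.product(*tune_params.values()):
        config = dict(zip(names, values))
        runtime = measure(config)
        if best is None or runtime < best[1]:
            best = (config, runtime)
    return best

config, runtime = brute_force_tune(tune_params)
```

In practice design spaces are far too large for brute force alone, which is why dedicated tuners also offer smarter search strategies; the point of the sketch is only the shape of the problem: a parameterized kernel, a space of configurations, and an empirical measurement per point.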
The software sustainability of GPU research software remains an open challenge, as GPU programming is a specialized field and RSEs are often only involved during short-lived collaborative projects.
Van Werkhoven expects the eScience Center to continue using GPUs in scientific projects for a long time to come. Several new supercomputers have recently been announced, and all of the big machines include GPUs because of their high performance and energy efficiency. In addition, the team plans to further advance and apply GPU auto-tuning technology.
Watch the short video presentation about the paper “Lessons learned in a decade of research software engineering GPU applications”, which is part of the SE4Science 2020 workshop.
Join the discussion until June 12, 2020
If you would like to participate in the discussion after watching the video, add your question here.
The authors will respond to your comment as soon as possible.
Ben van Werkhoven, Willem Jan Palenstijn, Alessio Sclocco, Lessons learned in a decade of research software engineering GPU applications, International Workshop on Software Engineering for Computational Science (SE4Science 2020), ICCS 2020, Part VII, LNCS 12143. (preprint: arXiv:2005.13227).
On 1 January 2020, cultural historian Joris van Eijnatten took office as the new director of the Netherlands eScience Center. This autumn he will present a new strategic plan setting out the eScience Center’s priorities and working methods in relation to the development and application of research software.
Steven Claeyssens interviewed Van Eijnatten for e-data&research about his views on knowledge transfer, DCCs, the FAIR principles and the sustainability of software. Read the full interview here: https://www.edata.nl/1403/pdf/1403_5.pdf
e-data&research, Volume 14, Number 3 / June 2020
Interview by Steven Claeyssens
The Netherlands eScience Center has changed the format of the scheduled information event for the eTEC 2020 and ASDI 2020 calls. Instead of a live online video streaming event, all the relevant information for both calls has now been made available on the eScience Center website.
This decision was made in order to give all participants equal access to all further information associated with the eTEC and ASDI calls. ‘Despite their immense value, video calls are often accompanied by technical issues that prevent some participants from receiving all the relevant information. We want to avoid such a situation and ensure an equal playing field for all potential applicants’, says Dr. Frank Seinstra, the eScience Center’s program director.
The information slide decks and Q&As can be found here: esciencecenter.nl/asdi-etec2020
Participants who have a question that is not listed on the Q&A page are advised to contact Tom van Rens (NWO, 070 344 0509) or Dr Frank Seinstra (eScience Center, 020 460 4770), or send an email to firstname.lastname@example.org or email@example.com
Please note: the Q&A document will be updated regularly so please make sure to visit the page often.
The Lorentz Center and the Netherlands eScience Center invite researchers to join the Lorentz-eScience competition.
Every year the eScience Center and the Lorentz Center invite researchers to join the Lorentz-eScience competition. This competition aims to host a leading-edge workshop on digitally enhanced research (efficient utilization of data, software and e-infrastructure). The workshop should bring together researchers from the academic community and the public/private sector. The winner will organize a workshop at the Lorentz Center@Snellius in Leiden, the Netherlands.
What we seek
• an innovative scientific program that takes us beyond current boundaries
• an open and interactive format, with few lectures
• at least one scientific organizer based within and one outside the Netherlands
• at least one scientific organizer from academia and one from the public/private sector
What we offer
• a 5-day workshop for up to 25 people in the first half of 2021
• travel and accommodation reimbursements
• no registration fees or other organizational costs
• a professional support organization, under the philosophy ‘you do the research, we do the rest’
How to apply
• a 1-page expression of interest by 15 April
• a full application by 6 June
• final decision by the end of June
• submit applications to: firstname.lastname@example.org
Please find answers to several FAQs here
This year’s report provides a comprehensive overview of the organization’s activities in 2019. The report opens with a message from the acting Director Rob van Nieuwpoort and an overview of all projects and collaborations. This is followed by our main activities and events over the past year.
Advancing research through machine learning: an applied coding workshop
From 20 to 24 January, the Netherlands eScience Center held a workshop on Machine Learning for Research at its offices at Amsterdam Science Park.
During the workshop, which took place in a collaborative workspace, six teams from different disciplines and research institutions each came equipped with their own data and spent an intensive week collaborating hands-on with machine learning experts from the eScience Center and SURF to explore the best machine learning strategy for tackling their research question.
The core focus of the workshop was on writing and developing code to analyze the data and apply suitable machine learning techniques. This hands-on experience was complemented by inspiring talks from Joris van Eijnatten, Director of the Netherlands eScience Center; Maxwell Cai (SURF) on machine and deep learning; Vincent Warmerdam (GoDataDriven) on artificial stupidity; Jakub Tomczak (VU) on deep generative modeling; and Florian Huber (eScience Center) on machine learning in research: dealing with the non-ideal.
“The workshop was a great experience. Together with my team, I got to actually focus on the research for 5 days without any interruption. The experts gave us enough time to work with our own data, providing us with a good set of starting models that we can refine. The talks were inspiring and informative. The trainers didn’t just explore the possibilities of machine learning but also discussed its pitfalls.” – Eduard Klapwijk
“The best thing is we get to use our own data and work on our own problem instead of a hypothetical problem. It feels like we are actually getting somewhere instead of leaving the workshop with a very abstract view on machine learning tackling research problems” – Niala den Braber
“I was amazed by how motivated, persistent, and curious all teams were in exploring machine-learning options on all the great datasets they brought. Many participants started from a fairly basic understanding of machine-learning, but I really felt that their good knowledge about research and data allowed them to super quickly get a very good intuition about what machine-learning can and cannot do. I hadn’t expected that we would come that far in only one week.” – Florian Huber
How has conservative rhetoric evolved over the past two centuries? And what kind of language have writers consciously used to express a moral opinion that might be qualified as conservative? In a new research paper, historian Joris van Eijnatten employs textual analysis tools to explore the nature of conservative rhetoric in the London-based Times newspaper between 1785 and 2010. Among other things, his findings throw light on the confluence of right-wing and left-wing rhetoric over time. The paper was recently published in Digital Scholarship, Digital Classrooms – New International Perspectives in Research and Teaching.
For his research, Van Eijnatten, director of the Netherlands eScience Center, employs two text mining techniques: n-grams (especially bigrams) and word embeddings. He traces a number of bigram phrases during the period in question, the most important of which are “conservative principles”, “conservative values”, “traditional values” and “permissive society”.
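The bigram side of this approach is straightforward to sketch. The snippet below is an illustration over a made-up sample sentence, not the paper’s actual pipeline, which runs over the digitized Times archive:

```python
import re
from collections import Counter

def bigram_counts(text):
    # Lowercase word tokenization, then count adjacent word pairs (bigrams).
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(zip(words, words[1:]))

sample = ("Conservative principles guided the party; later, conservative "
          "values and traditional values displaced conservative principles.")
counts = bigram_counts(sample)
```

Tracing a phrase over time then amounts to computing such counts per year of the corpus and normalizing by the total number of bigrams in that year, so that frequencies from years with different amounts of text remain comparable.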
On principles and values
Van Eijnatten starts by exploring the term ‘conservative principles’, arguably the first instance of conservative rhetoric. His analysis reveals the popularity of this term and how its usage, both in parliament and in the Times, peaked in the 1840s before declining from the 1850s onwards. While the term was mostly political in nature, its usage can be classified into three distinct phases. During the first phase, pre-1830, it reflected the counterrevolutionary sentiment of early conservative thought. In phase two, between 1830 and 1950, it continued to be used in opposition to reform and the reform movement. The third and final phase (1950-2010) was characterised by a decline in its use.
Using bigram embeddings, the author then identifies several phrases that came to carry the same meaning as conservative principles, the most popular being ‘conservative values’. “The use of the term values grew towards the end of the 19th century, when principles were on the wane”, Van Eijnatten explains. “In its original sense, it referred to the economy, finances and trade. Its ethical and political connotations originally emerged in the US before being widely adopted in British English.”
The moralising turn
But could the linguistic popularity of the term ‘conservative values’ have arisen in a context other than that of principles? To answer this question, Van Eijnatten selected nine words that retained stable meanings across the periods 1901-1905, 1951-1955 and 2001-2005. For each of these words, he generated the top fifty most similar words, and for each of those another top fifty. This resulted in 22,500 words in total (9 x 50 x 50), with the number of unique words ranging from 5,000 to 7,500.
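The two-level expansion can be sketched as follows. The `most_similar` function here is a deterministic, hypothetical stand-in; a real word-embedding model (such as word2vec trained on the Times corpus) would return genuinely similar words, with enough overlap between neighbourhoods to bring the unique count down to the 5,000-7,500 range reported.

```python
def most_similar(word, k):
    # Hypothetical stand-in for an embedding model's nearest-neighbour
    # query; a real model returns the k words closest in vector space.
    return [f"{word}~{i}" for i in range(k)]

def two_level_expansion(seeds, k=50):
    # For each seed word, take its top-k most similar words, then the
    # top-k most similar words of each of those.
    expanded = []
    for seed in seeds:
        for neighbour in most_similar(seed, k):
            expanded.extend(most_similar(neighbour, k))
    return expanded

seeds = [f"seed{i}" for i in range(9)]  # nine stable words, as in the paper
expanded = two_level_expansion(seeds)
total = len(expanded)  # 9 * 50 * 50 = 22,500 words in total
```

With the toy stand-in every generated word is distinct; with real embeddings many neighbourhoods overlap, which is exactly what deduplicating the 22,500 words down to several thousand unique ones reveals.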
By forcing the network to cluster automatically into three groups, a linguistic pattern emerges, one that seems to indicate what Van Eijnatten refers to as the ‘historical moralisation’ of politics. “Between 1955 and 2001, the semantic relations between the political and the civilisational became stronger, as illustrated by the transition from ‘conservative principles’ to ‘conservative values’. This led me to examine value-laden phrases in which the political qualifier ‘conservative’ was omitted, in particular the phrase ‘traditional values’.”
The phrase ‘traditional values’ can be traced to the early part of the 20th century, a time of rising tension between tradition and modernity, says Van Eijnatten. Initially, the debate largely centred on socio-cultural change and allowed for various opinions and positions. This tension evolved into conflict in the early 1960s, when tradition was used in an explicitly oppositional sense to what became known as the permissive society. “The term eventually took on clear political significance and, as its use in The Times shows, became part of the rhetoric of the right in the UK.”
Conservatism as moral language
By tracking several strands of conservative rhetoric, from conservative principles through conservative values to traditional values, Van Eijnatten’s findings demonstrate how these phrases followed specific historical trajectories and, just as interestingly, how conservative rhetoric became assimilated into popular discourse in the UK in the 1980s.
Van Eijnatten: “We tend to categorise moral languages as ideologies, as enlightened, liberal, Christian, nationalist or conservative, but these simple labels often do more justice to ordering the present than to understanding the past and vice versa. Digital history techniques help us to identify the changing clusters of words that come together in moral languages. As my research shows, these techniques open up new avenues for research and paint us a different, more complex, ambiguous, and varied past, and help us conceive of different futures in a present that seems to have lost its bearings.”
J. van Eijnatten, ‘On Principles and Values: Mining for Conservative Rhetoric in the London Times, 1785-2010’ in Digital Scholarship, Digital Classrooms – New International Perspectives in Research and Teaching pp. 1-26 (Farmington Hills, MI: Gale, 2020).
Read the full paper
Structural variants (SVs) are an important class of genetic variation implicated in a wide array of genetic diseases. Despite advances in whole genome sequencing, however, comprehensive and accurate detection of SVs in short-read data still poses some practical and computational challenges. To address this problem, a team of scientists and research software engineers from the Netherlands eScience Center has developed a new portable open-source workflow called sv-callers that enables improved detection of SVs in cancer genomes using multiple tools. Their work was recently published in PeerJ – the Journal of Life and Environmental Sciences.
Structural variants such as deletions, insertions and duplications, account for a large part of the genomic diversity among individuals and have been implicated in many diseases, including cancer. With the advent of novel DNA sequencing technologies, whole genome sequencing (WGS) is becoming an integral part of cancer diagnostics and can potentially enable tailored treatments of individual patients. However, despite advances in large-scale cancer genomics projects, the detection of SVs in genomes remains challenging due to computational and algorithmic limitations.
The ensemble approach
“Recent tools for somatic and germline SV detection exploit more than just one type of information present in WGS data”, says Dr Arnold Kuzniar, eScience Engineer and first author. “A promising way to obtain more accurate and comprehensive results is by using what is known as the ensemble approach, which has been shown to improve the detection of SVs. Nevertheless, running multiple SV tools efficiently on a user’s computational infrastructure or adding new SV callers as they become available has been difficult.”
According to Kuzniar, a common practice is to couple multiple tools, or “callers”, together with monolithic wrapper scripts and, to a lesser extent, by a workflow system. Such a workflow is recommended as a way to improve the extensibility, portability and reproducibility of data-intensive analyses, but is usually developed to run on one computer system and therefore not necessarily portable to or reusable on another system.
SV callers tied together
To address these problems, the team developed “sv-callers”, a user-friendly, portable and scalable workflow based on the Snakemake and Xenon (middleware) software. The workflow includes state-of-the-art somatic and germline SV callers, which can easily be extended, and runs on high performance computing clusters or clouds with minimal effort. It supports all the major SV types as detected by the individual callers.
“The workflow was developed incrementally based on requirements in the context of the Googling the cancer genome project, which is led by Dr Jeroen de Ridder from the University Medical Center Utrecht and supported by the eScience Center”, says Kuzniar. “We have extensively tested the workflow with [human] WGS datasets on different HPC systems as well as performed a number of production runs on the genomes of cancer patients. The workflow readily automated parallel execution of the tools across compute nodes and enabled streamlined data analyses in a Jupyter Notebook.”
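The fan-out/fan-in pattern behind this kind of ensemble workflow can be sketched in a few lines. This is a schematic illustration only: the caller names are examples of tools in this space, the functions below are hypothetical stand-ins, and the actual sv-callers workflow launches each external tool as a Snakemake job on an HPC scheduler rather than in Python threads.

```python
from concurrent.futures import ThreadPoolExecutor

def run_caller(caller, sample):
    # Stand-in for invoking one external SV caller on a WGS sample; the
    # real workflow runs each tool as a cluster job and collects its VCF.
    return {"caller": caller, "sample": sample,
            "vcf": f"{sample}.{caller}.vcf"}

def ensemble_sv_detection(sample, callers):
    # Fan out: run all callers concurrently (threads suffice here because
    # in practice each caller is an external process). Fan in: gather the
    # per-caller outputs for downstream merging and filtering.
    with ThreadPoolExecutor(max_workers=len(callers)) as pool:
        futures = [pool.submit(run_caller, c, sample) for c in callers]
        return [f.result() for f in futures]

results = ensemble_sv_detection("tumor_sample",
                                ["manta", "delly", "lumpy", "gridss"])
```

The appeal of expressing this in a workflow system rather than a wrapper script is that adding a new caller becomes a matter of adding one more independent rule, and the same definition runs unchanged on a laptop, a cluster or a cloud.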
Kuzniar credits the workflow’s results to the wide-ranging expertise of the individual project partners. “Developing this workflow was truly a collective endeavor. Without the in-depth knowledge of and experience with short read sequencing data and SV detection in particular, the workflow would have been computationally efficient but the results incomplete or inaccurate from a biological point of view. I am really happy to be part of what we’ve achieved together.”
The team has made the workflow freely available and intends to maintain the software going forward.
Kuzniar A, Maassen J, Verhoeven S, Santuari L, Shneider C, Kloosterman WP, de Ridder J. “sv-callers: a highly portable parallel workflow for structural variant detection on whole-genome sequence data” in PeerJ (6 January 2020). https://doi.org/10.7717/peerj.8214
The Netherlands eScience Center and Atos-Bull have granted four proposals within the EU-funded project ‘Center of Excellence in Simulation of Weather and Climate in Europe’ (ESiWACE2). The selected projects will receive consultancy, advice and engineering support from the research software engineers at the eScience Center and Atos-Bull. These collaborative projects will allow experts in high-performance computing (HPC) and accelerated computing to work together with model developers to advance the software so that (parts of) the models can be executed efficiently on modern CPU processors or computing accelerators such as graphics processing units (GPUs).
The ESiWACE and ESiWACE2 projects aim to improve model efficiency and prepare the software for model execution on existing and near-future hardware architectures, enabling simulation experiments at unprecedented grid resolutions or ensemble sizes. In addition, they will make it possible to include computationally expensive physical processes that were previously unfeasible.
Alfred Wegener Institute for Polar and Marine Research, Bremerhaven, Germany – Natalja Rakowsky
Finite-volumE Sea ice-Ocean Model, Version 2.0 (FESOM2)
FESOM2 is a global sea-ice ocean circulation model based on unstructured meshes. It allows one to simulate the global ice-ocean system at extremely high resolution in the regions of interest at an affordable computational cost. The broad spectrum of FESOM2 applications includes several climate models and standalone sea ice-ocean configurations. The earth system components that have been successfully coupled to FESOM2 include ECHAM6.3, OpenIFS, REMO, PISM and ReCOM.
Two aspects will be addressed:
1) Profile FESOM2 with GPUs in mind and port the best suited numerical kernels to GPUs.
2) Get a fresh view on FESOM2 optimization in general.
Cyprus Institute – Theo Christoudias
EMAC (ECHAM-MESSy Atmosphere Climate) model
The ECHAM/MESSy Atmospheric Chemistry (EMAC) model is a numerical chemistry and climate simulation system that includes sub-models describing tropospheric and middle atmosphere processes and their interaction with oceans, land and human influences.
Within the “Earth System Chemistry Integrated Modelling (ESCIMo)” initiative, chemistry-climate simulations are conducted by the MESSy Consortium with the EMAC model for special topics related to the national project of the DFG-Forschergruppe SHARP (Stratospheric Change and its Role for Climate Prediction) and the international IGAC/SPARC Chemistry-Climate Model Initiative (CCMI). These simulations will be carried out in support of upcoming WMO/UNEP ozone and IPCC climate assessments and will help to answer emerging scientific questions as well as improve process understanding. Accelerating the chemistry mechanism can reduce the required CPU nodes and time-to-solution by a factor of 5, while supporting an atmospheric chemical mechanism an order of magnitude more complex (in terms of the number of species and reactions) than the current state of the art.
Delft University of Technology, Centrum Wiskunde & Informatica – Fredrik Jansson, Pier Siebesma
DALES – the Dutch Atmospheric Large Eddy Simulation
DALES is a large-eddy simulation code designed for studies of the physics of the atmospheric boundary layer, including convective and stable boundary layers as well as cloudy boundary layers. DALES can also be used for studying more specific cases, such as flow over sloping or heterogeneous terrain, and dispersion of inert and chemically active species.
The main goals of this new collaboration are 1) improving the scaling of DALES to many nodes and 2) improving single-threaded performance through more cache-friendly data-access patterns, potentially switching from double- to single-precision calculations, and improved numerical algorithms.
The aim is to merge the optimizations into the official DALES version, so that they are easily accessible to all users.
Royal Netherlands Meteorological Institute (KNMI) – Thomas Reerink
Ice caps are part of the climate system and interact with the atmosphere and the ocean via various feedback mechanisms. Ice sheet models need to be coupled with general circulation models (GCMs) in order to simulate the interactions between ice sheets, atmosphere and ocean. Due to the type of the ice dynamic equations, ice sheet models use coordinate systems different from GCMs, requiring a projection and regridding or interpolation step. These and other specific GCM-ISM coupling issues are addressed by OBLIMAP.
This collaboration aims to develop a parallel implementation of OBLIMAP’s fast scan method, serving the near-future demand of coupling ice sheet models (ISMs), which are based on adaptive grids, with GCMs. A parallel implementation of OBLIMAP’s fast mapping will improve on-line mapping performance, which is of particular interest for high-resolution (< 1 km) applications. While OBLIMAP is ready for the major step of on-line coupling of an ISM within an Earth system model, a scientific goal for some 15 years now, the proposed parallel OBLIMAP release will significantly extend the number of more complex or high-resolution applications.
Read more about ESiWACE2
The Netherlands eScience Center and Data Archiving and Networked Services (DANS) have developed a new FAIR Software website for researchers. The website, which was officially launched during the National eScience Symposium on 21 November 2019, provides recommendations to researchers on ways to improve the quality and reproducibility of their software.
Research software has become a fundamental part of current research practice and is also considered an increasingly important research output. As such, there is a growing awareness of the importance of improving the quality of research software, as well as of increasing the recognition that research software engineers receive for producing such software.
FAIR software principles
The FAIR principles are meant to help researchers and developers improve the reusability and reproducibility of software by providing a set of guiding principles to make research software findable, accessible, interoperable and reusable (FAIR). The principles, which are increasingly viewed as a hallmark of excellent research, originated in the field of data management. Their applicability to software is currently a topic of active research.
The new FAIR Software website aims to encourage the greater adoption of FAIR principles by providing a set of starting recommendations that researchers can use to improve the quality, reach and reproducibility of their software. The website is a joint initiative between the eScience Center and DANS.
“Together with our partners at DANS, we as the eScience Center are proud to have been involved in making this website a reality”, says Rob van Nieuwpoort, acting director. “The website answers a growing need among researchers for a set of good practices that can guide them in developing software that is of excellent quality, is reproducible and can be widely reused by other researchers, potentially in different disciplines. I have no doubt the FAIR Software site will become a vital go-to reference for the research community.”
“It was wonderful to work with such a diverse group of people”, says Dr Carlos Martinez-Ortiz, eScience Research Engineer, who was closely involved in the development of the website. “FAIR is a very important trend in science and has mostly been applied to data. But software is also extremely important for science, and working out how the FAIR principles can be applied to software is an important step in improving the quality of research software.”
Towards FAIR principles for research software
The launch of the website coincides with a recent position paper authored by a group of international researchers that summarizes the current status of the debate around FAIR and software as a basis for the development of community-agreed principles for FAIR research software. The authors discuss the differences between software and data with regard to the application of the FAIR principles. In addition, they present an analysis of where the existing principles can be applied to software directly, where they need to be adapted or reinterpreted, and where the definition of additional principles is required.
Read the full paper.