A team of researchers, including research engineers from the Netherlands eScience Center, has developed a new open source software package that ranks and scores protein-protein interfaces (PPIs). Called iScore, the software package competes with or even outperforms state-of-the-art protein scoring functions and could be generalized for a broad range of applications that involve the ranking of graphs. The software was announced in a recent paper in the journal SoftwareX.
Interactions between proteins that lead to the formation of a three-dimensional (3D) complex are a crucial mechanism underlying major biological activities in organisms, ranging from the immune defense system to enzyme catalysis. The 3D structure of such complexes provides fundamental insights into protein recognition mechanisms and protein functions.
The scoring problem
One way researchers in the field of molecular modeling try to predict the 3D structures of such complexes is by using computational docking, a tool that predicts the preferred orientation of one molecule to a second when bound to each other to form a stable complex. Despite its potential, however, a major drawback of computational docking is the scoring problem – the question of how to single out models that are likely to occur in real-life experiments from the huge pool of generated docking models; in other words, how to find a needle in a haystack.
‘The scoring problem has been a highly challenging task for decades.’
‘The scoring problem has been a highly challenging task for decades’, says Dr Nicolas Renaud, eScience Research Coordinator and member of the project team. ‘Over the years, many methods have been developed to overcome this problem. These can largely be grouped into five types: shape complementarity-based methods, physical energy-based methods, statistical potential-based methods, machine learning-based methods and coevolution-based methods. These different scoring approaches are regularly benchmarked against each other during a community-wide challenge: the Critical Assessment of Prediction of Interactions (CAPRI).’
To address the problem, the research team developed iScore. This novel kernel-based machine learning approach represents the interface of a protein complex as an interface graph, with the nodes being the interface residues and the edges connecting the residues in contact. By comparing the graph similarity between the query graph and the training graphs, iScore predicts how close the query graph is to the near-native model.
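The graph-comparison idea can be illustrated with a small sketch. The paper's title names a random walk graph kernel as iScore's core, so the example below implements a minimal geometric random-walk kernel with NumPy; it is an illustrative toy, not iScore's actual implementation, and the tiny three-node "interface graphs" are hypothetical.

```python
import numpy as np

def random_walk_kernel(a1, a2, lam=0.05):
    """Geometric random-walk kernel between two graphs given as adjacency
    matrices: it counts shared walks via the direct-product graph.
    Valid when lam < 1 / spectral_radius(kron(a1, a2))."""
    w = np.kron(a1, a2)                     # direct-product adjacency
    n = w.shape[0]
    ones = np.ones(n)
    # K = 1^T (I - lam*W)^{-1} 1  (closed form of the geometric walk series)
    return ones @ np.linalg.solve(np.eye(n) - lam * w, ones)

# Two tiny "interface graphs": nodes = interface residues, edges = contacts
g_triangle = np.array([[0, 1, 1], [1, 0, 1], [1, 1, 0]], dtype=float)
g_path     = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)

k11 = random_walk_kernel(g_triangle, g_triangle)
k22 = random_walk_kernel(g_path, g_path)
k12 = random_walk_kernel(g_triangle, g_path)

# Normalised similarity in (0, 1]; identical graphs score exactly 1
sim = k12 / np.sqrt(k11 * k22)
```

In iScore's setting, such a similarity between a query interface graph and the training graphs feeds a support vector machine that predicts how close the query model is to the near-native structure.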
‘In a recent paper we demonstrated how iScore competes with, or even outperforms, various state-of-the-art approaches on two independent test sets: the Docking Benchmark 5.0 set and the CAPRI score set’, says Renaud. ‘Using only a small number of features, iScore performs well compared with IRaPPA, the latest machine learning-based scoring function, which exploits 91 features. This demonstrates the advantage of representing protein interfaces as graphs compared with fixed-length feature vectors, which discard information about the interaction topology.’
‘iScore offers a user-friendly solution to ranking PPIs more efficiently and more accurately than several similar scoring functions.’
According to the researchers, iScore offers a user-friendly experience thanks to dedicated workflows that fully automate the process of ranking PPIs. The software also allows users to exploit large-scale computer architectures by distributing the calculations across a large number of CPUs and GPUs. What’s more, although iScore has been developed specifically for ranking PPIs, the method is generic and could be used more generally for a broad range of applications that involve the ranking of graphs.
Renaud: ‘I am unbelievably proud of what the team has produced. iScore offers a user-friendly solution to ranking PPIs more efficiently and more accurately than several similar scoring functions. In addition, the software is open-source and freely available to use. I encourage researchers everywhere to give iScore a try and experience the benefits for themselves.’
N. Renaud, Y. Jung, V. Honavar, C. Geng, A. Bonvin, L. Xue, ‘iScore: An MPI supported software for ranking protein-protein docking models based on random walk graph kernel and support vector machines’, SoftwareX (January–June 2020). DOI: 10.1016/j.softx.2020.100462
Read the full paper
The eTEC 2020 call is aimed at domain researchers and ICT researchers working in the Netherlands who would like to apply for funding to address innovative compute-intensive and/or data-driven research problems. Its aim is to support the research and development of innovative eScience technologies and software associated with optimized data handling, data analytics and efficient computing, driven by a demand from any specified research discipline (a scientific or scholarly domain selected by the research team itself).
See the eTEC 2020 page for more information on the call, requirements, application process and deadlines.
The ASDI 2020 call is aimed at researchers who would like to carry out research projects focused on innovative domain research questions that are very hard or even impossible to investigate without the use of (advanced) eScience technologies and software. With ASDI 2020, the eScience Center intends to provide an impulse to all research endeavours in which the application of eScience tools and methodologies is relatively underdeveloped.
See the ASDI 2020 page for more information on the call, requirements, application process and deadlines.
Funding and contact
Both eTEC 2020 and ASDI 2020 are funded by the Netherlands eScience Center and supported by the Netherlands Organisation for Scientific Research’s (NWO) Science Domain.
- Dr. Frank Seinstra (eScience Center): email@example.com or 020 460 4770
- Tom van Rens (NWO): firstname.lastname@example.org or 070 344 0509
As part of the festivities for King’s Day, royal honours are awarded annually to individuals who over the course of many years have served the public good or made a great contribution within their respective fields or professions. The honours are meant to recognise and thank them for their achievements.
A leading country in the area of high-performance computing
As a result of numerous recommendations, some by leading international researchers within the field of computing, it was decided this year to bestow royal honours on Dr Aerts. Explaining its decision, the selection committee mentions the influential role Dr Aerts played in making the Netherlands a leading example in the field of high-performance computing. Over the course of his distinguished academic and professional career, Dr Aerts has worked tirelessly to ensure the Dutch research community has access to cutting-edge computer facilities. He has also been a driving force behind the development of many national and international networks and was often the first Dutch representative in many such collaborative ventures. Examples are the Advanced Research Computing, Academic Discussion Group Europe (ARCADE, 1995), the e-Infrastructures Reflection Group (e-IRG, which he chaired in 2004-2005), PRACE (for HPC), EGI (for grids), and, after 2012, ePLAN (Platform of eScience Centers in the Netherlands) and PLAN-E (Platform of National eScience Centers in Europe). Over the last few years, Dr Aerts has used his knowledge and passion to promote the sustainable development of research software, the use of FAIR principles for data and software within the research community and the harmonisation of research data management planning.
‘Few individuals have been so influential in the development and promotion of digital research in the Netherlands as Patrick’, says Dr Joris van Eijnatten, director of the eScience Center. ‘He was one of the first Dutch researchers to understand the enormous possibilities offered by compute-intensive research and to ensure researchers gain access to these. His dedication laid the groundwork for many notable achievements over the past 20 years.’
‘Achievements are never an individual enterprise but rather the result of collective effort’, says Dr Aerts. ‘My contributions were made possible by many other people. In that sense, I view this great honour as a recognition of the work we have jointly carried out. I am immensely proud and thankful.’
About Patrick Aerts
Dr Aerts obtained his PhD in Quantum Chemistry cum laude from the University of Groningen (UG) in 1986. Between 1985 and 1990 he worked as a postdoctoral researcher at UG and a part-time support researcher for the ‘Working Group on the use of Supercomputers’. From 1990 to 2012, he was the director of the National Computing Facilities Foundation (NCF). Between 2012 and 2019, he worked as senior advisor strategic alliances at the eScience Center and senior research fellow at DANS.
Read more about Patrick Aerts
The Netherlands eScience Center has awarded funding to four innovative research proposals related to the Covid-19 pandemic. The selected projects will run between 3 and 12 months and receive in-kind research engineering support. With these projects, the eScience Center will use its expertise in research software development to help address the current pandemic.
The projects span several domains and focus on monitoring and analysing public sentiment towards government measures and announcements, the relation between COVID-19 and heart disease, the development of a tailored model to inform public health interventions for infectious diseases in the Netherlands, and the refinement of a platform to enable the development and deployment of machine learning algorithms for the automated scoring of CT scans to detect and assess the severity of COVID-19.
‘I am extremely pleased with the quality and depth of the proposals we’ve received’, says Dr. Frank Seinstra, the eScience Center’s program director. ‘The projects are interdisciplinary in nature and allow us to use our full range of expertise for a clear and urgent goal. As the national center for the development and application of research software, our engineers have the skills and the tools to help accelerate novel research outcomes. In that sense, I am especially proud of the speed and dedication with which our engineers have worked to prepare and present such excellent proposals in such a short space of time. The crisis demands quick, decisive and concerted action from the entire research community.’
The projects will kick off as soon as possible. To stay informed on the latest results, please visit the eScience Center projects page.
Real Time National Policy Adjustment and Evaluation on the Basis of a Computational Model for COVID-19 (Retina COVID19)
Principal Investigator: Prof. Martin Bootsma (UMC Utrecht)
The current COVID-19 pandemic presents an unprecedented challenge for policymakers. Although the major consequences of the uninhibited spread of the COVID-19 virus in Western European countries have abated due to far-reaching social distancing measures, these measures carry enormous economic and social costs. Furthermore, basic epidemiological mechanics dictate that some form of containment policy will be necessary for the foreseeable future in order to prevent a recurrent outbreak and keep the impact of COVID-19 manageable. The challenge then is to design public policy interventions informed by epidemiological models. However, these models suffer from what has been termed in other fields the curse of locality – while the basic biology of the virus is the same everywhere, the outcomes will differ according to the local circumstances. For example, the host population in each country is different, societal norms and customs vary, and spatial patterns governing the movement of people in their daily lives differ. This means that Dutch policy must be informed by a model that is tailored to circumstances in the Netherlands. In this project, work will continue on developing an epidemiological model that can be used to inform public health interventions in the Netherlands.
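To give a flavour of the kind of model being discussed: the simplest epidemiological compartment model is the SIR model, which tracks the susceptible, infected and recovered fractions of a population. The sketch below integrates it with plain Euler steps; it is purely illustrative, and the Retina COVID19 model is far more detailed (locally calibrated, with contact structure and interventions).

```python
# Minimal SIR compartmental model, integrated with explicit Euler steps.
# Illustrative only: real policy models add age structure, contact
# matrices and intervention scenarios, and are fitted to local data.

def simulate_sir(beta, gamma, s0, i0, days, dt=0.1):
    """Return final (S, I, R) fractions for a population normalised to 1.

    beta  - transmission rate (contacts per day times infection probability)
    gamma - recovery rate (1 / infectious period in days)
    """
    s, i, r = s0, i0, 1.0 - s0 - i0
    for _ in range(int(days / dt)):
        new_infections = beta * s * i * dt   # flow S -> I
        new_recoveries = gamma * i * dt      # flow I -> R
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
    return s, i, r

# With R0 = beta / gamma = 2.5, an unmitigated outbreak infects most
# of the population within a year
s, i, r = simulate_sir(beta=0.5, gamma=0.2, s0=0.999, i0=0.001, days=365)
```

Containment policies effectively lower `beta`; the curse of locality mentioned above means that realistic values of such parameters differ per country, which is why a Netherlands-specific model is needed.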
Research team: Prof. Marc Bonten (UMC Utrecht), Prof. Jason Frank (UU), Prof. Mirjam Kretzschmar (UMC Utrecht, RIVM)
eScience Research Engineers: Inti Pelupessy, Lourens Veen, Ben van Werkhoven, Rena Bakhshi
FAIR Data for CAPACITY
Principal Investigator: Andre Dekker (Maastricht University, Personal Health Train – PHT)
Diagnostic information and data on the occurrence of cardiovascular complications in COVID-19 patients are rapidly growing but distributed over different clinical locations. In order to provide the most accurate insights into the relation between cardiovascular history and related complications in COVID-19 patients, statistical analyses and machine learning models need to be kept up to date in real time. This will not be possible by continuously collecting data manually from different locations. This project will build FAIR data stations and automatic data extraction pipelines for defined sets of clinical data as part of a distributed learning infrastructure. This will provide insight into the incidence of cardiovascular complications in patients with COVID-19 and into the vulnerability and clinical course of COVID-19 in patients with an underlying cardiovascular disease.
Research Team: Rick van Nuland (Lygature, HealthRI), Folkert Asselbergs (UMC Utrecht, Dutch Cardiovascular Alliance – DCVA), Mira Staphorst (Hartstichting, DCVA)
eScience Research Engineers: Djura Smits, Lars Ridder
COVID-19 Grand Challenge
Principal Investigator: Dr James Meakin (Radboud UMC)
Diagnostic imaging with computed tomography (CT) and chest X-ray are proving increasingly important in detecting and assessing disease severity of COVID-19. To aid in the clear communication between radiologists and clinicians, the Radiological Society of the Netherlands (NVvR) has proposed a standardised reporting system, CO-RADS, for assessing the suspicion and severity of COVID-19 in a CT scan. In this project, an existing platform, grand-challenge.org (with elements of the EYRA benchmark platform, where suitable), will be furthered to enable the development and deployment of machine learning algorithms for automated scoring of CT scans using the CO-RADS system.
Research team: Paul Gerke, Mike Overkamp and Miriam Groeneveld (Radboud UMC), Prof. Bram van Ginneken (Radboud UMC)
eScience Research Engineers: Maarten van Meersbergen, Pushpanjali Pawar, Jesus Garcia González
Dutch Public Reaction on Governmental COVID-19 Measures and Announcements (PuReGoMe)
Principal investigator: Shihan Wang (Utrecht University)
Public sentiment (the opinion, attitude or feeling that the public expresses) always attracts the attention of government, as it directly influences the implementation of policies. In the current pandemic, the timely understanding of general public opinion becomes even more important. However, the ‘stay-at-home’ policy makes face-to-face interactions and interviews challenging. Meanwhile, about 2.8 million users in the Netherlands use Twitter to share their opinions, making it a valuable platform for tracking and analysing public sentiment. To understand the variation of Dutch public sentiment during the COVID-19 outbreak period, this project will analyse real-time Twitter data using machine learning and natural language processing approaches. Data collection will be based on COVID-19 related keywords and users. The aim is to provide a cost-effective and efficient way to access public reactions in a timely manner. For instance, instead of waiting for physical behaviours (like taking a walk outside) of people, the latter’s sentiment and intended behaviour could already be gleaned from Twitter data.
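The keyword-based collection step described above can be sketched in a few lines. The example below filters tweets on COVID-19-related keywords and scores them against a tiny sentiment lexicon; the keyword list and lexicon are hypothetical, and the PuReGoMe project itself uses far more sophisticated machine learning and NLP models.

```python
# Illustrative keyword filtering and lexicon-based sentiment scoring.
# The keyword set and the (Dutch) lexicon below are made up for the
# example; a real pipeline would use curated lists and trained models.

COVID_KEYWORDS = {"corona", "covid", "lockdown", "rivm"}
SENTIMENT_LEXICON = {"goed": 1, "blij": 1, "slecht": -1, "boos": -1}

def is_relevant(tweet: str) -> bool:
    """Keep only tweets mentioning a COVID-19-related keyword."""
    words = tweet.lower().split()
    return any(w.strip(".,!?#@") in COVID_KEYWORDS for w in words)

def sentiment_score(tweet: str) -> int:
    """Sum of lexicon polarities; > 0 leans positive, < 0 negative."""
    return sum(SENTIMENT_LEXICON.get(w.strip(".,!?"), 0)
               for w in tweet.lower().split())

tweets = [
    "Blij met de versoepeling van de lockdown!",
    "Slecht nieuws over corona vandaag",
    "Mooi weer dit weekend",
]
relevant = [t for t in tweets if is_relevant(t)]
scores = [sentiment_score(t) for t in relevant]
```

Aggregating such scores per day yields the kind of real-time sentiment signal that, as the project notes, is much cheaper to obtain than interviews or observed physical behaviour.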
Research team: Marijn Schraagen, Mehdi Dastani
eScience Research Engineer: Erik Tjong Kim Sang
The Netherlands eScience Center has released a new version of McFly, its highly popular software package that helps researchers to find a suitable neural network configuration to carry out deep learning on time series data. The latest version includes several new features and is now freely available to download.
Deep learning is a popular machine learning method that trains a computer to perform human-like tasks such as recognising speech or classifying images. Unlike other machine learning methods that organise data to run through predefined equations, deep learning sets up basic parameters about the data and trains the computer to learn on its own by recognising patterns in data using many layers of processing.
Its popularity notwithstanding, designing a deep learning network can be difficult as it requires users to choose, for example, the number of layers in the network, the number of nodes in each layer and the type of each layer. Moreover, each network must be calibrated or trained before it can be used to automatically classify data.
The Netherlands eScience Center started developing McFly in 2016 to aid researchers working with time series data. A time series is a series of data points indexed in time order such as activity logs, heights of ocean tides or the daily closing of the Amsterdam Exchange Index. Time series are used, for example, in statistics, weather forecasting, mathematical finance and astronomy.
‘Although there are tools that provide pretrained deep learning models for computer vision tasks, no such model existed for time series data, which are widely used by researchers’, says Dafne van Kuppevelt, eScience research engineer and part of the McFly development team. ‘We realised that many researchers were being hampered from using deep learning by the considerable knowledge required to train a deep network – the exact same knowledge available at the eScience Center.’
McFly simplifies the process by making explicit the steps that are required to train a model while offering useful default values at each step. It then tries out different network configurations, training each one on the data provided by the user before listing the performance of each network along with a visualisation that helps the user judge its tendency to overfit or underfit the data.
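The generate-and-compare loop McFly automates can be sketched as follows. This is not McFly's actual API: the configuration fields and the stand-in scoring function below are hypothetical, whereas the real package generates trainable Keras models and reports learning curves for each candidate.

```python
import random

# Sketch of "generate candidate networks, train each briefly, compare":
# the loop McFly automates. quick_score is a stand-in for a short
# training run; McFly itself builds real deep learning models.

def generate_candidates(n, rng):
    """Sample n random network configurations (hypothetical fields)."""
    return [{"type": rng.choice(["CNN", "DeepConvLSTM"]),
             "layers": rng.randint(1, 6),
             "learning_rate": 10 ** rng.uniform(-4, -2)}
            for _ in range(n)]

def quick_score(config, rng):
    """Stand-in for training a few epochs and returning validation
    accuracy; a real implementation would fit the model on the user's
    time series data."""
    return rng.random()

rng = random.Random(42)
candidates = generate_candidates(5, rng)
scored = [(quick_score(c, rng), c) for c in candidates]
best_score, best_config = max(scored, key=lambda pair: pair[0])
```

Listing all of `scored`, rather than only the winner, mirrors McFly's approach of showing the performance of every candidate so the user can judge overfitting and underfitting.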
Fellow development team member Christiaan Meijer: ‘We wanted to help researchers who have never trained deep networks as well as those who know enough to train a neural network but would like to find a suitable network and hyperparameters, something that is often repeated for every new dataset or research question and is automated in McFly.’
Faster and more flexible
The latest version of McFly features a number of new network architectures that it can generate automatically. This makes it more likely to find a suitable deep learning model type for a given data set. It also builds on a newer version of TensorFlow, the open source machine learning platform for Python.
‘McFly is, to the best of our knowledge, the only open source tool for deep learning on time series data aimed at novices in machine learning’, Meijer adds. ‘Its main value is to provide an environment to apply deep learning classification of time series data quickly, thereby relieving the user of the technicalities of deep neural network architectures and training. We recently evaluated the new version in two user workshops and the response was extremely positive. I am really proud of what our team has produced.’
The members of the McFly development team are (in alphabetical order): Sonja Georgievska, Vincent van Hees, Florian Huber, Dafne van Kuppevelt, Christiaan Meijer and Atze van der Ploeg.
The Netherlands eScience Center has appointed Monique van der Linden as director of Operations, effective 1 June. Van der Linden succeeds Prof. Rob van Nieuwpoort, who has been filling this role in an acting capacity alongside his regular position as director of Technology.
As head of Operations, Van der Linden will be responsible for the operational management of the eScience Center. Her remit will include finance and control, human resources and facility services. In addition, she will be a member of the eScience Center Board of Directors.
“I am extremely pleased that Monique has decided to join the eScience Center”, says Dr. Joris van Eijnatten, general director. “Besides being a highly capable administrator, Monique has extensive experience in academia. She knows the environment like few others and understands what is needed to help our organization grow sustainably over the coming years and fulfill its mission to enable digitally enhanced research across the Netherlands.”
Van der Linden: “I have worked almost my entire professional life in a university environment, a place that motivates and inspires me. I find it extremely rewarding to work with researchers and to create the optimal operational circumstances for them to thrive and carry out their research. I am excited about the prospect of putting my experience to use in such a dynamic organization. The impact of digital technologies on research is already substantial and will only become greater in future. I feel privileged to play a part in this important endeavour.”
About Monique van der Linden
Monique van der Linden was appointed head of Operations at Utrecht University’s Faculty of Social and Behavioural Sciences in 2018. Prior to this, she held various administrative posts at UU and was, among other things, head of the Faculty of Humanities’ Research Support Office.
Van der Linden holds a Master’s in International Business from Maastricht University.
The Netherlands eScience Center has made funds available for eScience projects focused on research questions related to the COVID-19 outbreak. The funds will consist of in-kind software engineering support. External researchers will be requested to submit project ideas in a collaboration with experts employed by the eScience Center.
“While many relevant projects may be found in the Life Sciences, Biology and Chemistry, we also encourage partners in other domains, including the Social Sciences, Logistics and Environmental Sciences to apply if they have a relevant and urgent research question”, says Dr Frank Seinstra, the eScience Center’s Program Director. “With these funds we hope to put our expertise to use in helping to address this crisis.”
Project partners will have until 9 April to submit their proposals. The winning projects will be announced by the eScience Center on 15 April.
The Netherlands eScience Center has decided to cancel this year’s edition of its National eScience Symposium. This decision was made following the latest developments in the COVID-19 outbreak.
“Obviously, it is a huge disappointment for all of us: the participants, the organizing team and everyone involved in the event. The eScience Symposium has become a regular fixture for many researchers and knowledge institutes across the country. However, like all major events, we believe it would be difficult and inappropriate to organize the symposium under the current circumstances”, says Dr Joris van Eijnatten, general director of the eScience Center.
As a result, the next edition of the eScience Symposium will take place in the second half of 2021.
Principal Investigator Day
Earlier this year it was decided to integrate the Principal Investigator Day (PI Day) in the eScience Symposium. This annual one-day event, which brings together principal investigators from all of the projects in which the eScience Center collaborates, will hereby get a new format. More on this will be communicated in due time.
A new version of the Earth System Model Evaluation Tool (ESMValTool) was recently made available to the scientific community. The release was announced by the ESMValTool development team, which includes the Netherlands eScience Center, in the first of four papers to be published in the journal Geoscientific Model Development. The latest version features dramatic improvements in user-friendliness, reliability and performance, as well as several new and improved features.
The Earth System Model Evaluation Tool (ESMValTool) is a community-based diagnostics and performance metrics tool for evaluating Earth System Models (ESM). It allows for routine comparison of single or multiple models, either against previous versions or against observations. The tool targets specific scientific themes and focuses on selected essential climate variables such as tropical climate variability, monsoons, atmospheric CO2 budgets, and tropospheric and stratospheric ozone. Its aim is to facilitate the analysis of data produced by earth system modelling groups within the Coupled Model Intercomparison Project (CMIP).
Keeping pace with advances in earth system modeling
Since the release of the ESMValTool in 2016, steady progress has been made in the field of earth system modelling, with future ESM experiments now expected to challenge the scientific community with an increasing volume of data that will need to be analysed, evaluated and interpreted. Moreover, the models’ higher spatial and temporal resolution combined with the growth of their scientific themes and complexity are making it difficult to carry out refined data analysis.
“A major bottleneck of the ESMValTool 1 was the relatively inefficient preprocessing of the input data, leading to long computational times for running analyses and diagnostics whenever a large data volume needed to be processed”, says Bouwe Andela, eScience Research Engineer and a member of the ESMValTool development team. “A significant part of this preprocessing consists of common operations, which are performed on the input data before a specific scientific analysis is started. Ideally, these operations – collectively called preprocessing – should be centralized in the tool. This wasn’t the case with ESMValTool 1, where only a few of these preprocessing operations were performed in such a centralized way. This resulted in several drawbacks such as slow performance, code duplication, lack of consistency among the different approaches implemented at the diagnostic level, and unclear documentation.”
Faster and more user-friendly
The ESMValTool 2.0 was developed to address this bottleneck and features a new design with an improved interface and a revised preprocessor. Moreover, it includes a significantly enhanced diagnostic part and specifically targets the increased data volume and the related challenges posed by the analysis and the evaluation of output from multiple high-resolution or complex ESMs.
“The new version takes advantage of state-of-the-art computational libraries and methods to deploy efficient and user-friendly data processing. Common operations on the input data are centralized in a highly optimized preprocessor, which allows for applying a series of preprocessing functions before diagnostics scripts are applied for in-depth scientific analysis of the model output”, says Andela.
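The architectural idea here, shared preprocessing steps applied once before any diagnostic runs, can be sketched in miniature. The example below chains plain Python functions over toy (latitude, value) pairs; it is purely illustrative, as ESMValTool's preprocessor operates on full CF-compliant climate data sets, not lists.

```python
# Sketch of a centralised preprocessing chain: each common operation is
# a reusable function, wired together once and applied before any
# diagnostic script runs. Toy data: (latitude, value) pairs.

def extract_region(data, lat_min, lat_max):
    """Keep only points inside a latitude band."""
    return [point for point in data if lat_min <= point[0] <= lat_max]

def mean_value(data):
    """Collapse a list of (lat, value) pairs to the mean value."""
    values = [v for _, v in data]
    return sum(values) / len(values)

def preprocess(data, steps):
    """Apply each preprocessing step in order."""
    for step in steps:
        data = step(data)
    return data

# A "recipe" defines the shared steps once, for every diagnostic to reuse
pipeline = [
    lambda d: extract_region(d, lat_min=60, lat_max=90),  # Arctic band
    mean_value,                                           # area summary
]
arctic_mean = preprocess([(45, 2.0), (70, -10.0), (80, -14.0)], pipeline)
```

Centralising steps like this is what removes the code duplication and inconsistency that Andela describes: every diagnostic consumes the same preprocessed output instead of re-implementing regridding, masking or averaging on its own.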
Performance tests conducted on a set of standard diagnostics also show that the new version is faster than its predecessor by about a factor of 30, depending on the hardware used. The development team was able to increase performance by introducing task-based parallelization options and expects to improve performance even further in the coming months.
Andela: “I am really proud of what we as a team have produced. The latest version offers major improvements to both users and developers and allows for even more refined diagnostics and evaluation of earth system models. In the coming period, we’ll continue to build on this version and introduce new features.”
Righi, M., Andela, B., Eyring, V., Lauer, A., Predoi, V., Schlund, M., Vegas-Regidor, J., Bock, L., Brötz, B., de Mora, L., Diblen, F., Dreyer, L., Drost, N., Earnshaw, P., Hassler, B., Koldunov, N., Little, B., Loosveldt Tomas, S., and Zimmermann, K.: Earth System Model Evaluation Tool (ESMValTool) v2.0 – technical overview, Geoscientific Model Development, 13, 1179–1199, https://doi.org/10.5194/gmd-13-1179-2020, 2020.
Nowhere is climate change more obvious than in the Arctic. Estimates show that over the past few decades the region has warmed twice as fast as the global average, a phenomenon known as Arctic Amplification (AA). Although the exact causes of this phenomenon are still being debated, most researchers agree that poleward energy transport is accelerating the warming process.
In the collaborative ERC project Blue-Action, coordinated by the Danish Meteorological Institute and involving 41 partners, including the Netherlands eScience Center, the aim is to improve our ability to describe, model and predict Arctic climate change and its impact on Northern Hemisphere climate, as well as to deliver valued climate services for societal benefit.
As one of the project partners, the eScience Center is working on improving climate change forecasts in the Arctic by calculating the meridional energy transport (MET) both in the atmosphere and the ocean and making these refined calculations available to researchers. Yang Liu, research engineer at the eScience Center and a PhD researcher at Wageningen University & Research, is closely involved in the project and recently published a joint paper on the intercomparison of reanalysis data sets used to compute energy transport. The paper was published in the journal Earth System Dynamics.
“In order to make use of available observations and advanced numerical models, we created an open source work package in Python, in which we combined six state-of-the-art data sets”, Liu explains. “These comprised three atmosphere reanalysis data sets and three ocean reanalysis data sets and had a high temporal and spatial resolution, making them extremely suitable for the computation of energy transport.”
The aim, says Liu, was to quantify and intercompare the Atmosphere Meridional Energy Transport (AMET) and the Ocean Meridional Energy Transport (OMET) variability between the different reanalysis data sets. “We know from previous and current studies that there is a strong relationship between meridional energy transport and fluctuations in sea ice. Our objective was therefore to compare a number of previous studies and see how their respective data sets compare with each other and correspond to the latest observation and modeling techniques.”
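At its core, AMET at a given latitude is the zonal and vertical integral of the meridional energy flux, roughly ∮∫ v·E dp/g dx, where v is the meridional wind, E the energy per unit mass, p pressure and dx the longitude arc length. The NumPy sketch below integrates a toy uniform field; the field values are made up, and real reanalysis-based computations work on full 4D data with mass-flux corrections.

```python
import numpy as np

# Toy sketch: atmospheric meridional energy transport (AMET) at one
# latitude as the zonal + vertical integral of v * E. Real computations
# on reanalysis data add mass corrections and use full 4D fields.

G = 9.81        # gravitational acceleration, m s^-2
A = 6.371e6     # Earth radius, m

def amet_at_latitude(v, energy, p_levels, lat_deg, n_lon):
    """v, energy: arrays of shape (n_levels, n_lon); p_levels in Pa,
    ascending. Returns total northward energy transport in W."""
    flux = v * energy                           # J kg^-1 m s^-1
    dp = np.diff(p_levels)[:, None]             # layer thickness, Pa
    layer_mid = 0.5 * (flux[:-1] + flux[1:])    # midpoint rule per layer
    column = (layer_mid * dp / G).sum(axis=0)   # W per metre of arc
    dx = 2 * np.pi * A * np.cos(np.radians(lat_deg)) / n_lon
    return (column * dx).sum()

n_lev, n_lon = 5, 8
p = np.linspace(100e2, 1000e2, n_lev)           # 100-1000 hPa
v = np.full((n_lev, n_lon), 1.0)                # 1 m/s northward (toy)
e = np.full((n_lev, n_lon), 3.0e5)              # ~3e5 J/kg energy (toy)
transport_pw = amet_at_latitude(v, e, p, lat_deg=60, n_lon=n_lon) / 1e15
```

In practice the large northward and southward flux contributions nearly cancel, which is why the net transport is so sensitive to the quality and resolution of the underlying reanalysis data sets being intercompared.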
Arctic sensitivity to variations
After collecting the reanalysis data sets, Liu and his fellow research engineers at the eScience Center calculated the AMET and OMET on Cartesius, the Dutch national supercomputer. What they found was that the data sets generally agree on the average AMET and OMET in the Northern Hemisphere, and OMET is consistent with the observational results achieved over the past twenty years. Nevertheless, the team’s analysis did reveal anomalies at interannual time scales, with the data sets clearly differing from each other both spatially and temporally. “When compared over a longer period, the data sets are inconsistent in their long-term analysis and do not match up with the latest climate models”, says Liu. “What the data sets do clearly show, however, is that the Arctic climate is quite sensitive to seasonal variations in atmospheric and ocean energy transport.”
The research team has now made their work package and the reanalysis products available to the wider scientific community. Liu: “Although the reanalysis data sets are not specifically designed for studies on energy transport, they can still be of much use for energy transport diagnostics. In the coming period, we will continue to refine the work package and explore the role of energy transport in sea ice forecasts. Given the close relation between sea ice variation and energy transports, it is promising to improve the Arctic sea ice forecasts with a better estimation of energy transport, in combination with novel machine learning techniques.”
Liu, Y., Attema, J., Moat, B., and Hazeleger, W.: ‘Synthesis and evaluation of historical meridional heat transport from midlatitudes towards the Arctic’ in Earth System Dynamics, 11, 77–96, https://doi.org/10.5194/esd-11-77-2020, 2020.