Scoring 3D protein-protein interaction models using deep learning

Interactions between biomolecules control all cellular processes. Understanding those interactions requires knowledge of their three-dimensional structure. Alongside experimental structural biology techniques, this knowledge can be obtained by docking, a complementary, high-throughput computational method that models complexes from their known components.

A key challenge in docking is scoring: identifying correct (near-native) models in a large pool of docked models. It remains difficult because our knowledge of interaction rules is still limited. We will tackle this challenge by training deep neural networks (dNNs) to learn complex interaction patterns from the large body of experimental data in the Protein Data Bank, a valuable source of information not yet fully exploited. Our strategy is to treat scoring as a 3D image classification problem: the interfaces of docked models will be represented as 3D images, and dNNs will be trained to classify them as near-native or not. Unlike many other machine learning techniques, dNNs can keep learning from millions of data points without quickly reaching a performance plateau, and training at this scale is computationally tractable using GPU and Hadoop technologies.
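To make the idea of a 3D image of an interface concrete, here is a minimal sketch in Python with NumPy. It is not the DeepRank implementation. The function name `voxelize_interface`, the grid size, the box length, the Gaussian width and the use of a single per-atom value (for example a partial charge) are all assumptions chosen for illustration. Each atom is spread over a cubic grid as a Gaussian, and the resulting array is the kind of input a 3D convolutional network could classify as near-native or not.

```python
import numpy as np

def voxelize_interface(coords, values, grid_size=10, box_length=20.0, sigma=1.5):
    """Map per-atom values onto a cubic grid centred on the atoms (hypothetical helper).

    coords: (N, 3) atom positions in angstroms.
    values: (N,) one feature value per atom, e.g. a partial charge (assumption).
    """
    coords = np.asarray(coords, dtype=float)
    values = np.asarray(values, dtype=float)
    centre = coords.mean(axis=0)

    # Grid point positions along one axis, centred on zero.
    axis = np.linspace(-box_length / 2, box_length / 2, grid_size)
    gx, gy, gz = np.meshgrid(axis, axis, axis, indexing="ij")
    grid_points = np.stack([gx, gy, gz], axis=-1)  # shape (n, n, n, 3)

    image = np.zeros((grid_size,) * 3)
    for xyz, value in zip(coords - centre, values):
        # Squared distance from this atom to every grid point.
        d2 = np.sum((grid_points - xyz) ** 2, axis=-1)
        # Spread the atom's value over the grid as a Gaussian.
        image += value * np.exp(-d2 / (2 * sigma ** 2))
    return image

# Toy data standing in for the interface atoms of one docked model.
rng = np.random.default_rng(0)
coords = rng.normal(scale=4.0, size=(30, 3))
charges = rng.uniform(-1.0, 1.0, size=30)
image = voxelize_interface(coords, charges)
```

In practice one grid like this would be computed per feature and stacked as channels, much like the colour channels of a 2D image. The choice of features is left open here.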

The resulting scoring function, DeepRank, will markedly enhance our ability to reliably model biomolecular complexes, helping the scientific community gain insights into the macromolecular aspects of life. It will be implemented in our HADDOCK modelling platform and freely distributed through GitHub and eStep repositories, ensuring wide dissemination. The impact will be broad, since 3D image-based dNNs have applications in many other domains, such as medical diagnosis (MRI), cryo-electron microscopy and computer vision.

Co-applicant: Dr. Li Xue (Utrecht University)

Image by: NIH Image Gallery

eScience Research Engineer Dr. Sonja Georgievska

Sonja joined NLeSC in May 2015. She is an eScience Research Engineer on the project Massive Biological Data Clustering, Reporting and Visualization Tools.

eScience Coordinator Dr. Lars Ridder

Lars’ research interests cover (bio)chemical informatics and simulations. He serves as engineer and project coordinator on multiple projects in the life sciences and chemistry domains.
