Together with researchers from the Bioinformatics Group of Wageningen University & Research and the University of California San Diego (USA), research software engineers at the Netherlands eScience Center designed and implemented an online collaborative platform called the “paired omics data platform” (https://pairedomicsdata.bioinformatics.nl/).
All biochemical interactions between living organisms, including bacteria, fungi, plants and animals, are facilitated by specialized metabolites, also known as natural products. Knowledge of the enormous variety of such metabolites produced in nature is still far from complete, yet it is vital to a wide range of health and life science research and key to the discovery of new drugs and other bioactive compounds. Further progress in metabolite research increasingly depends on the combined analysis of two kinds of data: genomic data and metabolomic data.
Genomic data contain information about the biosynthetic machinery that organisms use to produce natural products. Metabolomic data, for example obtained by mass spectrometry, provide information about the actual presence of natural products in biological samples. Combining genomic and metabolomic data not only improves the interpretation and annotation of both data types, but can also help identify the specific producers of natural products within complex ecosystems.
A major bottleneck for such combined genomic and metabolomic analyses is access to sufficient paired omics datasets, that is, genomic and metabolomic data obtained from the same sample. A large community of more than one hundred researchers from more than ten countries provided feedback on the design and loaded the platform with more than 4,800 paired datasets from various organisms and microbial communities.
Furthermore, more than one hundred validated links were established between biosynthetic gene clusters and the mass spectra of the metabolites whose production they encode. Both the paired data and the validated links are already being used to develop novel algorithms that automatically link genomic and metabolomic data, to accelerate the discovery of natural products.
The platform was announced on 15 February 2021 in a publication in Nature Chemical Biology.