The microbes in our bodies are fundamental to our health. At the molecular level, many of their interactions with human tissues are mediated by microbial specialized metabolites.
While metabolomics provides a powerful technique to profile these metabolites, most microbial molecules have unknown structures; hence, over 95% of detected masses cannot be functionally interpreted or linked to their producers. This currently thwarts efforts to understand important disease states of our microbiome.
Many innovative computational workflows have recently been designed to predict molecular (sub)structures from genomic or metabolomic data; however, these efforts have remained largely unconnected. Integrating these data will allow the partial information provided by each field to complement the other, yielding much better functional predictions.
Moreover, integration will connect vital information from both data types: metabolomics reveals in vivo relevance, whereas genomics reveals biological origin. Here, we propose to design a novel algorithm to connect molecular substructures identified in tandem mass spectrometry data to sets of genes within biosynthetic gene clusters (BGCs) detected in (meta)genomic data. Subsequently, we will integrate this algorithm with our previous methods for metabolome analysis (spectral networking, substructure detection) and genome analysis (BGC identification and clustering) in one comprehensive eScience workflow.
Finally, we will demonstrate its potential by identifying molecules prominent during periods of relapse in a longitudinal study of inflammatory bowel disease (IBD) and connecting them to their producers. Ultimately, our workflow will illuminate the vast unknown metabolic space within the human microbial metabolome, and greatly advance our understanding of molecular mechanisms of health and disease.