Modern plant research, like other disciplines in biology, is gradually being transformed into a data-driven endeavor. One of the main drivers of this development is the continuous reduction in DNA sequencing costs. Reconstructing the complete genome of a plant from short DNA sequences and finding genetic variants with respect to a reference genome are both applications in which large amounts of sequencing data are generated and used to study plants and to accelerate and improve breeding.

Traditional approaches to comparing genomes, centered on a single reference, no longer suffice, and the field of genomics is therefore switching to so-called pangenome approaches. Several novel graph-based data structures and algorithms are under development, but none of these can handle the numbers of large plant genomes required in modern research and in plant breeding applications.

In this project, we will improve the scalability of a promising pangenome approach, called PanTools, using eScience technologies. We will specifically address bottlenecks in pangenome construction and analytics, based on a number of predefined use cases in plant genomics. Major performance improvements are expected from integrating Spark technology with our sophisticated graph-based pangenome. This project will deliver the first pangenome approach that can handle the big data in plant genomics and will drastically improve the analytical power available for plant data.


Wageningen University and Research
