The projects are intended to provide the opportunity to rapidly meet short-term scientific challenges, serve as a pilot for future research projects, address immediate technological goals, or investigate the potential to initiate larger projects.
The four projects result from the April Open Call for Path-Finding Projects. Path-Finding Project proposals can be submitted at any time. NLeSC funds projects through the direct provision of cash and the in-kind provision of eScience Research Engineers.
Massive biological data clustering, reporting and visualization tools
Dr. Vincent Robert
CBS-KNAW Fungal Biodiversity Centre
With the availability of newer and cheaper sequencing methods, genomic data are being generated at an increasingly fast pace. In spite of the high degree of complexity of currently available search routines, the massive number of sequences available virtually prohibits quick and correct identification of large groups of sequences sharing common traits. There is a need for clustering tools for automatic knowledge extraction that enable the curation of large-scale databases. In this project, a clustering tool will be developed that avoids the majority of sequence comparisons and significantly reduces the total clustering runtime while retaining clustering accuracy.
Compressing the sky into a large collection of statistical models
Prof. Martin Kersten
Time-domain astronomy opens up a new era of observational astronomy, covering the spectrum from radio and millimeter to optical wavelengths. The data avalanches from its instruments force us to overhaul contemporary data storage, data management, and data analytics techniques. Rather than endlessly piling observations onto fleets of hard drives, this project aims to replace raw observations with more compact model-based representations founded in astrophysics. A well-fitting model has the potential to reduce the storage footprint by several orders of magnitude. The result should still be easy to query and amenable to further analysis within a priori known statistical bounds.
Mining Shifting Concepts through Time (ShiCo): Word Vector Text Mining Change and Continuity in Conceptual History
Prof. Joris van Eijnatten
Digital humanities have achieved impressive progress in tracing and mapping historical events and actors as well as past relations between actors and events. This project aims to go beyond these capabilities by establishing the structures of interpretation that emerge around these historical events, and the subsequent formation of collective meanings. The goal of this project is to develop a repurposable tool that enables humanities researchers to mine the historical development of concepts and the vocabulary with which they are expressed in big textual data repositories.
Giving pandas a ROOT to chew on: Modern Big Data front and backends in the hunt for Dark Matter
Dr. Christopher Tunnell
Dark matter is the “Wild West” of physics research; in the coming few years, researchers aim to discover what most of our Universe’s matter consists of. There is a tension between novel Big Data solutions and the existing methods used in Big Science (e.g., Large Hadron Collider experiments). This project presents a way to harmonize these two ecosystems. The goal is to organize software and data such that researchers can work with existing particle physics infrastructure, yet still use modern communal Big Data tools. A new computing model will be prototyped for small-to-medium-sized particle physics experiments, and the barrier for large experiments to benefit from advances in modern data analytics will be lowered. In addition to helping researchers discover dark matter interactions, this project will help shift particle physics toward non-domain-specific codes.