On Monday 30 January our team of eScience Research Engineers and eScience Principal Investigators came together to talk about shared challenges across different research disciplines. View the photo album on Flickr.
What do ecology, psychology and medical science have in common? And what digital technologies do we need to engage in those shared challenges, how do we bridge gaps between disciplines, and how do we combine expertise from different domains with expertise from computer and data science?
Starting the day with a talk show illustrated the potential as well as the difficulty of bringing such diverse fields of research together. Ecologists and meteorologists discussed the possibilities of sharing data. Communication challenges also quickly became apparent when psychologists and computer scientists started talking about language (“are we talking about human language or programming language?”) and different interpretations of concepts (“what is the difference between text mining and natural language processing?”).
It shows the value of organizing a day like this, where we can try to work on shared understandings, break the jargon barrier, and inspire each other with unexpected perspectives.
After many discussions, six topics were identified as relevant for a follow-up colloquium:
1. Deep Learning in Science
2. Data analytics
3. Visualization
4. Multi-scale modelling
5. Data integration
6. Tools and access to e-infrastructure
The topics were discussed in in-depth sessions, after which each group presented a short pitch on why the eScience Center should organize a colloquium on their topic. While each topic is absolutely worth a colloquium, Visualization was judged the most suitable. Sign up for our mailing list to stay up to date on this colloquium!
Continue reading if you are interested in what was said during the six pitches!
eScience Technical Lead Willem van Hage pitches for deep learning in science
Machine learning is a fast-growing and exciting field in research, and deep learning represents its state of the art. Both involve feeding a computer system a lot of data, which it can use to make decisions about other data. Deep learning enables many researchers to scale up their machine learning in ways they couldn’t before.
That opens avenues to ask new questions in many fields. When there are too many features to pre-code in your models and you want to explore the field, deep learning frees up time for the creative part of science instead of the mundane technical work.
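The point about learned versus pre-coded features can be sketched with a toy example (not part of the pitch, purely illustrative): a tiny one-hidden-layer network learns XOR, a function whose defining feature, the interaction between its inputs, is discovered during training rather than hand-coded.

```python
import math
import random

random.seed(0)

# XOR: the interaction between the two inputs is the "feature" to be learned
DATA = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# weights for 2 inputs -> 2 hidden units -> 1 output (last entry is the bias)
w1 = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]
w2 = [random.uniform(-1, 1) for _ in range(3)]

def forward(x):
    h = [sigmoid(w[0] * x[0] + w[1] * x[1] + w[2]) for w in w1]
    y = sigmoid(w2[0] * h[0] + w2[1] * h[1] + w2[2])
    return h, y

def total_error():
    return sum((forward(x)[1] - t) ** 2 for x, t in DATA)

e_before = total_error()
LR = 0.5
for _ in range(3000):
    for x, t in DATA:
        h, y = forward(x)
        # hand-written backpropagation of the squared error
        d_out = (y - t) * y * (1 - y)
        d_hid = [d_out * w2[i] * h[i] * (1 - h[i]) for i in range(2)]
        for i in range(2):
            w2[i] -= LR * d_out * h[i]
        w2[2] -= LR * d_out
        for i in range(2):
            for j in range(2):
                w1[i][j] -= LR * d_hid[i] * x[j]
            w1[i][2] -= LR * d_hid[i]
e_after = total_error()
```

With real frameworks the same idea scales to millions of parameters and to the GPU clusters mentioned below; here the whole training loop only has to drive `e_after` below `e_before`.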
What do we need to make progress with the application of deep learning in scientific research? A great start would be a symposium for the various fields of science that can profit from deep learning. Because the field moves so fast and is so new, it is very hard to stay up to date with current developments. You need to discuss new technological developments, but also new applications. It would be very useful to share prototype implementations in various fields, so you can see commonalities and differences, and to share the best learning material to get you started.
We need MOOCs rather than books: online material that can change as the field changes. And we need access to infrastructure with specific support, along with examples of how to gain that access: GPU clusters, for example, but also high-speed networks.
eScience Research Engineers and eScience Principal Investigators talking about the future of data analytics
We use models to make predictions about the future. For example, to fight or prevent poverty, or to detect where slums are evolving in cities. To develop and analyze these models we need a set of tools.
However, there are so many methods available that we are blinded by the complexity. And we do not always know what the real truth is, because what is the truth about the future? We simply cannot evaluate these methods, and do not know which is the best one. We have so many different fields of expertise, but there is a gap between those fields that prevents us from combining methods in the best way.
A course of action could be to start bringing those diverse fields of expertise together, to develop the communication between those fields. Not only between technical and scientific aspects but also between these and methodological aspects. We need communication beyond documented coding – collaborating with each other, also on an international scale, and with companies.
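One pragmatic answer to “which method is best when the truth is unknown” is out-of-sample evaluation: hold back part of the data and score each candidate on it. A minimal sketch, with data and methods invented for illustration:

```python
# Toy series with a linear trend; the last three points are held back as "future"
series = [2, 4, 6, 8, 10, 12, 14, 16, 18, 20]
train, test = series[:7], series[7:]

def naive_forecast(history, steps):
    # method 1: the last observed value carries forward
    return [history[-1]] * steps

def trend_forecast(history, steps):
    # method 2: extrapolate the mean step between first and last observation
    step = (history[-1] - history[0]) / (len(history) - 1)
    return [history[-1] + step * (i + 1) for i in range(steps)]

def mae(pred, actual):
    # mean absolute error on the held-back data
    return sum(abs(p - a) for p, a in zip(pred, actual)) / len(actual)

errors = {
    "naive": mae(naive_forecast(train, len(test)), test),
    "trend": mae(trend_forecast(train, len(test)), test),
}
best = min(errors, key=errors.get)
```

This does not resolve the deeper problem the session raised (held-back data is still the past, not the future), but it gives diverse fields a shared, comparable yardstick to start the conversation from.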
eScience Coordinator Adriënne Mendrik pitches for visualization
Visualization is a way to simplify complex data and make it more attractive to people, and therefore very important not only to go from data to information but also to inspire.
A big challenge is for domain researchers to ‘trust’ the visualization. In any visualization choices are made to translate a set of data into a visualization that is more easily interpreted – inherently a process in which data is manipulated.
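A tiny illustration of how such choices manipulate data (the numbers are invented): the same values, binned two different ways, tell two different stories.

```python
# Ten measurements that actually form three clusters (around 1.2, 3.1, 4.9)
data = [1.0, 1.2, 1.4, 2.9, 3.0, 3.1, 3.2, 4.8, 4.9, 5.1]

def histogram(values, bin_width, lo=0.0):
    # count how many values fall in each bin of the given width
    counts = {}
    for v in values:
        b = int((v - lo) // bin_width)
        counts[b] = counts.get(b, 0) + 1
    return counts

coarse = histogram(data, bin_width=5.0)  # one dominant bin: structure invisible
fine = histogram(data, bin_width=1.0)    # the clusters start to show
```

Neither histogram is “wrong”, but a reader who does not know the bin width was a choice may trust one story over the other for no good reason.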
It is important that people are educated so they understand how visualizations come about. At the same time, it is important that researchers realize what is possible by using visualization as a tool – because there is so much potential.
Visualization is very difficult to generalize. Each research question requires a different kind of visualization. To get the most out of this technology, it is therefore important that researchers are aware of the possibilities. That enables researchers to communicate their wishes, and makes it easier for computer scientists to understand those wishes.
A symposium would be a great way to show domain scientists the potential of translating their complex data into a ‘simplified’ visualization that helps to interpret the data and inspire other researchers.
eScience Coordinator Lars Ridder pitching for multi-scale modelling
Going from a very small scale to a very large one: from cells to society, from butterflies to the global climate. From the Universe to a screen. Multi-scale modelling is trending.
Why do we need multi-scale modelling? One reason is that we want to take short-cuts: we cannot compute whole systems at the lowest level of detail. Another reason is that we now have data at all the different levels. This is new. And we have compute systems at different scales.
There’s a good case for multi-scale modelling. The question we asked ourselves is: are there generic aspects that hold for multi-scale modelling in all the different domains? For example, can we find generic rules for how to separate different scales? Can we define best practices for the interfaces between the different scales? Do we have ways of validating these complex multi-scale models? Do we know how to map multi-scale models onto a multi-scale, complex compute infrastructure?
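One way to picture an “interface between scales” is the following sketch (all names and numbers are hypothetical): a fine-scale model runs per patch, and only an aggregate crosses the interface into the coarse-scale model.

```python
def fine_scale(cells):
    # pretend micro-model: each cell's next state depends on its neighbours
    n = len(cells)
    return [0.5 * (cells[i - 1] + cells[(i + 1) % n]) for i in range(n)]

def upscale(fine_values):
    # the interface: only the patch mean is passed up a scale
    return sum(fine_values) / len(fine_values)

def coarse_scale(regions):
    # pretend macro-model: one number per region, not per cell
    return sum(regions) / len(regions)

patches = [[1.0, 2.0, 3.0, 2.0], [0.0, 1.0, 0.0, 1.0]]
regions = [upscale(fine_scale(p)) for p in patches]
global_state = coarse_scale(regions)
```

The generic questions from the session map directly onto this sketch: what may cross the interface (here, only a mean), and how do we validate that the coarse model still behaves like the aggregated fine one?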
There are many more questions than answers at this moment. That’s why we really need a workshop to work this out. We want to bring together scientists from different domains to sit together and see what the properties of their multi-scale models are and extract generic aspects that we can solve as eScience Center.
eScience Principal Investigator Chris de Graaf pitching for data integration
Data integration allows users to see a unified view of heterogeneous data. It involves combining data from several disparate sources, which are stored using various technologies. Data integration is becoming essential to do science.
There are four things to consider: different formats, different modalities of data, ontology (a set of concepts and categories in a subject area), and, linked to that, epistemology (different communities use different ontologies).
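The ontology aspect can be sketched as follows (field names and records are invented for illustration): two sources describe the same concepts in different vocabularies, and a small mapping yields the unified view.

```python
# Two hypothetical sources describing the same gene, in different vocabularies
source_a = [{"gene_id": "G1", "trait": "height", "value": 1.8}]
source_b = [{"locus": "G1", "phenotype": "height", "measurement": 1.9}]

# A minimal "ontology": each source's terms mapped onto shared concepts
ONTOLOGY = {
    "gene_id": "gene", "locus": "gene",
    "trait": "phenotype", "phenotype": "phenotype",
    "value": "measurement", "measurement": "measurement",
}

def integrate(records):
    # rewrite every record into the shared vocabulary
    return [{ONTOLOGY[k]: v for k, v in rec.items()} for rec in records]

unified = integrate(source_a) + integrate(source_b)
```

The hard part in practice is of course agreeing on the mapping itself, which is exactly the epistemological question: communities must first agree on what the shared concepts are.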
Data integration challenges can be illustrated by the following two ‘billion dollar research questions’: 1) How can you use allele and gene information to predict the size and robustness of crops or plants? 2) How can we combine biological activity and chemical structure information to predict the polypharmacological action of drug molecules on multiple protein targets? We tried to find some common themes between these research questions.
We realized initially that we have different problems. For the polypharmacological question, certain data integration tools are not yet available and need investment, while for crop prediction the whole infrastructure and ontology still has to be developed, and there are a lot of epistemological questions to deal with. But we also realized that, when moving to a cell-based system, the polypharmacological questions would run into similar problems of infrastructure and ontology.
That’s why we need a symposium where case studies are presented that reflect not only the challenges of this essential work but also its success stories, to convince people of the urgency of tackling this challenge. Based on the different aspects of data integration, we will then move into discussion groups. And if, in the end, we come to speak the same language on any one of those elements, that will be a big success.
eScience Research Engineers and eScience Principal Investigators talking about streaming data
The case for a symposium on streaming data is very compelling. Imagine you have an up-to-date view of all your data as it streams in. The participants in this session all shared a dream. All work with networks of sensors, be they wearables, weather buoys that collect weather and ocean data for climate research, or antennas to study the universe. And it turns out that we need to study the data from these sensors continuously as they come in. We found that we need to identify which of this real-time processing can be done in software and which in hardware. Sometimes there is so much data that it needs to be reduced within seconds.
Some of it you can run on commodity hardware; other parts need specialty hardware. But we, the users, don’t want to have to know. We need this hardware to be fault-tolerant and sustainable, and we need it to have reasoning built in. Therefore, we need a one-day workshop in which we are going to build enthusiasm for a flagship project on stream reasoning.
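The “reduce it within seconds” requirement can be sketched as incremental stream processing (the sensor values, the feed, and the alert threshold are invented): the raw stream is never stored, only a running summary and the readings that trip a real-time rule.

```python
def sensor_stream():
    # stand-in for a live feed from a buoy, wearable, or antenna
    for reading in [20.1, 20.3, 19.8, 35.0, 20.0]:
        yield reading

count, total, alerts = 0, 0.0, []
for value in sensor_stream():
    # incremental reduction: O(1) state per reading, raw data is discarded
    count += 1
    total += value
    if value > 30.0:  # a real-time rule, evaluated as the data arrives
        alerts.append(value)

running_mean = total / count
```

Stream reasoning, as pitched here, goes one step further: instead of a hard-coded threshold, the rule itself is expressed declaratively and evaluated continuously, whether in software or directly in hardware.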
Photography by Elodie Burrillon, HUCOPIX