Willem's main research topics over the past 10 years have been semantics, augmented sense making, visual analytics, information integration, and text mining.
This expertise area includes
Structured and unstructured data. From data to information to knowledge to insight. Current research challenges demand robust and reliable methods to identify the patterns and relationships contained in, but also obscured by, large amounts of disparate data.
eScience approaches can enable researchers to recognize sources of relevant information, prepare raw data, use statistical tools, extract meaningful information, recognize potential problems and make visualizations to communicate their findings.
With statistics and applied mathematics at their core, data analytics and visualisation are generic requirements for many scientists. Combining ‘big data’ with theory and conceptual models will enable scientists to structure the wealth of data and provide skilful forecasts.
Example: Natural Language Processing
Sources of natural human language, such as emails, web pages, tweets, product descriptions, newspaper stories, social media and scientific articles, are a central feature of the so-called Big Data explosion. Within these various media is a wealth of information, connections, patterns and hidden knowledge of academic, social and commercial value. Add to this the volume of digitized historical records and texts, covering thousands of languages, formats and varieties, and the potential to unlock new insights becomes almost limitless, but also hugely complicated and challenging.
The science of analyzing human language is natural language processing (NLP), and its applications are already part of our everyday lives. Spelling and grammar correction in word processors, translation tools on the web, email spam detection and automatic question answering are all forms of NLP.
In humanities research (our focus for NLP applications), a number of newer NLP challenges exist. These include detecting people’s opinions (sentiment analysis), producing readable summaries of chosen text, identifying the discourse structure of connected text, identifying the relationships among named entities, and selecting the correct meaning of words that intrinsically have multiple meanings (word sense disambiguation).
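To make one of these tasks concrete, the following is a minimal sketch of lexicon-based sentiment analysis in Python. It is an illustration only and not a method described on this page: the word lists and the function names `sentiment_score` and `label` are hypothetical, and practical systems typically rely on much larger lexicons or trained models.

```python
import re

# Hypothetical toy lexicons, chosen for illustration only.
POSITIVE = {"good", "great", "excellent", "love", "wonderful"}
NEGATIVE = {"bad", "poor", "terrible", "hate", "awful"}


def sentiment_score(text: str) -> int:
    """Return the number of positive words minus the number of negative words."""
    # Lowercase the text and split it into simple alphabetic tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Each token contributes +1, -1 or 0 depending on lexicon membership.
    return sum((t in POSITIVE) - (t in NEGATIVE) for t in tokens)


def label(text: str) -> str:
    """Map the raw score to a coarse sentiment label."""
    score = sentiment_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


if __name__ == "__main__":
    for sentence in ("A wonderful, excellent archive",
                     "The scan quality is terrible",
                     "The letter is dated 1843"):
        print(sentence, "->", label(sentence))
```

A word-counting approach like this ignores negation, irony and context, which is part of why opinion detection in historical and literary text remains a research challenge.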