Mining historical data

Facilitating and supporting large-scale text mining in the field of digital humanities

humanities and social sciences



Project Highlights

Strengthening the employment of computational methods in humanities research

Demonstrating ways in which computational humanities can be integrated into conventional historical-interpretive approaches

Facilitating and supporting large-scale text mining in the field of digital humanities

Cultural text mining is crucial for addressing macro-historical questions, such as the emergence of transnational reference cultures. Mining the cultural aspects of entities and events in large textual repositories, such as the collection of digitized historical newspapers provided by the National Library of the Netherlands (KB), can provide valuable insights. The ‘Translantis: Digital Humanities Approaches to Reference Cultures’ program uses text mining technologies to analyze big data repositories of public media. The eScience challenge here is to develop a tool for cultural text mining that enables scholars to systematically search very large quantities of textual data in a reliable and reproducible way.

The Intelligent Systems Lab Amsterdam (ISLA) has developed xTAS, a scalable open source text analysis service. xTAS is coupled to Elasticsearch, the scalable open source search and analytics platform. This combination underlies the Texcavator application, which serves the specific text mining needs of the Translantis research team. xTAS will be further developed, and future versions will include clustering concepts and sentiment mining of issues in public debates. Regular feedback loops will allow for iterative refinement of the analysis algorithm and extension of the current set of features.

This project aims to significantly strengthen the use of computational methods in humanities research. Users working in interdisciplinary teams with the current tool (Texcavator) will be closely monitored to learn which interface functions and features they want and need. The goal is to build a generic tool that enables fine-grained analysis of large-scale document collections and offers state-of-the-art visualizations, so that humanities scholars working in multidisciplinary teams can semi-automatically distinguish long-term patterns in large news media repositories.
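As an illustration of what distinguishing long-term patterns can mean in practice, the following minimal Python sketch counts how often a term occurs per 1,000 words in each publication year. It is not taken from Texcavator or xTAS: the sample records, the function name term_frequency_by_year, and the simple word tokenizer are all assumptions made for this example.

```python
import re
from collections import Counter

# Hypothetical sample records: (publication year, article text).
# A real collection would be read from a search index or an archive export.
ARTICLES = [
    (1920, "America is far away"),
    (1920, "The harvest was poor"),
    (1950, "America and Europe trade with America"),
]


def term_frequency_by_year(articles, term):
    """Return {year: occurrences of `term` per 1,000 tokens}, sorted by year."""
    hits = Counter()    # occurrences of the term per year
    totals = Counter()  # total number of tokens per year
    needle = term.lower()
    for year, text in articles:
        # Naive tokenizer (an assumption): lowercase runs of word characters.
        tokens = re.findall(r"\w+", text.lower())
        totals[year] += len(tokens)
        hits[year] += sum(1 for token in tokens if token == needle)
    # Normalize by yearly volume so years with more text stay comparable.
    return {
        year: 1000 * hits[year] / totals[year]
        for year in sorted(totals)
        if totals[year] > 0
    }


if __name__ == "__main__":
    for year, rate in term_frequency_by_year(ARTICLES, "America").items():
        print(year, round(rate, 1))
```

Normalizing by the number of words published per year matters for this kind of question, because the volume of digitized text usually varies a lot between periods and raw counts would mostly reflect that variation.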

The result will be an innovative text mining tool that is user-friendly and sustainable. The project will also produce a number of best practices demonstrating ways in which computational humanities can be integrated into conventional historical-interpretive approaches and vice versa. The software developed is open source and will be available to humanities scholars and social scientists to deploy in their own research.

Image: Koninklijke Bibliotheek (CC License)

Project Leader Dr. Jaap Verheul

Dr. Jaap Verheul is associate professor of cultural history and director of the American Studies program at Utrecht University, the Netherlands.

Project Leader Prof. Toine Pieters

You can find Professor Toine Pieters on the fertile grounds where the life sciences meet the humanities and social sciences. With a background in pharmacology and a PhD in Social Studies of Science, he teaches history of the life sciences at Utrecht University.

eScience Research Engineer Dr. Janneke van der Zwaan

Janneke works as an eScience Research Engineer on the Texcavator and From Sentiment Mining to Mining Embodied Emotions projects.
