This project studies how genres in newspapers and television news can be detected automatically and transparently using machine learning. This will enable us to capture the shift from opinion-based to fact-centred reporting, which is often hypothesized but largely understudied because manual content analysis is so time-consuming. Moreover, we will open the black box of machine learning by comparing, predicting and visualizing the effects of applying various algorithms to heterogeneous data of varying quality, with genre features that shift over time. This will enable scholars to do large-scale analyses of historical texts and other media types, and to critically evaluate the methodological effects of various machine learning approaches.

This project brings together the expertise of journalism history scholars (RUG), specialists in data modelling, integration and analysis (CWI), digital collection experts (KB & NISV) and e-science engineers (eScience Center). It will first use a large manually annotated dataset (VIDI-project PI) to develop a transparent and reproducible approach to training an automatic classifier. Building upon this, the project will generate three outcomes:

  1. A study that revises our current understanding of the interrelated development of genre conventions in print and television journalism based upon large-scale automated content analysis via machine learning;
  2. Metrics and guidelines for evaluating the bias and error of the different preprocessing and machine learning approaches and off-the-shelf software packages;
  3. A dashboard that integrates, compares and visualises different algorithms and underlying machine learning approaches, and that can be integrated into the CLARIAH media suite.

Partners

University of Groningen
National Library of The Netherlands
CWI