New horizons are opening up for the humanities, now that vast collections of newspapers and other texts are being digitized. History professor Huub Wijfjes worked with the eScience Center to test a widely held vision of the past.
It’s one of the most firmly established concepts in Dutch history: for most of the 20th century, Protestants, Catholics and socialists are said to have lived in separate social ‘pillars’, each with their own political parties, newspapers, sports clubs and so forth. Recently, however, some doubt has been cast on this picture, especially regarding the newspapers: were they really so loyal to the political leadership of their own pillar?
Such questions are traditionally answered by qualitative research: close reading of carefully selected source material. But now something has changed: the Dutch Royal Library is digitizing its vast collection of newspapers, ranging from 1618 to today. For the first time, it’s become possible to see the forest instead of the trees.
For Huub Wijfjes, history professor at the Universities of Groningen and Amsterdam, this is a fundamental challenge: “It’s humanly impossible to leaf through fifty years of so many newspapers, page by page, and record everything you find about politics. Now computers can scan these papers for us. But how can we get them to provide the answers we seek?”
Wijfjes got help from the Royal Library, the Documentation Centre of Dutch Political Parties in Groningen, NIAS and especially the Netherlands eScience Center. With an eScience Research Engineer and a budget, he was able to analyze the metadata of the newspaper collection for the names of parties, party leaders and ‘pillarized’ organizations such as trade unions. The team was even able to identify the most characteristic political concepts, by first deriving word clouds from political programs and then mapping these onto the full Proceedings of Parliament to see how prominent certain concepts really were for specific parties. The resulting lists of concepts, together with the names, were used to compare newspapers from different pillars, as well as neutral newspapers, over the period 1918 to 1967.
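To picture the basic comparison step, here is a minimal Python sketch that counts mentions of party names per newspaper and turns them into shares. It assumes plain-text articles tagged with a newspaper label, and the term lists are hypothetical: KVP, PvdA and ARP were real Dutch parties of the period, but these lists are not the project’s, and this is not the algorithm the project published.

```python
import re
from collections import Counter, defaultdict

# Hypothetical, illustrative term lists: the project's actual lists of party
# names, leaders and organizations are not reproduced here.
PARTY_TERMS = {
    "KVP": ["KVP", "Katholieke Volkspartij"],
    "PvdA": ["PvdA", "Partij van de Arbeid"],
    "ARP": ["ARP", "Anti-Revolutionaire Partij"],
}


def compile_patterns(party_terms):
    """Build one case-insensitive, whole-word regex per party."""
    patterns = {}
    for party, terms in party_terms.items():
        alternatives = "|".join(re.escape(term) for term in terms)
        patterns[party] = re.compile(r"\b(?:" + alternatives + r")\b", re.IGNORECASE)
    return patterns


def mention_shares(articles, party_terms):
    """Return, per newspaper, each party's share of all party mentions found.

    `articles` is an iterable of (newspaper, text) pairs.
    """
    patterns = compile_patterns(party_terms)
    counts = defaultdict(Counter)
    for paper, text in articles:
        for party, pattern in patterns.items():
            counts[paper][party] += len(pattern.findall(text))

    shares = {}
    for paper, counter in counts.items():
        total = sum(counter.values())
        # Guard against papers in which no party is mentioned at all.
        shares[paper] = {
            party: (counter[party] / total if total else 0.0) for party in party_terms
        }
    return shares


# Toy input with made-up newspaper labels, for illustration only.
articles = [
    ("catholic_paper", "De KVP en de Katholieke Volkspartij. Ook de PvdA."),
    ("socialist_paper", "De PvdA, de PvdA en de Partij van de Arbeid."),
]
shares = mention_shares(articles, PARTY_TERMS)
```

Normalizing by each paper’s total number of party mentions makes large and small newspapers comparable. The sketch does nothing about politicians with common surnames, phrases like ‘our party’, or OCR errors in the source text.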
“It was quite exciting when our eScience Research Engineer produced the first graphs,” says Wijfjes. “By and large, the results confirm the traditional picture: pillarized newspapers focused on the people and policies of their own parties. But we’ve found some interesting aspects that we’re still studying.”
The project sounds much easier than it actually was. Some politicians had rather common names, so how do you identify them among all the other content of the newspapers? And how do you deal with expressions like ‘our party’? Add to that lots of gaps and errors in the data and the picture gets rather murky.
Wijfjes: “Our eScience Research Engineer, Patrick Bos, was wonderful. At first we wondered what an astronomer could do for us, but he quickly grasped all the methodological details and worked out solutions. Data handling, segmentation and merging, developing an algorithm that didn’t distort the results – as a historian, I could never hope to master this stuff.”
The results of this cooperation reach well beyond the project itself. Wijfjes: “All our research material is now available in Open Access. Not only the data set, but also the algorithm. So far, we’ve only looked at general, long-term trends, but it’s quite possible to zoom in on specific time periods or persons. People may want to look at episodes like the Mandement of 1954, when the bishops expressly forbade Catholics to listen to the socialist broadcaster or vote for socialist politicians. Did it really make a difference in the Catholic newspapers?”
The project was based on the metadata of 8 million pages. So far, the Royal Library has digitized only 15 percent of its newspaper collection, so there is great scope for future research. However, Wijfjes warns that digitized material should be handled with care. “We discovered many errors. As much as 5 to 10 percent of the metadata may be wrong because of faulty optical character recognition. We also noticed big gaps in the collection: for some newspapers entire years or even decades are missing. So our traditional ‘source criticism’ is still vital.”
But on the whole he’s enthusiastic. “If you know what you’re doing, the possibilities are incredible. There’s also interest from the Institute of Sound and Image, the national radio and TV archive: they are eager for this kind of research.”
This project brought together a wide range of disciplines: not only history and eScience, but also political and media science, says Wijfjes. The results are relevant to even more disciplines. “We’ve developed a methodology and tools to analyze and understand text databases of this sort. Apart from the material that we put in Open Access, we also published two studies about the methods we used.”
He is “really happy” with the assistance he got from the eScience Center. “Apart from their general helpfulness and expertise, I was especially glad that they started by asking me for an historical question that needed answering. Far too often, digital humanities focus on purely technological challenges, instead of answering historical questions in a state-of-the-art digital environment.”