Expertise: Data Visualization and Analytics for the Humanities
There is currently an important trend in the humanities towards quantitative research, using large collections of data from different sources to support claims that would otherwise be difficult and time-consuming to verify. This data-driven revolution requires humanities scholars to use computational techniques both for integrating diverse kinds of data from multiple sources and for analyzing such large volumes of data. NLeSC is actively developing its expertise in this area. Examples of such techniques include (but are not limited to):
- semantic linking – this refers to identifying the names of places, people, events, etc. in a given text and linking them to the concepts they denote. This is not a trivial problem, as language can be very ambiguous. One way of achieving this is by analyzing how often a particular combination of words refers to a specific concept.
- sentiment analysis – this refers to assessing the emotional content of a text. Reviews, personal opinions, Twitter messages, etc. may convey the author's opinion (positive or negative) on a specific topic. Sentiment analysis is concerned with extracting such information from the way a text is written. One way of achieving this is by identifying words in the text which are commonly associated with a particular emotion.
- topic modeling – when analyzing a text document, it seems natural to say “this document is about X” or to make statements such as “these two documents are about the same topic”. Topic modeling allows this type of statement to be automated. In this way, documents which touch on the same topic can be related to one another.
- diachronic data analysis – concepts are not necessarily fixed in time, so it is important to be able to analyze how they change over time.
- visualization – the growth of digital datasets in the humanities poses challenges for visualization, especially where the data lineage includes uncertainty at every step. We can no longer rely on spreadsheets and simple, one-dimensional graphs to capture the full complexity of our subject matter. NLeSC is dedicated to using the latest techniques in visualization and data exploration to tackle this issue, with several running projects on visualizing uncertainty, topic models and document similarity.
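
To make two of the techniques listed above more concrete, the following Python sketch illustrates frequency-based semantic linking and lexicon-based sentiment analysis on toy data. It is only an illustration under stated assumptions, not a description of NLeSC's actual tools: the annotated mentions, the word list and all function names are hypothetical, and real systems rely on much larger resources and more careful text processing.

```python
from collections import Counter, defaultdict

# Hypothetical annotated examples: (phrase as written, concept it referred to).
ANNOTATED_MENTIONS = [
    ("paris", "Paris (city in France)"),
    ("paris", "Paris (city in France)"),
    ("paris", "Paris (mythological figure)"),
    ("amsterdam", "Amsterdam (city in the Netherlands)"),
]

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
SENTIMENT_LEXICON = {
    "good": 1, "wonderful": 1, "love": 1,
    "bad": -1, "boring": -1, "hate": -1,
}


def build_link_counts(mentions):
    """Count how often each phrase was used for each concept."""
    counts = defaultdict(Counter)
    for phrase, concept in mentions:
        counts[phrase.lower()][concept] += 1
    return counts


def link_phrase(phrase, counts):
    """Return the concept a phrase most often refers to, or None if unseen."""
    candidates = counts.get(phrase.lower())
    if not candidates:
        return None
    concept, _ = candidates.most_common(1)[0]
    return concept


def sentiment_score(text, lexicon):
    """Sum the polarities of all lexicon words found in the text."""
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    return sum(lexicon.get(w, 0) for w in words)


counts = build_link_counts(ANNOTATED_MENTIONS)
print(link_phrase("Paris", counts))
print(sentiment_score("A wonderful book, I love it!", SENTIMENT_LEXICON))
```

In this sketch an ambiguous name is simply resolved to its most frequent meaning, and a text counts as positive when its score is above zero. Both are deliberately naive choices: they ignore context, negation and irony, which is part of why these problems are not trivial.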