Developing a centralized service for gathering, storing, and analyzing Twitter messages
Companies, governments and universities are increasingly interested in the daily communication that takes place on online social media such as blogs, Facebook, and Twitter. Linguists and researchers in communication studies can use this data to study language variation and change. Companies may track the reputation of a product after its introduction. Journalists may follow the spread of news messages and spot initial local reports of incidents. Police may monitor Twitter for suspicious behavior. However, the volume of social media data is large, and extracting the specific parts that are relevant to a given purpose is not easy.
This project has developed a centralized service for gathering, storing, and analyzing Twitter messages, and for making the derived information available to a consortium of researchers in communication studies and language technology throughout the Netherlands.
The service is based on an existing system set up at ISLA (UvA) and the RUG, and it runs on infrastructure from SURFsara because mapping the collected tweets is a very compute-intensive activity. The Twitter API, which provides free access to approximately 1% of all tweets worldwide, is harvested continuously and the resulting data is stored. Interfaces to this data provide users with a number of analysis tools that can be run on all content and metadata.
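The harvest, store, and analyze cycle described above can be illustrated with a minimal sketch. It does not reproduce the project's actual code: the function names `store_tweets` and `count_hashtags`, the one-file-per-day archive layout, and the simplified tweet records (dictionaries with an ISO 8601 `created_at` field and a `text` field) are all assumptions made for illustration, and the step that fetches tweets from the Twitter API is left out entirely.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def store_tweets(tweets, archive_dir):
    """Append each tweet (a dict) to a gzip-compressed JSON-lines file, one file per day.

    Assumption: every tweet has a "created_at" field holding an ISO 8601
    timestamp such as "2013-05-01T12:00:00Z". Real API payloads may differ.
    """
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    for tweet in tweets:
        day = tweet["created_at"][:10]  # "YYYY-MM-DD" prefix of the timestamp
        path = archive_dir / f"{day}.jsonl.gz"
        # Text-mode append: each call adds a new gzip member to the file.
        with gzip.open(path, "at", encoding="utf-8") as fh:
            fh.write(json.dumps(tweet, ensure_ascii=False) + "\n")


def count_hashtags(archive_dir):
    """A toy analysis tool: count hashtags across the whole stored archive.

    Assumption: the message body is in a "text" field. Hashtags are found by
    naive whitespace splitting, which is enough for a sketch.
    """
    counts = Counter()
    for path in sorted(Path(archive_dir).glob("*.jsonl.gz")):
        with gzip.open(path, "rt", encoding="utf-8") as fh:
            for line in fh:
                tweet = json.loads(line)
                for word in tweet["text"].split():
                    if word.startswith("#") and len(word) > 1:
                        counts[word.lower()] += 1
    return counts
```

In this sketch a harvester would call `store_tweets` with each batch it receives, and analysis tools such as `count_hashtags` would read the archive independently, so gathering and analysis stay decoupled.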
Image: Michele Ursino (CC License)