Software: xtas



xtas is a natural language processing (NLP) toolkit with built-in task distribution. It ties together a range of language processing and text mining packages that solve problems such as text clustering, sentiment detection and named-entity recognition, gives them a single interface, and can run these tasks either "in the cloud" or locally. It can also talk to the Elasticsearch search engine, so that the results of text processing can be stored and queried, enabling semantic search in document collections.
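
The idea of a single interface over several text-processing tasks, run either locally or on a pool of workers, can be illustrated with a small sketch. This is not xtas's API: the names `TASKS`, `task`, `run` and the toy `tokenize` and `sentiment` functions are hypothetical, and a thread pool from the Python standard library stands in for a real distributed task queue.

```python
from concurrent.futures import ThreadPoolExecutor
import re

# Hypothetical task registry; none of these names come from xtas itself.
TASKS = {}


def task(func):
    """Register a function under its name so all tasks share one interface."""
    TASKS[func.__name__] = func
    return func


@task
def tokenize(text):
    """Split text into lowercase word tokens (toy stand-in for a real tokenizer)."""
    return re.findall(r"\w+", text.lower())


@task
def sentiment(text):
    """Toy lexicon-based polarity: positive word count minus negative word count."""
    positive, negative = {"good", "great"}, {"bad", "poor"}
    tokens = tokenize(text)
    return sum(t in positive for t in tokens) - sum(t in negative for t in tokens)


def run(task_name, docs, distributed=False):
    """Apply one registered task to every document, locally or via a worker pool.

    The thread pool is only an assumption standing in for remote workers.
    """
    func = TASKS[task_name]
    if not distributed:
        return [func(d) for d in docs]
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(func, docs))


docs = ["A great paper", "Bad, poor print quality"]
local = run("sentiment", docs)                     # executed in this process
pooled = run("sentiment", docs, distributed=True)  # executed by pool workers
```

The caller names a task and a document collection and chooses where the work runs; the task code is the same in both cases, which is the property described in the paragraph above.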

xtas is used in the Searching Public Discourse (SPuDisc) project to enrich large collections of historical newspaper data with automatic annotations. These annotations make it possible to search for concepts in addition to keywords.

