Building a collection of gold standard medical text annotations
Exploring the potential of a crowdsourcing game with a purpose (GWAP)
Developing a scalable and sustainable framework for the continuous engagement of domain experts in the collection and analysis of medical and scientific gold standard data
Crowdsourcing, the practice of soliciting contributions from a large group of people, especially an online community, to obtain content, services or ideas, is increasingly used to subdivide tedious work. Annotating medical texts is one example of such work.
While cognitive computing systems such as IBM’s Watson can process data and text together to give data more meaning and enable deeper analysis, these systems require large amounts of human-annotated data (gold standard data) for evaluation, testing and training. Collecting this annotated data is the most expensive and time-consuming part of building cognitive computing systems. The Dr. Watson project aims to motivate a community of medical professionals to build a collection of gold standard medical text annotations. It is the first larger-scale pilot study in an innovative interdisciplinary research project that uses crowdsourcing, motivation and gaming techniques to engage medical professionals in the collection of valuable gold standard annotation data.
The project will explore, from an interdisciplinary perspective, the potential of a crowdsourcing game with a purpose (GWAP). The scientific focus of the work is to identify, model and test specific motivational and personalization strategies (suitable for medical professionals) that can be integrated into a game setting.
Which adaptation and game elements can be used to motivate a crowd of medical experts to validate the annotations of a lay crowd and to identify new text annotations, for example for medical factor and relation extraction? Can the existing data quality metrics (defined on lay-crowd results) be adapted to also harness the disagreement between medical experts? And what are appropriate visualizations of crowdsourced medical text annotation data to support its processing by medical experts and crowdsourcing researchers?
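To illustrate what a disagreement-aware quality metric might look like, here is a minimal sketch in the style of crowd annotation vectors: each worker's choices for a sentence are counted in a vector over candidate labels, and the sentence is scored by the average pairwise cosine similarity of those vectors. The function names and the toy relation labels are illustrative assumptions, not part of the project's actual metrics.

```python
from itertools import combinations
import math

def cosine(u, v):
    """Cosine similarity between two annotation count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def sentence_clarity(worker_vectors):
    """Mean pairwise cosine similarity across workers' annotation
    vectors for one sentence: 1.0 means full agreement; lower values
    signal ambiguity in the sentence rather than simple worker error."""
    pairs = list(combinations(worker_vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)

# Hypothetical example: each vector counts how often a worker selected
# each candidate relation (say, [treats, causes, prevents]) for the
# same sentence.
agree = [[1, 0, 0], [1, 0, 0], [1, 0, 0]]   # all workers pick "treats"
split = [[1, 0, 0], [0, 1, 0], [1, 1, 0]]   # workers disagree
print(sentence_clarity(agree))  # -> 1.0
print(sentence_clarity(split))  # lower score reflects disagreement
```

The point of such a metric is that low scores are informative: instead of discarding disagreement as noise, it can be used to flag ambiguous sentences for expert review.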
Motivating a community of medical professionals to contribute to the collection and analysis of medical and scientific gold standard data
The Dr. Watson project is part of the broader Crowd-Watson project, which is developing a scalable and sustainable framework for the continuous engagement of domain experts in the collection and analysis of medical and scientific gold standard data. This data is used for evaluating and training automatic data analysis tools. Ultimately, we seek to create a symbiotic human-machine pipeline for disagreement-centric text annotation, which combines machine processing of large quantities of textual data with the power of the crowd for annotation of medical and scientific texts.
The concrete result will be continuous and up-to-date training and evaluation data for machine learning and information extraction, and in this way we will be able to support scientific research. For the Dr. Watson project, the focus will be on tasks that require medical expertise and where games with a purpose have proven effective in motivating medical experts and experts from other scientific domains. The project will study the conditions and incentives under which medical experts contribute, and the time, cost and quality of the data they provide, in order to generate rich training data for cognitive computing systems. The two problems that will be studied specifically, factor identification and relation extraction in medical texts, are critically important for medical research, as they form the foundation of text analysis tools.
The main outcomes of the project will be open source tools and scientific data for use by others. An open source framework will be designed to generate crowdsourcing medical text annotation games by integrating existing crowdsourcing components developed in the Crowd-Watson project; data quality metrics will be developed to evaluate the adaptation, motivation, scoring and gaming strategies; and a disagreement-centric analysis of the annotation data collected through the game will be performed and compared with the data previously collected from the lay crowd (and also matched to linked data).
Image: Crowd-Watson team