Gabriele Sarti

University of Groningen
Fellowship project title: Inseq: An Interpretability Toolkit for Sequence Generation Models
The goal: Improving and promoting Inseq, a novel Python library to democratize access to explainable AI methods for the study of generative language models.
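A minimal sketch of the kind of attribution workflow Inseq supports, shown here for illustration only; the load_model and attribute calls follow the library's documented usage, while the model name, attribution method, and prompt are assumptions chosen for the example.

```python
# Illustrative sketch of feature attribution with Inseq.
# The model ("gpt2"), method ("integrated_gradients"), and prompt are
# example choices, not prescribed by the project description.
import inseq

# Load a generative language model together with an attribution method.
model = inseq.load_model("gpt2", "integrated_gradients")

# Attribute a generation: which input tokens influenced each generated token?
out = model.attribute("The capital of the Netherlands is")

# Visualize the resulting attribution scores.
out.show()
```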
The why: Despite major advances in AI-powered text generation models, their complexity and opacity hinder our ability to study these systems and align them with ethical and safety constraints. Better tools for explaining their predictions would help users uncover implicit biases learned during training, and ultimately improve the safety and usability of language models in real-world settings.