TextFocus: Assessing the Faithfulness of Feature Attribution Methods Explanations in Natural Language Processing

Mariotti, Ettore; Arias-Duart, Anna; Cafagna, Michele; Gatt, Albert; García-Gasulla, Darío; Alonso Moral, José María

doi:10.1109/ACCESS.2024.3408062

Files

2025_access_alonso_textfocus.pdf (1.1 MB)

Identifiers

URI: https://hdl.handle.net/10347/41177

ISSN: 2169-3536

DOI: 10.1109/ACCESS.2024.3408062

Publication date

2024-05-31

Authors

García-Gasulla, Darío

Alonso Moral, José María

Publisher

IEEE

Abstract

Among the existing eXplainable AI (XAI) approaches, Feature Attribution methods are a popular option due to their interpretable nature. However, each method yields a different explanation, which introduces uncertainty about their reliability and their coherence with the underlying model. This work introduces TextFocus, a metric for evaluating the faithfulness of Feature Attribution methods for Natural Language Processing (NLP) classification tasks. To address the absence of ground truth explanations for such methods, we introduce the concept of textual mosaics. A mosaic is a combination of sentences belonging to different classes, which provides an implicit ground truth for attribution. The accuracy of explanations can then be evaluated by comparing feature attribution scores with the known class labels in the mosaic. Using TextFocus, the performance of six feature attribution methods is systematically compared on three sentence classification tasks, with Integrated Gradients being the best overall method in terms of faithfulness and computational requirements. The proposed methodology fills a gap in NLP evaluation by providing an objective way to assess Feature Attribution methods while finding their optimal parameters.
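The mosaic idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the helper names `build_mosaic` and `focus_score`, the whitespace tokenization, and the scoring rule (the share of positive attribution that falls on tokens from sentences of the target class) are assumptions made for illustration, and the attribution values in the usage lines are invented.

```python
from typing import List, Sequence, Tuple


def build_mosaic(sentences: Sequence[Tuple[str, int]]) -> Tuple[List[str], List[int]]:
    """Concatenate labelled sentences into one token list.

    Returns the tokens and, for each token, the class label of the
    sentence it came from (the implicit ground truth of the mosaic).
    Tokenization by whitespace is an assumption of this sketch.
    """
    tokens: List[str] = []
    token_labels: List[int] = []
    for text, label in sentences:
        words = text.split()
        tokens.extend(words)
        token_labels.extend([label] * len(words))
    return tokens, token_labels


def focus_score(attributions: Sequence[float],
                token_labels: Sequence[int],
                target_class: int) -> float:
    """Share of positive attribution that lands on target-class tokens.

    Hypothetical scoring rule: negative attributions are clipped to zero,
    and a mosaic with no positive attribution scores 0.0.
    """
    if len(attributions) != len(token_labels):
        raise ValueError("one attribution score is required per token")
    positive = [max(a, 0.0) for a in attributions]
    total = sum(positive)
    if total == 0.0:
        return 0.0
    on_target = sum(p for p, lab in zip(positive, token_labels) if lab == target_class)
    return on_target / total


# Usage with invented attribution scores for the class-1 prediction.
tokens, labels = build_mosaic([("great movie", 1), ("terrible plot", 0)])
example_attributions = [0.5, 0.25, -0.5, 0.25]
score = focus_score(example_attributions, labels, target_class=1)
```

Under this assumed rule, a score close to 1 means the attribution method places almost all of its positive evidence on the sentences that actually belong to the explained class, while a low score means the evidence leaks onto sentences from other classes.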

Bibliographic citation

E. Mariotti, A. Arias-Duart, M. Cafagna, A. Gatt, D. Garcia-Gasulla and J. M. Alonso-Moral. (2024). TextFocus: Assessing the Faithfulness of Feature Attribution Methods Explanations in Natural Language Processing. "IEEE Access", vol. 12, pp. 138870-138880

Publisher version

https://doi.org/10.1109/ACCESS.2024.3408062

Sponsors

This work was supported in part by NL4XAI Project funded by European Union’s Horizon 2020 Research and Innovation Program under the Marie Skłodowska-Curie Grant under Agreement 860621; in part by MCIN/AEI/10.13039/501100011033 and ESF Investing in Your Future under Grant PID2021-123152OB-C21; in part by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR under Grant TED2021-130295B-C33; in part by the Galician Ministry of Culture, Education, Professional Training, and University (co-funded by European Regional Development Fund, ERDF/FEDER Program) under Grant ED431G2019/04 and Grant ED431C2022/19; and in part by European Union–Horizon 2020 Program under the Scheme INFRAIA-01-2018-2019 – Integrating Activities for Advanced Communities, SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics (http://www.sobigdata.eu) under Grant 871042.

Collections

Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
Electrónica e Computación