TextFocus: Assessing the Faithfulness of Feature Attribution Methods Explanations in Natural Language Processing

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
dc.contributor.author: Mariotti, Ettore
dc.contributor.author: Arias-Duart, Anna
dc.contributor.author: Cafagna, Michele
dc.contributor.author: Gatt, Albert
dc.contributor.author: García-Gasulla, Darío
dc.contributor.author: Alonso Moral, José María
dc.date.accessioned: 2025-05-05T07:50:47Z
dc.date.available: 2025-05-05T07:50:47Z
dc.date.issued: 2024-05-31
dc.description.abstract: Among the existing eXplainable AI (XAI) approaches, Feature Attribution methods are a popular option due to their interpretable nature. However, each method leads to a different solution, thus introducing uncertainty regarding their reliability and coherence with respect to the underlying model. This work introduces TextFocus, a metric for evaluating the faithfulness of Feature Attribution methods for Natural Language Processing (NLP) classification tasks. To address the absence of ground truth explanations for such methods, we introduce the concept of textual mosaics. A mosaic is composed of a combination of sentences belonging to different classes, which provides an implicit ground truth for attribution. The accuracy of explanations can then be evaluated by comparing feature attribution scores with the known class labels in the mosaic. The performance of six feature attribution methods is systematically compared on three sentence classification tasks by using TextFocus, with Integrated Gradients being the best overall method in terms of faithfulness and computational requirements. The proposed methodology fills a gap in NLP evaluation by providing an objective way to assess Feature Attribution methods while finding their optimal parameters.
dc.description.peerreviewed: SI
dc.description.sponsorship: This work was supported in part by the NL4XAI Project funded by the European Union's Horizon 2020 Research and Innovation Program under Marie Skłodowska-Curie Grant Agreement 860621; in part by MCIN/AEI/10.13039/501100011033 and ESF Investing in Your Future under Grant PID2021-123152OB-C21; in part by MCIN/AEI/10.13039/501100011033 and the European Union NextGenerationEU/PRTR under Grant TED2021-130295B-C33; in part by the Galician Ministry of Culture, Education, Professional Training, and University (co-funded by the European Regional Development Fund, ERDF/FEDER Program) under Grant ED431G2019/04 and Grant ED431C2022/19; and in part by the European Union Horizon 2020 Program under the Scheme INFRAIA-01-2018-2019 (Integrating Activities for Advanced Communities), SoBigData++: European Integrated Infrastructure for Social Mining and Big Data Analytics (http://www.sobigdata.eu) under Grant 871042.
dc.identifier.citation: E. Mariotti, A. Arias-Duart, M. Cafagna, A. Gatt, D. Garcia-Gasulla and J. M. Alonso-Moral. (2024). TextFocus: Assessing the Faithfulness of Feature Attribution Methods Explanations in Natural Language Processing. "IEEE Access", vol. 12, pp. 138870-138880
dc.identifier.doi: 10.1109/ACCESS.2024.3408062
dc.identifier.issn: 2169-3536
dc.identifier.uri: https://hdl.handle.net/10347/41177
dc.journal.title: IEEE Access
dc.language.iso: eng
dc.page.final: 138880
dc.page.initial: 138870
dc.publisher: IEEE
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica, Técnica y de Innovación 2021-2023/PID2021-123152OB-C21
dc.relation.projectID: info:eu-repo/grantAgreement/EC/H2020/871042/EU
dc.relation.publisherversion: https://doi.org/10.1109/ACCESS.2024.3408062
dc.rights: © 2024 The Authors. This work is licensed under a Creative Commons Attribution 4.0 International License. For more information, see https://creativecommons.org/licenses/by/4.0/
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Natural language processing
dc.subject: Predictive models
dc.subject: Measurement
dc.subject: Explainable AI
dc.subject: Data models
dc.subject: Computational modeling
dc.subject: Artificial intelligence
dc.subject: Feature detection
dc.subject: Modelos predictivos
dc.subject: Modelos preditivos
dc.subject: Modelos de datos
dc.subject: Intelixencia Artificial
dc.subject: Inteligencia Artificial
dc.subject: Trustworthy AI
dc.subject: Explanation faithfulness
dc.subject.classification: 120304 Inteligencia artificial
dc.title: TextFocus: Assessing the Faithfulness of Feature Attribution Methods Explanations in Natural Language Processing
dc.type: journal article
dc.type.hasVersion: VoR
dc.volume.number: 12
dspace.entity.type: Publication
relation.isAuthorOfPublication: 47f74ee4-a6d5-49cd-8a38-bf9fdeef8f69
relation.isAuthorOfPublication.latestForDiscovery: 47f74ee4-a6d5-49cd-8a38-bf9fdeef8f69

Files

Original bundle

Name: 2025_access_alonso_textfocus.pdf
Size: 1.1 MB
Format: Adobe Portable Document Format