A multistage retrieval system for health-related misinformation detection

Fernández Pichel, Marcos; Losada Carril, David Enrique; Pichel Campos, Juan Carlos

doi:10.1016/j.engappai.2022.105211

Files

Multistage_Retrieval_System__accepted.pdf (1.25 MB)

Identifiers

URI: https://hdl.handle.net/10347/44927

E-ISSN: 1873-6769

DOI: 10.1016/j.engappai.2022.105211

Publication date

2022-07-20

Authors

Fernández Pichel, Marcos

Losada Carril, David Enrique

Pichel Campos, Juan Carlos

Publisher

Elsevier

Abstract

Web search is widely used to find online medical advice. Health-related information access therefore requires retrieval algorithms capable of promoting reliable documents and filtering out unreliable ones. To this end, different types of components, such as query-document matching features, passage relevance estimation and AI-based reliability estimators, need to be combined. In this paper, we propose a complete pipeline for misinformation detection based on the fusion of multiple content-based features, and we present experiments that study the influence of each pipeline stage on the target task. Our solution incorporates signals from technologies derived from diverse research fields, including search, deep learning for natural language processing, and advanced supervised and unsupervised learning. To combine evidence, we compare different score fusion strategies, including unsupervised rank fusion techniques and learning-to-rank methods. The reference framework for empirically validating our solution is the TREC Health Misinformation Track, which provides several challenging subtasks that foster research on identifying reliable and correct information for health-related decision-making tasks. More specifically, we address a total recall task, whose goal is to identify all the documents conveying incorrect information for a specific set of topics, and an ad-hoc retrieval task, which aims to rank credible and correct information above incorrect information. All variants are evaluated with a diverse set of effectiveness metrics, including standard search measures such as R-Precision, Average Precision and Normalised Discounted Cumulative Gain, as well as innovative metrics based on the compatibility between the ranked output and two reference rankings composed of helpful and harmful documents, respectively.
Our experiments demonstrate the effectiveness of the proposed pipeline stages and indicate that sophisticated supervised fusion methods do not outperform simpler fusion alternatives. Additionally, for reliability estimation, unsupervised textual similarity performs better than text classification based on supervised learning. The results also show that the presented approach is highly competitive with state-of-the-art solutions for the same problem.
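The abstract compares unsupervised rank fusion with learning-to-rank but does not give the fusion formulas. As an illustration only, the sketch below shows two widely used unsupervised fusion rules, CombSUM over min-max normalised scores and Reciprocal Rank Fusion (RRF, with the conventional constant k = 60), applied to three per-stage runs for a single topic. The run names (`bm25`, `passage`, `reliability`), document IDs and scores are hypothetical; this is not the authors' implementation or their exact choice of fusion methods.

```python
from collections import defaultdict


def min_max_normalise(run):
    """Rescale one run's scores into [0, 1]; a run with constant scores maps to 1.0."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (s - lo) / (hi - lo) for doc, s in run.items()}


def comb_sum(runs):
    """CombSUM: add up normalised scores; a document missing from a run adds 0."""
    fused = defaultdict(float)
    for run in runs:
        for doc, s in min_max_normalise(run).items():
            fused[doc] += s
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(fused, key=lambda d: (-fused[d], d))


def reciprocal_rank_fusion(runs, k=60):
    """RRF: add 1 / (k + rank) per run, using 1-based ranks; raw score values are ignored."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=lambda d: (-run[d], d))
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return sorted(fused, key=lambda d: (-fused[d], d))


# Hypothetical outputs of three pipeline stages for one topic (invented scores).
bm25 = {"d1": 12.0, "d2": 9.5, "d3": 4.0}          # query-document matching
passage = {"d2": 0.91, "d3": 0.40, "d4": 0.35}     # passage relevance estimation
reliability = {"d3": 0.88, "d2": 0.70, "d1": 0.10} # reliability estimation

runs = [bm25, passage, reliability]
print(comb_sum(runs))                # -> ['d2', 'd3', 'd1', 'd4']
print(reciprocal_rank_fusion(runs))  # -> ['d2', 'd3', 'd1', 'd4']
```

RRF uses only rank positions, so it avoids calibrating heterogeneous scores (for example, BM25 values against classifier probabilities); CombSUM keeps score-magnitude information but its output depends on the normalisation chosen.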

Bibliographic citation

Engineering Applications of Artificial Intelligence Volume 115, October 2022, 105211

Publisher version

https://doi.org/10.1016/j.engappai.2022.105211

Rights

Attribution-NonCommercial-NoDerivatives 4.0 International

Collections

Electrónica e Computación
Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
