A multistage retrieval system for health-related misinformation detection

Fernández Pichel, Marcos; Losada Carril, David Enrique; Pichel Campos, Juan Carlos

doi:10.1016/j.engappai.2022.105211

dc.contributor.affiliation	Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
dc.contributor.author	Fernández Pichel, Marcos
dc.contributor.author	Losada Carril, David Enrique
dc.contributor.author	Pichel Campos, Juan Carlos
dc.date.accessioned	2026-01-08T09:46:46Z
dc.date.available	2026-01-08T09:46:46Z
dc.date.issued	2022-07-20
dc.description.abstract	Web search is widely used to find online medical advice. As such, health-related information access requires retrieval algorithms capable of promoting reliable documents and filtering out unreliable ones. To this end, different types of components, such as query-document matching features, passage relevance estimation and AI-based reliability estimators, need to be combined. In this paper, we propose an entire pipeline for misinformation detection, based on the fusion of multiple content-based features. We present experiments that study the influence of each pipeline stage on the target task. Our technological solution incorporates signals from technologies derived from diverse research fields, including search, deep learning for natural language processing, and advanced supervised and unsupervised learning. To combine evidence, different score fusion strategies are compared, including unsupervised rank fusion techniques and learning-to-rank methods. The reference framework for empirically validating our solution is the TREC Health Misinformation Track, which provides several challenging subtasks that foster research on the identification of reliable and correct information for health-related decision-making tasks. More specifically, we address a total recall task, the goal of which is to identify all the documents conveying incorrect information for a specific set of topics, and an ad-hoc retrieval task, which aims to rank credible and correct information above incorrect information. All variants are evaluated with an assorted set of effectiveness metrics, which includes standard search measures, such as R-Precision, Average Precision or Normalised Discounted Cumulative Gain, and innovative metrics based on the compatibility between the ranked output and two reference rankings composed of helpful and harmful documents, respectively.
Our experiments demonstrate the effectiveness of the proposed pipeline stages and indicate that sophisticated supervised fusion methods do not fare better than simpler fusion alternatives. Additionally, for reliability estimation, unsupervised textual similarity performs better than textual classification based on supervised learning. The results also show that the presented approach is highly competitive when compared with state-of-the-art solutions for the same problem.
dc.description.peerreviewed	Yes
dc.description.sponsorship	The authors acknowledge the support obtained from: (i) project RTI2018-093336-B-C21 (Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación & ERDF), (ii) project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Ministerio de Ciencia e Innovación, Agencia Estatal de Investigación, Plan de Recuperación, Transformación y Resiliencia, Unión Europea-Next GenerationEU), and (iii) Consellería de Educación, Universidade e Formación Profesional (accreditation 2019–2022 ED431G-2019/04, ED431C 2018/29) and the European Regional Development Fund, which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the University of Santiago de Compostela as a Research Center of the Galician University System.
dc.identifier.citation	Engineering Applications of Artificial Intelligence Volume 115, October 2022, 105211
dc.identifier.doi	10.1016/j.engappai.2022.105211
dc.identifier.essn	1873-6769
dc.identifier.uri	https://hdl.handle.net/10347/44927
dc.journal.title	Engineering Applications of Artificial Intelligence
dc.language.iso	eng
dc.publisher	Elsevier
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093336-B-C21/ES/TECNOLOGIAS PARA LA PREDICCION TEMPRANA DE SIGNOS RELACIONADOS CON TRASTORNOS PSICOLOGICOS
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PLEC2021-007662/ES/Big-eRisk: Predicción temprana de riesgos personales en conjuntos de datos masivos
dc.relation.publisherversion	https://doi.org/10.1016/j.engappai.2022.105211
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	en
dc.rights.accessRights	open access
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Engineering applications
dc.subject	Web search
dc.subject	Health misinformation
dc.subject	Information retrieval
dc.subject	Natural language processing
dc.subject	Artificial intelligence
dc.subject	Deep learning for natural language processing
dc.title	A multistage retrieval system for health-related misinformation detection
dc.type	journal article
dc.type.hasVersion	AM
dspace.entity.type	Publication
relation.isAuthorOfPublication	ad1c87f4-64b2-44aa-ab80-4709cef31dfe
relation.isAuthorOfPublication	7ddb36fe-bf39-4c79-85bc-540ce4d9a23b
relation.isAuthorOfPublication	db334853-753e-4afc-9f4f-ad847d0353a7
relation.isAuthorOfPublication.latestForDiscovery	ad1c87f4-64b2-44aa-ab80-4709cef31dfe

Files

Original bundle

Name: Multistage_Retrieval_System__accepted.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format

Collections

Electrónica e Computación
Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)