Semi-Supervised Learning in the Field of Conversational Agents and Motivational Interviewing

Publisher

Sociedad Española para el Procesamiento del Lenguaje Natural

Abstract

Applying Motivational Interviewing concepts to text analysis yields valuable insights into individuals’ perspectives and attitudes towards behaviour change. The scarcity of labelled user data remains a persistent challenge and impedes technical advances in non-English language settings. To address the limitations of manual data labelling, we propose a semi-supervised learning method to augment an existing training corpus. Our approach leverages machine-translated user-generated data sourced from social media communities and employs self-training techniques for annotation. We consider various source contexts and evaluate multiple classifiers trained on the resulting augmented datasets. The results indicate that this weak labelling approach does not improve the overall classification capabilities of the models, although notable improvements were observed for the minority classes. We conclude that several factors, including the quality of machine translation, can bias the pseudo-labelling models, and that the imbalanced nature of the data and the impact of a strict pre-filtering threshold need to be taken into account as inhibiting factors.
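
The self-training idea described in the abstract, with a confidence threshold acting as a pre-filter on pseudo-labels, can be illustrated with a minimal sketch. This is not the authors' implementation: the tiny Naive Bayes classifier, the label names ("change", "sustain"), the threshold value, and the example posts are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Fit a tiny multinomial Naive Bayes model on (text, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocab = set()
    for text, label in examples:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        label_counts[label] += 1
        vocab.update(tokens)
    return word_counts, label_counts, vocab


def predict_proba(model, text):
    """Return a {label: probability} dict for one text."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    log_scores = {}
    for label in label_counts:
        n_words = sum(word_counts[label].values())
        score = math.log(label_counts[label] / total)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                # Laplace (add-one) smoothing
                score += math.log(
                    (word_counts[label][tok] + 1) / (n_words + len(vocab))
                )
        log_scores[label] = score
    top = max(log_scores.values())
    exp = {lab: math.exp(s - top) for lab, s in log_scores.items()}
    z = sum(exp.values())
    return {lab: v / z for lab, v in exp.items()}


def self_train(labelled, unlabelled, threshold=0.9, max_rounds=5):
    """Iteratively add confident pseudo-labels to the training set.

    `threshold` plays the role of a strict pre-filter: only predictions
    whose top probability reaches it are kept as pseudo-labels.
    """
    labelled = list(labelled)
    pool = list(unlabelled)
    for _ in range(max_rounds):
        model = train(labelled)
        accepted, remaining = [], []
        for text in pool:
            probs = predict_proba(model, text)
            label, conf = max(probs.items(), key=lambda kv: kv[1])
            if conf >= threshold:
                accepted.append((text, label))
            else:
                remaining.append(text)
        if not accepted:  # nothing passed the filter; stop early
            break
        labelled.extend(accepted)
        pool = remaining
    return labelled, pool


if __name__ == "__main__":
    # Hypothetical seed corpus and unlabelled posts (illustrative only).
    seed = [
        ("i want to quit smoking", "change"),
        ("i will stop drinking soon", "change"),
        ("i enjoy smoking too much", "sustain"),
        ("i cannot give up drinking", "sustain"),
    ]
    posts = ["i want to stop", "i enjoy it too much", "weather is nice"]
    augmented, leftover = self_train(seed, posts, threshold=0.6)
    print(len(augmented), "training examples;", len(leftover), "left unlabelled")
```

A stricter threshold admits fewer, more reliable pseudo-labels, while a looser one enlarges the corpus at the risk of reinforcing the seed model's errors. With imbalanced data, a strict filter may also reject most minority-class candidates, which is the kind of inhibiting factor the abstract mentions.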

Sponsors

This work was supported by project PLEC2021-007662 (MCIN/AEI/10.13039/501100011033, Plan de Recuperación, Transformación y Resiliencia, Next Generation EU). The authors also thank the financial support supplied by the Xunta de Galicia-Consellería de Cultura, Educación, Formación Profesional e Universidade (ED431G 2023/04, ED431C 2022/19) and the ERDF, which acknowledges the CiTIUS-Research Center in Intelligent Technologies of the USC as a Research Center of the Galician University System. David E. Losada thanks the financial support obtained from project SUBV23/00002 (Ministerio de Consumo, Subdirección General de Regulación del Juego) and project PID2022-137061OB-C22 (Ministerio de Ciencia e Innovación, AEI, Proyectos de Generación de Conocimiento; supported by the ERDF).

Rights

Attribution 4.0 International