Semi-Supervised Learning in the Field of Conversational Agents and Motivational Interviewing
Loading...
Identifiers
Publication date
Advisors
Tutors
Editors
Journal Title
Journal ISSN
Volume Title
Publisher
Sociedad Española para el Procesamiento del Lenguaje Natural
Abstract
The exploitation of Motivational Interviewing concepts for text analysis contributes to gaining valuable insights into individuals’ perspectives and attitudes towards behaviour change. The scarcity of labelled user data poses a persistent challenge and impedes technical advances in research under non-English language scenarios. To address the limitations of manual data labelling, we propose a semi-supervised learning method as a means to augment an existing training corpus. Our approach leverages machine-translated user-generated data sourced from social media communities and employs self-training techniques for annotation. To that end, we consider various source contexts and conduct an evaluation of multiple classifiers trained on various augmented datasets. The results indicate that this weak labelling approach does not yield improvements in the overall classification capabilities of the models. However, notable enhancements were observed for the minority classes. We conclude that several factors, including the quality of machine translation, can potentially bias the pseudo-labelling models and that the imbalanced nature of the data and the impact of a strict pre-filtering threshold need to be taken into account as inhibiting factors.
Description
Bibliographic citation
Relation
Has part
Has version
Is based on
Is part of
Is referenced by
Is version of
Requires
Publisher version
https://doi.org/10.26342/2024-73-4Sponsors
This work was supported by project PLEC2021- 007662 (MCIN/AEI/10.13039/501100011033, Plan de Recuperación, Transformación y Resiliencia, Next Generation EU). The authors also thank the financial support supplied by the Xunta de Galicia-Consellería de Cultura, Educación, Formación Profesional e Universidade (ED431G 2023/04, ED431C 2022/19) and the ERDF, which acknowledges the CiTIUS- Research Center in Intelligent Technologies of the USC as a Research Center of the Galician University System. David E. Losada thanks the financial support obtained from project SUBV23/00002 (Ministerio de Consumo, Subdirección General de Regulación del Juego) and project PID2022-137061OB-C22 (Ministerio de Ciencia e Innovación, AEI, Proyectos de Generación de Conocimiento; supported by the ERDF).
Rights
Attribution 4.0 International








