Kernel machine learning methods to handle missing responses with complex predictors: application in modelling five-year glucose changes using distributional representations

Matabuena Rodríguez, Marcos; Félix Lamas, Paulo; García Meixide, Carlos; Gude Sampedro, Francisco

doi:10.1016/j.cmpb.2022.106905

dc.contributor.affiliation	Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información	gl
dc.contributor.author	Matabuena Rodríguez, Marcos
dc.contributor.author	Félix Lamas, Paulo
dc.contributor.author	García Meixide, Carlos
dc.contributor.author	Gude Sampedro, Francisco
dc.date.accessioned	2022-08-19T07:15:25Z
dc.date.available	2022-08-19T07:15:25Z
dc.date.issued	2022
dc.description.abstract	Background and objectives: Missing data is a ubiquitous problem in longitudinal studies due to the number of patients lost to follow-up. Kernel methods have enriched the machine learning field by successfully managing non-vectorial predictors, such as graphs, strings, and probability distributions, and have emerged as a promising tool for the analysis of complex data stemming from modern healthcare. This paper proposes a new set of kernel methods to handle missing data in the response variables. These methods are applied to predict long-term changes in glycated haemoglobin (A1c), the primary biomarker used to diagnose and monitor the progression of diabetes mellitus, with an emphasis on exploring the predictive potential of continuous glucose monitoring (CGM). Methods: We propose a new framework of non-linear kernel methods for testing statistical independence, selecting relevant predictors, and quantifying the uncertainty of the resultant predictive models. As a novelty in the clinical analysis, we used a distributional representation of CGM as a predictor and compared its performance with that of traditional diabetes biomarkers. Results: The results show that, after the incorporation of CGM information, predictive ability increases from to . In addition, uncertainty analysis is useful for characterising some subpopulations where predictivity is worsened, and a more personalised clinical follow-up is advisable according to expected patient uncertainty in glucose values. Conclusions: The proposed methods have proven to deal effectively with missing data. They also have the potential to improve the results of predictive tasks by including new complex objects as explanatory variables and modelling arbitrary dependence relations.
The application of these methods to a longitudinal study of diabetes showed that the inclusion of a distributional representation of CGM data provides greater sensitivity in predicting five-year A1c changes than classical diabetes biomarkers and traditional CGM metrics.	gl
dc.description.peerreviewed	Yes	gl
dc.description.sponsorship	This study was supported by ISCIII (PI20/01069, RD21/0016/0022; co-funded by the European Union/FEDER, “A way to make Europe”) and the Ministry of Science, Innovation and Universities of Spain (RTI2018-099646-B-I00)	gl
dc.identifier.citation	Computer Methods and Programs in Biomedicine 221 (2022) 106905	gl
dc.identifier.doi	10.1016/j.cmpb.2022.106905
dc.identifier.essn	0169-2607
dc.identifier.uri	http://hdl.handle.net/10347/29089
dc.language.iso	eng	gl
dc.publisher	Elsevier	gl
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-099646-B-I00/ES/MODELOS, TECNICAS Y METODOLOGIAS BASADAS EN LA INTELIGENCIA ARTIFICIAL PARA LA MEJORA DE LA ADHERENCIA TERAPEUTICA	gl
dc.relation.publisherversion	https://doi.org/10.1016/j.cmpb.2022.106905	gl
dc.rights	© 2022 The Author(s). Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)	gl
dc.rights	Attribution 4.0 International
dc.rights.accessRights	open access	gl
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/
dc.subject	Missing data	gl
dc.subject	Kernel methods	gl
dc.subject	Statistical independence	gl
dc.subject	Variable selection	gl
dc.subject	Regression modelling	gl
dc.subject	Diabetes mellitus	gl
dc.subject	Continuous glucose monitoring	gl
dc.title	Kernel machine learning methods to handle missing responses with complex predictors: application in modelling five-year glucose changes using distributional representations	gl
dc.type	journal article	gl
dc.type.hasVersion	VoR	gl
dspace.entity.type	Publication
relation.isAuthorOfPublication	53f67cf4-0e5a-420e-add7-e6c457accd15
relation.isAuthorOfPublication	61ef7bd7-5fc0-4694-82ef-d102c16b2204
relation.isAuthorOfPublication.latestForDiscovery	53f67cf4-0e5a-420e-add7-e6c457accd15

Files

Original bundle

Name: 2022_cmapib_matabuena_kernel.pdf
Size: 1.62 MB
Format: Adobe Portable Document Format
Description: Research article

Collections

Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
Electrónica e Computación