Real-time focused extraction of social media users

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información [gl]
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Electrónica e Computación [gl]
dc.contributor.author: Martínez Castaño, Rodrigo
dc.contributor.author: Losada Carril, David Enrique
dc.contributor.author: Pichel Campos, Juan Carlos
dc.date.accessioned: 2023-02-21T13:10:52Z
dc.date.available: 2023-02-21T13:10:52Z
dc.date.issued: 2022
dc.description.abstract: In this paper, we explore a real-time automation challenge: the problem of focused extraction of Social Media users. This challenge can be seen as a special form of focused crawling where the main target is to detect users with certain patterns. Given a specific user profile, the task consists of rapidly ingesting Social Media data and detecting target users early. This is a real-time intelligent automation task that has numerous applications in domains such as safety, health or marketing. The volume and dynamics of Social Media content demand efficient real-time solutions able to predict which users are worth exploring. To meet this aim, we propose and evaluate several methods that effectively allow us to harvest relevant users. Even with little contextual information (e.g., a single user submission), our methods quickly focus on the most promising users. We also developed a distributed microservice architecture that supports real-time parallel extraction of Social Media users. This modular architecture scales up on clusters of computers and can be easily adapted for user extraction in multiple domains and Social Media sources. Our experiments suggest that some of the proposed prioritisation methods, which work with minimal user context, are effective at rapidly focusing on the most relevant users. These methods perform satisfactorily with huge volumes of users and interactions and lead to harvest ratios 2 to 9 times higher than those achieved by random prioritisation. [gl]
dc.description.peerreviewed: Yes [gl]
dc.description.sponsorship: This work was supported in part by the Ministerio de Ciencia e Innovación (MICINN) under Grant RTI2018-093336-B-C21 and Grant PLEC2021-007662; in part by Xunta de Galicia under Grant ED431G/08, Grant ED431G-2019/04, Grant ED431C 2018/19, and Grant ED431F 2020/08; and in part by the European Regional Development Fund (ERDF) [gl]
dc.identifier.citation: R. Martínez-Castaño, D. E. Losada and J. C. Pichel, "Real-Time Focused Extraction of Social Media Users," in IEEE Access, vol. 10, pp. 42607-42622, 2022, doi: 10.1109/ACCESS.2022.3168977 [gl]
dc.identifier.doi: 10.1109/ACCESS.2022.3168977
dc.identifier.issn: 2169-3536
dc.identifier.uri: http://hdl.handle.net/10347/30186
dc.language.iso: eng [gl]
dc.publisher: IEEE [gl]
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093336-B-C21/ES/TECNOLOGIAS PARA LA PREDICCION TEMPRANA DE SIGNOS RELACIONADOS CON TRASTORNOS PSICOLOGICOS [gl]
dc.relation.publisherversion: https://doi.org/10.1109/ACCESS.2022.3168977 [gl]
dc.rights: This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ [gl]
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access [gl]
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Big data [gl]
dc.subject: Distributed systems [gl]
dc.subject: Focused user extraction [gl]
dc.subject: Supervised learning [gl]
dc.subject: Information retrieval [gl]
dc.subject: Real-time processing [gl]
dc.subject: Social media [gl]
dc.title: Real-time focused extraction of social media users [gl]
dc.type: journal article [gl]
dc.type.hasVersion: VoR [gl]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 7ddb36fe-bf39-4c79-85bc-540ce4d9a23b
relation.isAuthorOfPublication: db334853-753e-4afc-9f4f-ad847d0353a7
relation.isAuthorOfPublication.latestForDiscovery: 7ddb36fe-bf39-4c79-85bc-540ce4d9a23b

Files

Original bundle

Name: 2022_ieee_martinez_realtime.pdf
Size: 2.51 MB
Format: Adobe Portable Document Format
Description: Article