LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction
| dc.contributor.affiliation | Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS) | |
| dc.contributor.author | Gamallo Otero, Pablo | |
| dc.contributor.author | García González, Marcos | |
| dc.contributor.author | Piñeiro Pomar, César Alfredo | |
| dc.contributor.author | Martínez-Castaño, Rodrigo | |
| dc.contributor.author | Pichel Campos, Juan Carlos | |
| dc.date.accessioned | 2025-01-22T13:12:36Z | |
| dc.date.available | 2025-01-22T13:12:36Z | |
| dc.date.issued | 2018-12-02 | |
| dc.description.abstract | This paper presents LinguaKit, a multilingual suite of tools for analysis, extraction, annotation and linguistic correction, as well as its integration into a Big Data infrastructure. LinguaKit allows the user to perform different tasks such as PoS-tagging, syntactic parsing, coreference resolution (among others), including applications for relation extraction, sentiment analysis, summarization, extraction of multiword expressions, or entity linking to DBpedia. Most modules work in four languages: Portuguese, Spanish, English, and Galician. The system is programmed in Perl and is freely available under a GPLv3 license. | |
| dc.description.peerreviewed | SI | |
| dc.description.sponsorship | This work has been supported by MINECO (TIN2014-54565-JIN, FFI2014-51978-C2-1-R), MICINN (IJCI-2016-29598), Xunta de Galicia (ED431G/08), European Regional Development Fund (ERDF), and by two BBVA Foundation Grants for Researchers and Cultural Creators (2016 and 2017). | |
| dc.identifier.citation | P. Gamallo, M. Garcia, C. Piñeiro, R. Martinez-Castaño and J. C. Pichel, "LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction," 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain, 2018, pp. 239-244, doi: 10.1109/SNAMS.2018.8554689. | |
| dc.identifier.doi | 10.1109/SNAMS.2018.8554689 | |
| dc.identifier.uri | https://hdl.handle.net/10347/38902 | |
| dc.issue.number | 2018 | |
| dc.journal.title | 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS) | |
| dc.language.iso | eng | |
| dc.page.final | 244 | |
| dc.page.initial | 239 | |
| dc.publisher | IEEE | |
| dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//TIN2014-54565-JIN/ES/APROXIMANDO LA COMPUTACION DE ALTAS PRESTACIONES A LAS TECNOLOGIAS BIG DATA: APLICACION AL PROCESAMIENTO DEL LENGUAJE NATURAL/ | |
| dc.relation.projectID | info:eu-repo/grantAgreement/MINECO//FFI2014-51978-C2-1-R/ES/TECNOLOGIAS DE LA LENGUA PARA ANALISIS DE OPINIONES EN REDES SOCIALES/ | |
| dc.rights | © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. | |
| dc.rights.accessRights | open access | |
| dc.subject | Bilingual | |
| dc.subject | Information Extraction | |
| dc.subject | Big Data | |
| dc.subject | Sentiment Analysis | |
| dc.subject | Postage | |
| dc.subject | Relation Extraction | |
| dc.subject | Syntactic Analysis | |
| dc.subject | Multi-word | |
| dc.subject | Basis Of Analysis | |
| dc.subject | Fault-tolerant | |
| dc.subject | Analysis Module | |
| dc.subject | Disambiguation | |
| dc.subject | State Machine | |
| dc.subject | Tokenized | |
| dc.subject | Related Entities | |
| dc.subject | Input Text | |
| dc.subject | List Of Pairs | |
| dc.subject | Basic Module | |
| dc.subject | Big Data Technology | |
| dc.subject | Proper Nouns | |
| dc.subject | Phonetic Transcription | |
| dc.subject | Keyword Extraction | |
| dc.subject | Semantic Annotation | |
| dc.subject | Lemmatization | |
| dc.subject | Apache Spark | |
| dc.subject | Language Identification | |
| dc.title | LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction | |
| dc.type | journal article | |
| dc.type.hasVersion | AM | |
| dspace.entity.type | Publication | |
| relation.isAuthorOfPublication | 898ee1bb-f9e8-4a75-9858-a6c9142bc99e | |
| relation.isAuthorOfPublication | ae090fc6-2387-4087-ba21-7271835b4b35 | |
| relation.isAuthorOfPublication | 665c60c6-1b37-4499-8c35-aa52bd7ffcf5 | |
| relation.isAuthorOfPublication | db334853-753e-4afc-9f4f-ad847d0353a7 | |
| relation.isAuthorOfPublication.latestForDiscovery | 665c60c6-1b37-4499-8c35-aa52bd7ffcf5 |