LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction

Gamallo Otero, Pablo; García González, Marcos; Piñeiro Pomar, César Alfredo; Martínez-Castaño, Rodrigo; Pichel Campos, Juan Carlos

doi:10.1109/SNAMS.2018.8554689

Files

GamGarPinMarPic2018a.pdf (529.76 KB)

Identifiers

URI: https://hdl.handle.net/10347/38902

DOI: 10.1109/SNAMS.2018.8554689

Publication date

2018-12-02

Authors

Gamallo Otero, Pablo

García González, Marcos

Piñeiro Pomar, César Alfredo

Martínez-Castaño, Rodrigo

Pichel Campos, Juan Carlos

Publisher

IEEE

Abstract

This paper presents LinguaKit, a multilingual suite of tools for linguistic analysis, extraction, annotation and correction, together with its integration into a Big Data infrastructure. LinguaKit lets the user perform tasks such as PoS tagging, syntactic parsing and coreference resolution, among others. It also includes applications for relation extraction, sentiment analysis, summarization, multiword expression extraction and entity linking to DBpedia. Most modules work in four languages: Portuguese, Spanish, English and Galician. The system is written in Perl and is freely available under the GPLv3 license.

Bibliographic citation

P. Gamallo, M. Garcia, C. Piñeiro, R. Martinez-Castaño and J. C. Pichel, "LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction," 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain, 2018, pp. 239-244, doi: 10.1109/SNAMS.2018.8554689.

Rights

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.

Collections

Electrónica e Computación
Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)