LinguaKit: a Big Data-based multilingual tool for linguistic analysis and information extraction

Abstract

This paper presents LinguaKit, a multilingual suite of tools for linguistic analysis, extraction, annotation, and correction, together with its integration into a Big Data infrastructure. LinguaKit supports core tasks such as PoS-tagging, syntactic parsing, and coreference resolution, as well as applications for relation extraction, sentiment analysis, summarization, extraction of multiword expressions, and entity linking to DBpedia. Most modules work in four languages: Portuguese, Spanish, English, and Galician. The system is programmed in Perl and is freely available under a GPLv3 license.

Bibliographic citation

P. Gamallo, M. Garcia, C. Piñeiro, R. Martinez-Castaño and J. C. Pichel, "LinguaKit: A Big Data-Based Multilingual Tool for Linguistic Analysis and Information Extraction," 2018 Fifth International Conference on Social Networks Analysis, Management and Security (SNAMS), Valencia, Spain, 2018, pp. 239-244, doi: 10.1109/SNAMS.2018.8554689.

Sponsors

This work has been supported by MINECO (TIN2014-54565-JIN, FFI2014-51978-C2-1-R), MICINN (IJCI-2016-29598), Xunta de Galicia (ED431G/08), the European Regional Development Fund (ERDF), and two BBVA Foundation Grants for Researchers and Cultural Creators (2016 and 2017).

Rights

© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.