Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech

dc.contributor.affiliation: Universidade de Santiago de Compostela. Instituto da Lingua Galega (ILG) [gl]
dc.contributor.author: Seara, Roberto
dc.contributor.author: Martínez, Marta
dc.contributor.author: Varela, Rocío
dc.contributor.author: García-Mateo, Carmen
dc.contributor.author: Fernández Rei, Elisa
dc.contributor.author: Regueira Fernández, Xosé Luís
dc.date.accessioned: 2018-11-20T12:17:40Z
dc.date.available: 2018-11-20T12:17:40Z
dc.date.issued: 2016
dc.description.abstract: The Corpus Oral Informatizado da Lingua Galega (CORILGA) project aims to build a corpus of oral language for Galician, primarily designed to study linguistic variation and change. The project is currently under development and is periodically enriched with new contributions. The long-term goal is for all the speech recordings to be enriched with phonetic, syllabic, morphosyntactic, lexical and sentence-level ELAN-compliant annotations. One way to speed up the annotation process is to use automatic speech-recognition-based tools tailored to the application. The CORILGA repository has therefore been enhanced with an automatic alignment tool, available to the administrator of the repository, that aligns speech with an orthographic transcription. If no transcription, or only a partial one, is available, a speech recognizer for Galician is used to generate word and phonetic segmentations. These recognized outputs may contain errors that have to be corrected manually by the administrator. To assist with this task, the tool also provides an ELAN tier with the confidence measure of each recognized word. In this paper, after a description of the main features of the CORILGA corpus, the speech alignment and recognition tools are described. Both have been developed using the Kaldi toolkit. [gl]
dc.identifier.citation: Roberto Seara, Marta Martínez, Rocío Varela, Carmen García-Mateo, Elisa Fernández-Rei, Xosé Luis Regueira (2016): Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech. Nicoletta Calzolari et al. (eds.): Proceedings. 10th Language Resources and Evaluation Conference (LREC 2016). Portorož, Slovenia, 2893-3898. ISBN 978-2-9517408-9-1 [gl]
dc.identifier.isbn: 978-2-9517408-9-1
dc.identifier.uri: http://hdl.handle.net/10347/17786
dc.language.iso: eng [gl]
dc.publisher: European Language Resources Association [gl]
dc.relation.publisherversion: http://www.lrec-conf.org/proceedings/lrec2016/index.html [gl]
dc.rights: Attribution-NonCommercial 4.0 International [gl]
dc.rights.accessRights: open access [gl]
dc.rights.uri: http://creativecommons.org/licenses/by-nc/4.0/
dc.subject: CORILGA [gl]
dc.subject: Linguas minorizadas [gl]
dc.subject: Lenguas minorizadas [gl]
dc.subject: Minority languages [gl]
dc.subject: Galego (lingua) [gl]
dc.subject: Gallego (lengua) [gl]
dc.subject: Galician (language) [gl]
dc.subject: Variación lingüística [gl]
dc.subject: Linguistic variation [gl]
dc.title: Enhanced CORILGA: Introducing the Automatic Phonetic Alignment Tool for Continuous Speech [gl]
dc.type: book part [gl]
dspace.entity.type: Publication
relation.isAuthorOfPublication: 223e421e-7064-4696-a90b-d8bdb96594ad
relation.isAuthorOfPublication: c2412a10-e98f-4bc7-a0d1-d54104e84e86
relation.isAuthorOfPublication.latestForDiscovery: 223e421e-7064-4696-a90b-d8bdb96594ad

Files

Original bundle

Name: LREC_2016.pdf
Size: 922.9 KB
Format: Adobe Portable Document Format