Computational tools and spoken corpora design: an ongoing dialogue

Vázquez Rozas, Victoria; Barcala, Mario

doi:10.7203/caplletra.69.17270

dc.contributor.affiliation	Universidade de Santiago de Compostela. Departamento de Lingua e Literatura Españolas, Teoría da Literatura e Lingüística Xeral	es_ES
dc.contributor.author	Vázquez Rozas, Victoria
dc.contributor.author	Barcala, Mario
dc.date.accessioned	2024-02-02T11:46:10Z
dc.date.available	2024-02-02T11:46:10Z
dc.date.issued	2020
dc.description.abstract	The design of an oral corpus and the processes of registering, codifying and treating the materials in order to build a useful resource for linguistic analysis prompt numerous decisions regarding theory and methodology. This article focuses on those stages of corpus construction that are most clearly conditioned by the computational processing necessary to make the corpus functional. In order to adequately match the initial expectations and the real possibilities of using the tool, each feature we intend to codify must be measured against the workload and the means required to do so. Therefore, it is essential to take into account the available possibilities of processing and exploitation, as they have a crucial impact on decisions regarding the corpus's construction. Based on experience acquired in the construction of the ESLORA corpus, the present article looks into some of the problems arising in the process of designing an oral corpus, such as the delicacy with which oral phenomena are represented, the segmentation of discourse, the coexistence of different simultaneous tagging systems, and the particularities of annotation in a bilingual or multilingual context.	es_ES
dc.description.abstract	El disseny d’un corpus oral i els processos de registrar, codificar i tractar els materials per construir un recurs útil per a l’anàlisi lingüística, comporta nombroses decisions pel que fa a la teoria i la metodologia. Aquest article s’ocupa d’aquelles etapes de la construcció d’un corpus que més clarament estan condicionades pel processament informàtic necessari que ha de fer el corpus funcional. Per tal de conjugar les expectatives inicials i les possibilitats reals quan usem l’eina, cada característica que pretenem codificar ha de ser mesurada quant a la càrrega de treball que comporta i els mitjans que són requerits per fer-ho possible. Per això, és essencial tenir en compte els recursos disponibles a l’hora de processar i explotar el corpus, ja que tenen un impacte fonamental en les decisions pel que fa a la construcció del corpus. Basat en l’experiència adquirida en la construcció del corpus ESLORA, l’article analitza alguns dels problemes que sorgeixen en el procés de dissenyar un corpus oral, com ara el grau de detall en què és representat el fenomen oral, la segmentació del discurs, la convivència de diferents sistemes d’etiquetatge simultanis i les particularitats de l’anotació en un context bilingüe o multilingüe.
dc.description.peerreviewed	SI	es_ES
dc.description.sponsorship	This study was financed by the Agencia Estatal de Investigación (AEI) ‘Spanish State Research Agency’ and by the Fondo Europeo de Desarrollo Regional (FEDER) (European Regional Development Fund) through the ESLORA+ project (FFI2017-86379-P). The authors are members of the research group Gramática del español ‘Spanish Grammar’ from the University of Santiago de Compostela, which has been awarded a grant for the Strengthening and Organisation of Research Groups with Potential for Growth by the Regional Government’s Education Department (ED431B 2017/39). The study has also benefited from the participation of the ESLORA project in the Red temática en estudios de Análisis del Discurso (FFI2017-90738-REDT).	es_ES
dc.identifier.citation	Vázquez Rozas, V.; Barcala, M. (2020). Computational tools and spoken corpora design: an ongoing dialogue. Caplletra: Revista internacional de filología, N. 69, pp. 221-240.	es_ES
dc.identifier.doi	10.7203/caplletra.69.17270
dc.identifier.issn	2386-7159
dc.identifier.uri	http://hdl.handle.net/10347/32251
dc.language.iso	eng	es_ES
dc.publisher	Consorci d'Editors Valencians	es_ES
dc.relation.projectID	info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2013-2016/FFI2017-86379-P/ES/EL CORPUS ESLORA DE ESPAÑOL ORAL: ENRIQUECIMIENTO, ANALISIS LINGUISTICO Y EXTRACCION DE RECURSOS/	es_ES
dc.relation.publisherversion	https://doi.org/10.7203/caplletra.69.17270	es_ES
dc.rights	© Caplletra. Revista Internacional de Filologia, 2020. This work is covered by the Creative Commons license type Attribution-NonCommercial-NoDerivatives 4.0 (CC BY-NC-ND 4.0)	es_ES
dc.rights.accessRights	open access	es_ES
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	Stand-off annotation	es_ES
dc.subject	Oral corpora	es_ES
dc.subject	Segmentation	es_ES
dc.subject	In-line annotation	es_ES
dc.subject	POS tagging	es_ES
dc.subject	Corpus oral
dc.subject	Anotació stand-off
dc.subject	Anotació en línia
dc.subject	Segmentació
dc.subject	Etiquetatge morfològic
dc.title	Computational tools and spoken corpora design: an ongoing dialogue	es_ES
dc.title.alternative	Les eines computacionals i el disseny de corpus orals: un diàleg vigent
dc.type	journal article	es_ES
dc.type.hasVersion	VoR	es_ES
dspace.entity.type	Publication
relation.isAuthorOfPublication	2249a263-7305-4805-a5eb-978f112a8154
relation.isAuthorOfPublication.latestForDiscovery	2249a263-7305-4805-a5eb-978f112a8154

Files

Original bundle

Name: 2020_Computacional_tools.pdf
Size: 307.96 KB
Format: Adobe Portable Document Format
Description: 2020_Computational_tools

Collections

Lingua e Literatura Españolas, Teoría da Literatura e Lingüística Xeral