A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías da Información
dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Lingua e Literatura Españolas, Teoría da Literatura e Lingüística Xeral
dc.contributor.area: Área de Enxeñaría e Arquitectura
dc.contributor.author: Pichel Campos, José Ramom
dc.contributor.author: Gamallo Otero, Pablo
dc.contributor.author: Alegría, Iñaki
dc.contributor.author: Neves, Marco
dc.date.accessioned: 2021-03-05T12:17:30Z
dc.date.available: 2021-09-01T01:00:08Z
dc.date.issued: 2020
dc.description: This is an Accepted Manuscript of an article published by Taylor & Francis in Journal of Quantitative Linguistics on 01 Mar 2020, available online: http://www.tandfonline.com/10.1080/09296174.2020.1732177
dc.description.abstract: The aim of this paper is to apply a corpus-based methodology, based on the measure of perplexity, to automatically calculate the cross-lingual language distance between historical periods of three languages. The three historical corpora have been constructed and collected with the closest spelling to the original on a balanced basis of fiction and non-fiction. This methodology has been applied to measure the historical distance of Galician with respect to Portuguese and Spanish, from the Middle Ages to the end of the 20th century, both in original spelling and automatically transcribed spelling. The quantitative results are contrasted with hypotheses extracted from experts in historical linguistics. Results show that Galician and Portuguese are varieties of the same language in the Middle Ages and that Galician converges and diverges with Portuguese and Spanish since the last period of the 19th century. In this process, orthography plays a relevant role. It should be pointed out that the method is unsupervised and can be applied to other languages.
dc.description.peerreviewed: SI
dc.description.sponsorship: This work has received financial support from DOMINO project [PGC2018-102041-B-I00, MCIU/AEI/FEDER, UE]; eRisk project [RTI2018-093336-B-C21]; the Consellería de Cultura, Educación e Ordenación Universitaria (accreditation 2016-2019, ED431G/08, Consolidation and structuring of Groups with Growth Potential: 745ED431B 2017/39) and the European Regional Development Fund (ERDF)
dc.identifier.citation: José Ramom Pichel, Pablo Gamallo, Iñaki Alegria & Marco Neves (2020) A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity, Journal of Quantitative Linguistics, DOI: 10.1080/09296174.2020.1732177
dc.identifier.doi: 10.1080/09296174.2020.1732177
dc.identifier.essn: 1744-5035
dc.identifier.issn: 0929-6174
dc.identifier.uri: http://hdl.handle.net/10347/24655
dc.language.iso: eng
dc.publisher: Taylor & Francis
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/RTI2018-093336-B-C21/ES/TECNOLOGIAS PARA LA PREDICCION TEMPRANA DE SIGNOS RELACIONADOS CON TRASTORNOS PSICOLOGICOS
dc.relation.projectID: info:eu-repo/grantAgreement/AEI/Plan Estatal de Investigación Científica y Técnica y de Innovación 2017-2020/PGC2018-102041-B-I00/ES/TRADUCCION AUTOMATICA NEURONAL, EN DOMINIO, NO SUPERVISADA
dc.relation.publisherversion: https://doi.org/10.1080/09296174.2020.1732177
dc.rights: © Taylor & Francis, 2020
dc.rights.accessRights: open access
dc.title: A Methodology to Measure the Diachronic Language Distance between Three Languages Based on Perplexity
dc.type: journal article
dc.type.hasVersion: AM
dspace.entity.type: Publication
relation.isAuthorOfPublication: 898ee1bb-f9e8-4a75-9858-a6c9142bc99e
relation.isAuthorOfPublication.latestForDiscovery: 898ee1bb-f9e8-4a75-9858-a6c9142bc99e

Files

Original bundle

Name: 2020_jql_pichel_methodology.pdf
Size: 999.19 KB
Format: Adobe Portable Document Format