Measuring language distance among historical varieties using perplexity. Application to European Portuguese

dc.contributor.affiliation: Universidade de Santiago de Compostela. Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
dc.contributor.author: Pichel, José Ramom
dc.contributor.author: Gamallo Otero, Pablo
dc.contributor.author: Alegria Loinaz, Iñaki
dc.date.accessioned: 2026-01-14T09:13:26Z
dc.date.available: 2026-01-14T09:13:26Z
dc.date.issued: 2018-08-20
dc.description.abstract: The objective of this work is to quantify, with a simple and robust measure, the distance between historical varieties of a language. The measure will be inferred from text corpora corresponding to historical periods. Different approaches have been proposed for similar aims: Language Identification, Phylogenetics, Historical Linguistics or Dialectology. In our approach, we used a perplexity-based measure to calculate language distance between all the historical periods of a specific language: European Portuguese. Perplexity has also proven to be a robust metric to calculate distance between languages. However, this measure has not been tested yet to identify diachronic periods within the historical evolution of a specific language. For this purpose, a historical Portuguese corpus has been constructed from different open sources containing texts with close original spelling. The results of our experiments show that Portuguese keeps an important degree of homogeneity over time. We anticipate this metric to be a starting point to be applied to other languages.
dc.description.sponsorship: BBVA Foundation Grant for Researchers and Cultural Creators
dc.description.sponsorship: TelePares
dc.description.sponsorship: TADeep
dc.description.sponsorship: Consellería de Cultura, Educación e Ordenación Universitaria
dc.description.sponsorship: European Regional Development Fund
dc.description.sponsorship: imaxin software
dc.identifier.citation: Jose Ramom Pichel Campos, Pablo Gamallo, and Iñaki Alegria. 2018. Measuring language distance among historical varieties using perplexity. Application to European Portuguese. In Proceedings of the Fifth Workshop on NLP for Similar Languages, Varieties and Dialects (VarDial 2018), pages 145–155, Santa Fe, New Mexico, USA. Association for Computational Linguistics.
dc.identifier.uri: https://hdl.handle.net/10347/45128
dc.language.iso: eng
dc.publisher: Association for Computational Linguistics
dc.relation.projectID: info:eu-repo/grantAgreement/MINECO//TIN2015-70214-P/ES/TRADUCCION AUTOMATICA EN PROFUNDIDAD
dc.relation.publisherversion: https://aclanthology.org/W18-3916/
dc.rights: Attribution 4.0 International
dc.rights.accessRights: open access
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/
dc.subject: Language Distance
dc.subject: Historical Linguistics
dc.subject.classification: 33 Ciencias tecnológicas
dc.title: Measuring language distance among historical varieties using perplexity. Application to European Portuguese
dc.type: book part
dc.type.hasVersion: VoR
dspace.entity.type: Publication
relation.isAuthorOfPublication: 24c27e24-a456-4990-9f2b-a669bc8a66ea
relation.isAuthorOfPublication: 898ee1bb-f9e8-4a75-9858-a6c9142bc99e
relation.isAuthorOfPublication.latestForDiscovery: 24c27e24-a456-4990-9f2b-a669bc8a66ea

Files

Original bundle

Name: 2018_acl_pichel-etal_measuring.pdf
Size: 243.68 KB
Format: Adobe Portable Document Format