The English–Spanish parallel corpus PaEnS
Publisher
Peter Lang
Abstract
This chapter presents PaEnS, an English–Spanish parallel corpus aligned at the sentence level, which at the time of writing comprises some 130 million words. The corpus is part of a larger ongoing project, PaCorES, an acronym for Spanish Parallel Corpora, which aims to build a series of parallel corpora pairing Spanish with several major languages. The chapter sets out the main features of the PaEnS corpus, beginning with a brief account of the drawbacks that similar existing resources pose for the intended applications. It then describes the design and composition of the corpus and explains the data selection criteria. Next, it discusses the phases of the workflow: text preprocessing, segmentation, automatic alignment, and manual review. The web interface to the corpus and its search options are then described. Finally, the future development of the corpus is outlined, and its distinctive features are briefly recapitulated.
Bibliographic citation
Doval, Irene. 2023. "The English–Spanish Parallel Corpus PaEnS." In Current Trends on Digital Technologies and Gaming for Language Teaching and Linguistics, edited by I. Santos Díaz et al., 145–164. Berlin: Peter Lang
Publisher version
https://www.peterlang.com/document/1350076
Sponsors
PaCorE
Agencia Estatal de Investigación: Corpus paralelos online del español. Una herramienta multifuncional para la traducción, el aprendizaje de lenguas y la investigación lingüística [Online parallel corpora of Spanish. A multifunctional tool for translation, language learning, and linguistic research] (PID2021-125313OB-I00, PI: Irene Doval)