The English–Spanish parallel corpus PaEnS

Publisher

Peter Lang

Abstract

This chapter presents PaEnS, an English–Spanish parallel corpus aligned at sentence level, which at the time of writing comprises some 130 million words. The corpus is part of a larger ongoing project, PaCorES (an acronym for Spanish Parallel Corpora), which aims to build a series of parallel corpora between Spanish and several major languages. The chapter describes the main features of PaEnS, beginning with a brief account of the drawbacks that similar existing resources present for the intended applications. It then describes the design and composition of the corpus and explains the criteria used to select the data. Next, it discusses the phases of the workflow: text preprocessing, segmentation, automatic alignment and manual review. This is followed by a description of the corpus's web interface and its search options. Finally, the chapter outlines the corpus's future development and briefly recapitulates its distinctive features.

Bibliographic citation

Doval, Irene. 2023. "The English–Spanish Parallel Corpus PaEnS." In Current Trends on Digital Technologies and Gaming for Language Teaching and Linguistics, edited by I. Santos Díaz et al., 145–164. Berlin: Peter Lang.

Sponsors

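PaCorES. Agencia Estatal de Investigación (Spanish State Research Agency): "Corpus paralelos online del español. Una herramienta multifuncional para la traducción, el aprendizaje de lenguas y la investigación lingüística" (Online Parallel Corpora of Spanish: A Multifunctional Tool for Translation, Language Learning and Linguistic Research), PID2021-125313OB-I00, principal investigator: Irene Doval.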

Rights