The English–Spanish parallel corpus PaEnS

Doval Reixa, Irene

doi:10.3726/b20963

Files

2023_The_English_Spanish_parallel_Corpus_doval.pdf (832.93 KB)

Identifiers

URI: https://hdl.handle.net/10347/39211

ISBN: 9783631889008

DOI: 10.3726/b20963

Publication date

2023

Authors

Doval Reixa, Irene

Publisher

Peter Lang

Abstract

This chapter presents the PaEnS English-Spanish Parallel Corpus, a sentence-aligned parallel corpus that, at the time of writing, comprises some 130 million words. The corpus is part of a larger ongoing project, PaCorES, an acronym for Spanish Parallel Corpora, which aims to build a series of parallel corpora between Spanish and several major languages. The chapter presents the main features of the PaEnS corpus, starting with a brief description of the drawbacks that similar resources pose for the intended applications. The design and composition of the corpus are then described, together with the data selection criteria. Next, the phases of the workflow are discussed: text preprocessing, segmentation, automatic alignment and manual review. The web presentation of the corpus and its search options are then described. Finally, the future development of the corpus is outlined, and its distinctive features are briefly recapitulated.

Bibliographic citation

Doval, Irene. 2023. "The English–Spanish Parallel Corpus PaEnS." In Current Trends on Digital Technologies and Gaming for Language Teaching and Linguistics, edited by I. Santos Díaz et al., 145–164. Berlin: Peter Lang.

Publisher version

https://www.peterlang.com/document/1350076

Collections

Filoloxía Inglesa e Alemá
