Discovering bilingual collocations in parallel corpora: a first attempt at using distributional semantics

dc.contributor.affiliation: Universidade de Santiago de Compostela. Departamento de Lingua e Literatura Españolas, Teoría da Literatura e Lingüística Xeral
dc.contributor.author: García González, Marcos
dc.contributor.author: García Salido, Marcos
dc.contributor.author: Alonso Ramos, Margarita
dc.contributor.editor: Doval Reixa, Irene
dc.contributor.editor: Sánchez Nieto, M. Teresa
dc.date.accessioned: 2026-02-13T13:40:08Z
dc.date.available: 2026-02-13T13:40:08Z
dc.date.issued: 2019
dc.description.abstract: This chapter presents a method that exploits parallel corpora to automatically extract bilingual collocation equivalents. First, we use dependency parsing and statistical measures to identify collocation candidates in corpora. Then, we leverage the parallel corpora to extract bilingual word embeddings. Finally, we use these distributional models as probabilistic dictionaries to identify bilingual collocation equivalents. To evaluate our strategy, we carry out a set of experiments in Portuguese and Spanish focusing on verb-object collocations, for example, “reach the maturity” (“atingir a maturidade” in Portuguese, “alcanzar la madurez” in Spanish). The results of our experiments show that this method is useful for automatically identifying thousands of bilingual collocation equivalents, achieving a precision of 86%.
dc.description.sponsorship: This work has been supported by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO) through projects FFI2016-78299-P and FFI2014-51978-C2-1-R, by a Juan de la Cierva formación grant (FJCI-2014-22853), and by a postdoctoral fellowship endowed by the Galician Government (POS-A/2013/191).
dc.identifier.citation: Garcia, Marcos, García-Salido, Marcos and Alonso-Ramos, Margarita. "Discovering bilingual collocations in parallel corpora: A first attempt at using distributional semantics". Parallel Corpora for Contrastive and Translation Studies: New resources and applications, edited by Irene Doval and M. Teresa Sánchez Nieto, John Benjamins Publishing Company, 2019, pp. 267-279. https://doi.org/10.1075/scl.90.16gon
dc.identifier.doi: 10.1075/scl.90.16gon
dc.identifier.isbn: 9789027262844
dc.identifier.uri: https://hdl.handle.net/10347/45906
dc.language.iso: eng
dc.publisher: John Benjamins Publishing Company
dc.relation.ispartofseries: Studies in Corpus Linguistics; 90
dc.relation.publisherversion: https://doi.org/10.1075/scl.90.16gon
dc.rights.accessRights: open access
dc.subject: Learning
dc.subject: Conventionalized lexical combinations
dc.subject: Collocations
dc.subject: Bilingual collocation equivalents
dc.title: Discovering bilingual collocations in parallel corpora: a first attempt at using distributional semantics
dc.type: book part
dc.type.hasVersion: AM
dspace.entity.type: Publication
relation.isAuthorOfPublication: ae090fc6-2387-4087-ba21-7271835b4b35
relation.isAuthorOfPublication.latestForDiscovery: ae090fc6-2387-4087-ba21-7271835b4b35
relation.isEditorOfPublication: 6dd6a8e2-0713-49d8-bd83-bb134192a00f
relation.isEditorOfPublication.latestForDiscovery: 6dd6a8e2-0713-49d8-bd83-bb134192a00f

Files

Original bundle

Name: 2019_ParCorp_Garcia_Discovery.pdf
Size: 307.66 KB
Format: Adobe Portable Document Format