Discovering bilingual collocations in parallel corpora: a first attempt at using distributional semantics

Publisher

John Benjamins Publishing Company

Abstract

This chapter presents a method that exploits parallel corpora to automatically extract bilingual collocation equivalents. First, we use dependency parsing and statistical measures to identify collocation candidates in corpora. Then, we leverage the parallel corpora to extract bilingual word embeddings. Finally, we use these distributional models as probabilistic dictionaries to identify bilingual collocation equivalents. To evaluate our strategy, we carry out a set of experiments in Portuguese and Spanish focusing on verb-object collocations, for example, “reach the maturity” (“atingir a maturidade” in Portuguese, “alcanzar la madurez” in Spanish). The results of our experiments show that this method can automatically identify thousands of bilingual collocation equivalents, achieving a precision of 86%.
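
The last step of the abstract, using a bilingual distributional model as a dictionary to pair collocations, can be illustrated with a minimal sketch. It is not the authors' implementation: the toy vectors, the `best_equivalent` helper, the use of cosine similarity, the rule of scoring a pair by its weaker component, and the 0.8 threshold are all assumptions made for illustration. It only shows the general idea of matching a verb-object pair in one language to the candidate pair in the other language whose verb and noun are both close in a shared vector space.

```python
import math

# Hypothetical toy vectors in a shared Portuguese-Spanish space.
# Real bilingual embeddings would be learned from the parallel corpus.
EMB = {
    ("pt", "atingir"): (0.90, 0.10, 0.00),
    ("pt", "maturidade"): (0.10, 0.90, 0.10),
    ("es", "alcanzar"): (0.88, 0.12, 0.05),
    ("es", "madurez"): (0.12, 0.85, 0.10),
    ("es", "tomar"): (0.20, 0.10, 0.90),
    ("es", "decisión"): (0.10, 0.30, 0.80),
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def best_equivalent(source, candidates, src_lang="pt", tgt_lang="es", threshold=0.8):
    """Return the target (verb, noun) pair most similar to the source pair.

    A pair is scored by its weaker component (assumed rule), so both the
    verb and the noun must be close to their source counterparts.
    Returns (None, 0.0) when no candidate reaches the threshold.
    """
    src_verb, src_noun = source
    best, best_score = None, 0.0
    for verb, noun in candidates:
        verb_sim = cosine(EMB[(src_lang, src_verb)], EMB[(tgt_lang, verb)])
        noun_sim = cosine(EMB[(src_lang, src_noun)], EMB[(tgt_lang, noun)])
        score = min(verb_sim, noun_sim)
        if score >= threshold and score > best_score:
            best, best_score = (verb, noun), score
    return best, best_score


if __name__ == "__main__":
    spanish_candidates = [("tomar", "decisión"), ("alcanzar", "madurez")]
    pair, score = best_equivalent(("atingir", "maturidade"), spanish_candidates)
    print(pair, round(score, 3))
```

In this sketch the candidate lists stand in for the output of the first step (dependency parsing plus statistical association measures), which is not modelled here.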

Bibliographic citation

Garcia, Marcos, García-Salido, Marcos and Alonso-Ramos, Margarita. "Discovering bilingual collocations in parallel corpora: A first attempt at using distributional semantics". Parallel Corpora for Contrastive and Translation Studies: New resources and applications, edited by Irene Doval and M. Teresa Sánchez Nieto, John Benjamins Publishing Company, 2019, pp. 267-279. https://doi.org/10.1075/scl.90.16gon

Sponsors

This work has been supported by the Spanish Ministry of Economy, Industry and Competitiveness (MINECO) through projects FFI2016-78299-P and FFI2014-51978-C2-1-R, by a Juan de la Cierva formación grant (FJCI-2014-22853), and by a postdoctoral fellowship endowed by the Galician Government (POS-A/2013/191).

Rights