Probing for idiomaticity in vector space models

García González, Marcos; Vieira, Tiago Kramer; Scarton, Carolina; Idiart, Marco; Villavicencio, Aline

doi:10.18653/v1/2021.eacl-main.310

Files

2021_EurChapACL_Garcia_Probing.pdf (839.09 KB)

Identifiers

URI: https://hdl.handle.net/10347/45969

ISBN: 978-1-954085-02-2

DOI: 10.18653/v1/2021.eacl-main.310

Publication date

2021-04

Authors

García González, Marcos

Editors

Merlo, Paola

Tiedemann, Jorg

Tsarfaty, Reut

Publisher

Association for Computational Linguistics

Abstract

Contextualised word representation models have been successfully used for capturing different word usages, and they may be an attractive alternative for representing idiomaticity in language. In this paper, we propose probing measures to assess whether some of the expected linguistic properties of noun compounds, especially those related to idiomatic meanings, their dependence on context, and their sensitivity to lexical choice, are readily available in some standard and widely used representations. To that end, we constructed the Noun Compound Senses Dataset, which contains noun compounds and their paraphrases in context-neutral and context-informative naturalistic sentences, in two languages: English and Portuguese. Results obtained using four types of probing measures with models such as ELMo, BERT and some of its variants indicate that idiomaticity is not yet accurately represented by contextualised models.

Keywords

Vector space models | Word representation models | Noun Compound Senses Dataset | Word usages

Bibliographic citation

Marcos Garcia, Tiago Kramer Vieira, Carolina Scarton, Marco Idiart, and Aline Villavicencio. 2021. Probing for idiomaticity in vector space models. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, pages 3551–3564, Online. Association for Computational Linguistics.

Publisher version

http://doi.org/10.18653/v1/2021.eacl-main.310

Rights

© 1963–2026 ACL. Materials published in or after 2016 are licensed under a Creative Commons Attribution 4.0 International License.

Collections

Lingua e Literatura Españolas, Teoría da Literatura e Lingüística Xeral
Centro de Investigación en Tecnoloxías Intelixentes da USC (CiTIUS)
