
Unsupervised Cross-lingual Word Embeddings Based on Subword Alignment

EasyChair Preprint 2254

12 pages
Date: December 25, 2019

Abstract

Cross-lingual word embeddings are crucial building blocks for multilingual models, and recent studies indicate that they can be obtained without any cross-lingual resources. However, experimental results show that the performance of such unsupervised cross-lingual word embeddings degrades on distant language pairs such as English-Japanese. In this paper, we propose an unsupervised method for obtaining cross-lingual word embeddings that uses subword alignment to capture trivially translatable word pairs, such as named entities and loanwords. Because these words tend to have unambiguous translations, they provide a more reliable signal for inducing a bilingual dictionary. Our method first obtains initial cross-lingual word embeddings with an existing unsupervised method and uses them to induce a bilingual dictionary, from which a subword alignment model is learned. It then uses the learned alignment model to extract word pairs whose surface forms are alignable, constructing a high-quality bilingual dictionary. Finally, the resulting dictionary is used to obtain high-quality cross-lingual word embeddings. Experimental results on four language pairs, English-Japanese, English-Finnish, English-Spanish, and English-Italian, show that the cross-lingual word embeddings obtained with our method outperform those of an existing method, especially on distant language pairs (by 3% on English-Japanese and 2% on English-Finnish).
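The three-stage pipeline described in the abstract can be sketched roughly as follows. This is a minimal, hypothetical illustration, not the authors' implementation. It assumes the initial unsupervised mapping is already available as a matrix `W0`, induces a dictionary with mutual nearest neighbors under cosine similarity, filters pairs with a caller-supplied `alignable` predicate (a stand-in for the learned subword alignment model, which is not reproduced here), and learns the final mapping with orthogonal Procrustes, a common choice for dictionary-based mapping that the abstract does not confirm. The helper names (`mutual_nn_pairs`, `procrustes`, `refine`) and the toy data are invented for illustration.

```python
import numpy as np
from typing import Callable, List, Sequence, Tuple


def normalize(E: np.ndarray) -> np.ndarray:
    # Scale rows to unit length so dot products become cosine similarities.
    return E / np.linalg.norm(E, axis=1, keepdims=True)


def mutual_nn_pairs(Xm: np.ndarray, Y: np.ndarray) -> List[Tuple[int, int]]:
    # Keep (i, j) only if j is i's nearest target neighbor AND
    # i is j's nearest source neighbor (cosine similarity).
    sims = normalize(Xm) @ normalize(Y).T
    fwd = sims.argmax(axis=1)
    bwd = sims.argmax(axis=0)
    return [(i, int(j)) for i, j in enumerate(fwd) if bwd[j] == i]


def procrustes(X: np.ndarray, Y: np.ndarray) -> np.ndarray:
    # Orthogonal W minimizing ||X W - Y||_F: W = U V^T, where U S V^T = svd(X^T Y).
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt


def refine(
    X: np.ndarray,
    Y: np.ndarray,
    W0: np.ndarray,
    src_words: Sequence[str],
    tgt_words: Sequence[str],
    alignable: Callable[[str, str], bool],
) -> Tuple[np.ndarray, List[Tuple[int, int]]]:
    # Step 1: induce a dictionary from the initial mapping W0 (assumed given).
    pairs = mutual_nn_pairs(X @ W0, Y)
    # Step 2: keep only pairs whose surface forms pass the hypothetical
    # `alignable` check (stand-in for a learned subword alignment model).
    kept = [(i, j) for i, j in pairs if alignable(src_words[i], tgt_words[j])]
    if not kept:
        raise ValueError("no alignable pairs; cannot learn a mapping")
    # Step 3: learn an orthogonal mapping from the filtered dictionary.
    src_idx = [i for i, _ in kept]
    tgt_idx = [j for _, j in kept]
    return procrustes(X[src_idx], Y[tgt_idx]), kept


# Toy usage on synthetic data (all values are made up for illustration).
rng = np.random.default_rng(0)
n, d = 50, 10
Y = rng.standard_normal((n, d))                   # "target" embeddings
Q, _ = np.linalg.qr(rng.standard_normal((d, d)))  # hidden orthogonal rotation
X = Y @ Q.T                                       # "source" embeddings: X @ Q equals Y up to rounding
src_words = [f"word{i}" for i in range(n)]
tgt_words = [f"word{i}" if i % 2 == 0 else f"other{i}" for i in range(n)]
# Idealized initial map W0 = Q; identical strings act as the alignability check.
W, kept = refine(X, Y, Q, src_words, tgt_words, lambda s, t: s == t)
print(len(kept), np.allclose(W, Q))
```

In the toy run, exact string identity plays the role of `alignable`, so only the even-indexed pairs survive the filter, yet they suffice to recover the hidden rotation. Identical-string matching would miss cross-script pairs such as English loanwords written in Japanese katakana, which a learned subword alignment model can in principle capture.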

Keyphrases: Cross-lingual, Natural Language Processing, Representation Learning, bilingual lexicon induction, cross-lingual word embeddings, distant language, multilingual, word embedding

BibTeX entry
BibTeX has no dedicated entry type for preprints. The following @booklet entry is a workaround that produces the correct reference:
@booklet{EasyChair:2254,
  author    = {Jin Sakuma and Naoki Yoshinaga},
  title     = {Unsupervised Cross-lingual Word Embeddings Based on Subword Alignment},
  howpublished = {EasyChair Preprint 2254},
  year      = {EasyChair, 2019}}