Cross-Lingual Word Embeddings for Morphologically Rich Languages

Üstün, A., Bouma, G. & van Noord, G., 2-Sep-2019, p. 1222-1228. 7 p.

Research output: Contribution to conference › Paper › Academic

Cross-lingual word embedding models learn a shared vector space for two or more languages so that words with similar meaning are represented by similar vectors regardless of their language. Although the existing models achieve high performance on pairs of morphologically simple languages, they perform very poorly on morphologically rich languages such as Turkish and Finnish. In this paper, we propose a morpheme-based model in order to increase the performance of cross-lingual word embeddings on morphologically rich languages. Our model includes a simple extension which enables us to exploit morphemes for cross-lingual mapping. We applied our model to the Turkish-Finnish language pair on the bilingual word translation task. Results show that our model outperforms the baseline models by 2% in the nearest neighbour ranking.
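
As a rough illustration of the bilingual word translation task mentioned in the abstract, the sketch below retrieves, for each source word, the nearest neighbour in the target language by cosine similarity in a shared space and scores the result against a gold dictionary. It does not implement the morpheme-based model of the paper. The three-dimensional vectors are hand-made toy values rather than real embeddings, and the function names (`cosine`, `nearest_neighbour`, `precision_at_1`) are hypothetical names chosen for this example.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest_neighbour(query_vec, target_space):
    """Return the target-language word whose vector is most similar to query_vec."""
    return max(target_space, key=lambda word: cosine(query_vec, target_space[word]))


def precision_at_1(source_space, target_space, gold):
    """Fraction of source words whose nearest neighbour is the gold translation."""
    hits = sum(
        1
        for src, tgt in gold.items()
        if nearest_neighbour(source_space[src], target_space) == tgt
    )
    return hits / len(gold)


# Toy vectors, assumed to be already mapped into one shared space (illustrative only).
turkish = {"ev": [0.9, 0.1, 0.0], "su": [0.0, 0.8, 0.2], "kitap": [0.1, 0.0, 0.9]}
finnish = {"talo": [1.0, 0.0, 0.1], "vesi": [0.1, 0.9, 0.1], "kirja": [0.0, 0.1, 1.0]}

# Gold dictionary: house, water, book.
gold = {"ev": "talo", "su": "vesi", "kitap": "kirja"}

print(nearest_neighbour(turkish["ev"], finnish))
print(precision_at_1(turkish, finnish, gold))
```

In practice the source and target vectors would come from monolingual embeddings aligned by a learned mapping, and the score would be computed over a held-out test dictionary.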
Original language: English
Number of pages: 7
Publication status: Published - 2-Sep-2019
Event: Recent Advances in Natural Language Processing 2019 - Varna, Bulgaria
Duration: 2-Sep-2019 to 4-Sep-2019


Conference: Recent Advances in Natural Language Processing 2019
Abbreviated title: RANLP 2019

Event: Conference

ID: 109559256