Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis

Denisová, Michaela; Rychlý, Pavel

Publication details

Authors	DENISOVÁ Michaela; RYCHLÝ Pavel
Year of publication	2024
Type	Article in Proceedings
Conference	International Conference on Text, Speech, and Dialogue
MU Faculty or unit	Faculty of Informatics
web	Preprint version
DOI	https://doi.org/10.1007/978-3-031-70563-2_3
Keywords	bilingual lexicon induction; cross-lingual word embeddings; neural machine translation systems
Description	Bilingual lexicon induction (BLI) from comparable data has become a common way of evaluating cross-lingual word embeddings (CWEs). These models have drawn much attention, mainly due to their availability for rare and low-resource language pairs. An alternative is offered by systems that exploit parallel data, such as the popular neural machine translation systems (NMTSs), which are effective and yield state-of-the-art results. Despite the significant advancements in NMTSs, their effectiveness in the BLI task compared to models using comparable data remains underexplored. In this paper, we provide a comparative study of NMTS and CWE models evaluated on the BLI task and demonstrate the results across three diverse language pairs: a distant pair (Estonian-English), a close pair (Estonian-Finnish), and a pair with different scripts (Estonian-Russian). Our study reveals the differences, strengths, and limitations of both approaches. We show that while NMTSs achieve impressive results for languages with a large amount of training data available, CWEs emerge as a better option when fewer resources are available.
Related projects:	Using artificial intelligence techniques for data processing, complex analysis and visualization of large-scale data
