Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis

Denisová,  Michaela; Rychlý,  Pavel

Publication details

Authors	DENISOVÁ Michaela; RYCHLÝ Pavel
Year of publication	2024
Type	Article in proceedings
Conference	International Conference on Text, Speech, and Dialogue
Faculty / MU unit	Faculty of Informatics
Web	Preprint version
DOI	http://dx.doi.org/10.1007/978-3-031-70563-2_3
Keywords	bilingual lexicon induction; cross-lingual word embeddings; neural machine translation systems
Description	Bilingual lexicon induction (BLI) from comparable data has become a common way of evaluating cross-lingual word embeddings (CWEs). These models have drawn much attention, mainly due to their availability for rare and low-resource language pairs. Systems exploiting parallel data, such as popular neural machine translation systems (NMTSs), offer an alternative; they are effective and yield state-of-the-art results. Despite the significant advancements in NMTSs, their effectiveness in the BLI task compared to the models using comparable data remains underexplored. In this paper, we provide a comparative study of the NMTS and CWE models evaluated on the BLI task and demonstrate the results across three diverse language pairs: a distant pair (Estonian-English), a close pair (Estonian-Finnish), and a pair with different scripts (Estonian-Russian). Our study reveals the differences, strengths, and limitations of both approaches. We show that while NMTSs achieve impressive results for languages with a large amount of training data available, CWEs emerge as a better option when fewer resources are available.
Related projects:	Using artificial intelligence techniques for data processing, complex analysis, and visualization of large-scale data
