Publication details

Bilingual Lexicon Induction From Comparable and Parallel Data: A Comparative Analysis

Authors

DENISOVÁ Michaela, RYCHLÝ Pavel

Year of publication 2024
Type Article in proceedings
Conference International Conference on Text, Speech, and Dialogue
MU Faculty or unit

Faculty of Informatics

Citation
www Preprint version
Doi http://dx.doi.org/10.1007/978-3-031-70563-2_3
Keywords bilingual lexicon induction; cross-lingual word embeddings; neural machine translation systems
Description Bilingual lexicon induction (BLI) from comparable data has become a common way of evaluating cross-lingual word embeddings (CWEs). These models have drawn much attention, mainly due to their availability for rare and low-resource language pairs. An alternative is offered by systems that exploit parallel data, such as popular neural machine translation systems (NMTSs), which are effective and yield state-of-the-art results. Despite the significant advancements in NMTSs, their effectiveness in the BLI task compared to models using comparable data remains underexplored. In this paper, we provide a comparative study of NMTS and CWE models evaluated on the BLI task and report results across three diverse language pairs: a distant pair (Estonian-English), a close pair (Estonian-Finnish), and a pair with different scripts (Estonian-Russian). Our study reveals the differences, strengths, and limitations of both approaches. We show that while NMTSs achieve impressive results for languages with a large amount of training data available, CWEs emerge as a better option when fewer resources are available.