Publication information
Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data
Authors | |
---|---|
Year of publication | 2023 |
Type | Article in proceedings |
Conference | Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference |
MU Faculty or unit | |
Citation | |
www | Conference proceedings |
Keywords | Ukrainian; post-editing; dictionary; lexicography |
Description | This paper describes the development of a new corpus-based Ukrainian-English dictionary. The dictionary was built from scratch; we used no pre-existing dictionary data. A rapid dictionary development method was used, which consists of generating dictionary parts directly from a large corpus and having native speakers of Ukrainian (not professional lexicographers) post-edit the automatically generated data. The method builds on Baisa et al. (2019), which was improved and updated, and we used a different data management model. As the data source, a 3-billion-word Ukrainian web corpus from the TenTen series (Jakubíček et al., 2013) was used. The paper briefly describes the corpus, then we thoroughly explain the individual steps of the automatic generation–post-editing workflow, including the volume of manual work needed for each phase in person-days. We also present details about the newly created dictionary and discuss directions for its further development. |