Publication details
Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data
Authors | |
---|---|
Year of publication | 2023 |
Type | Article in Proceedings |
Conference | Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference |
MU Faculty or unit | |
Citation | |
web | Conference proceedings |
Keywords | Ukrainian; post-editing; dictionary; lexicography |
Description | This paper describes the development of a new corpus-based Ukrainian-English dictionary. The dictionary was built from scratch; we used no pre-existing dictionary data. We used a rapid dictionary development method that has two parts: generating dictionary parts directly from a large corpus, and having native speakers of Ukrainian (not professional lexicographers) post-edit the automatically generated data. The method builds on Baisa et al. (2019), which we improved and updated, and we used a different data management model. The data source was a 3-billion-word Ukrainian web corpus from the TenTen series (Jakubíček et al., 2013). The paper briefly describes the corpus and then explains in detail the individual steps of the automatic generation–post-editing workflow, including the amount of manual work, in person-days, needed for each phase. We also present details about the newly created dictionary and discuss directions for its further development. |