Možnosti a meze korpusového výzkumu proprií

Žižková, Hana; Osolsobě, Klára

Publication details

Možnosti a meze korpusového výzkumu proprií

Title in English	Possibilities and limitations of corpus research on proper names
Authors	ŽIŽKOVÁ Hana, OSOLSOBĚ Klára
Year of publication	2024
MU Faculty or unit	Faculty of Arts
Citation
Description	In this presentation we show the possibilities and limits of research on proper names, drawing on our experience with Czech language corpora and taking into account the state of morphological tagging used in the Czech environment. We clarify how the individual steps of automatic morphological analysis affect the lemmatization and tagging of proper names. We touch on the problem of tokenization and multi-word proper names, the problem of extending the morphological dictionary with proper names, the peculiarities of the inflection of proper names, and their homonymy with appellatives (common nouns) in relation to tagging and disambiguation. We point out cases in which it is not appropriate to rely on morphological tagging in research, and we use concrete examples to show how incorrect morphological tagging distorts research data. We outline ways to avoid bias in the analyzed data. We also show, on concrete examples of onomastic research, the possibilities of using different computational tools. We present the differences in data extraction from the ČNK, Sketch Engine and Aranea corpora, and show how more complex CQL queries can be used in data classification. We introduce the lesser-known tagging categories in the Aranea corpora and show their effective use in onomastic data mining.
Related projects:	Lexikon a gramatika češtiny IV - 2024
