Publication details

POS Annotated 50M Corpus of Tajik Language

Authors

DOVUDOV Gulshan, SUCHOMEL Vít, ŠMERK Pavel

Year of publication: 2012
Type: Article in proceedings
Conference: Proceedings of the Workshop on Language Technology for Normalisation of Less-Resourced Languages (SALTMIL 8/AfLaT 2012)
MU Faculty or unit

Faculty of Informatics

Citation
Web: http://www.cnts.ua.ac.be/sites/default/files/saltmil8-aflat2012.pdf
Field: Informatics
Keywords: Tajik language; Tajik corpus; morphological analysis of Tajik
Description: The paper presents by far the largest available computer corpus of the Tajik language, containing more than 50 million words. Two different approaches were used to obtain the texts for the corpus, and the paper describes both. It then describes a newly developed morphological analyzer of Tajik and presents some statistics from its application to the corpus.
