Publication information
csTenTen17, a Recent Czech Web Corpus
Authors | |
---|---|
Year of publication | 2018 |
Type | Article in proceedings |
Conference | Proceedings of the Twelfth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2018 |
MU faculty / department | |
Citation | |
www | https://nlp.fi.muni.cz/raslan/2018/paper10-Suchomel.pdf |
Keywords | Czech corpus; web corpus; text processing |
Description | This article introduces csTenTen17, a very large Czech text corpus for language research, compiled from texts downloaded in 2015, 2016 and 2017. The corpus consists of 10.5 billion words, twice the size of its predecessor from 2012. A brief comparison with other recent Czech corpora follows. |
Related projects | |