![Důležité termíny](https://cdn.muni.cz/media/3633704/image_2.jpg?mode=crop¢er=0.5,0.5&rnd=133572412150000000&heightratio=0.5&width=278)
Informace o publikaci
Character-based Language Model
Autoři | |
---|---|
Rok publikování | 2014 |
Druh | Článek ve sborníku |
Konference | Eighth Workshop on Recent Advances in Slavonic Natural Language Processing |
Fakulta / Pracoviště MU | |
Citace | |
www | https://nlp.fi.muni.cz/raslan/2014/6.pdf |
Obor | Jazykověda |
Klíčová slova | language model; suffix array; LCP; trie; character-based; random text generator; corpus |
Popis | Language modelling and also other natural language processing tasks are usually based on words. I present here a more general yet simpler approach to language modelling using much smaller units of text data: character-based language model (CBLM). In this paper I describe the underlying data structure of the model, evaluate the model using standard measures (entropy, perplexity). As a proof-of-concept and an extrinsic evaluation I present also a random sentence generator based on this model. |
Související projekty: |