Cthulhu Hails from Wales: N-gram Frequency Analysis of R'lyehian



Year of publication 2020
Type Article in Proceedings
Conference Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020
Faculty of Informatics

Keywords H. P. Lovecraft; language identification; N-grams; R'lyehian

R'lyehian is a unique fictional language penned by the prolific 20th-century horror fiction author H. P. Lovecraft. Prior work on the Lovecraftian mythos has not yet studied the similarities between R'lyehian and natural languages, which are crucial for determining its true origins.

We produced a comprehensive wordlist of R'lyehian and used open-source N-gram-based language identification tools to find the natural languages most similar to R'lyehian. From this wordlist, we also constructed a frequency table of all unigraphs and digraphs in R'lyehian.
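
As an illustration of how such a frequency table might be built, the following is a minimal sketch in Python. It is not the authors' implementation: the function names, the six-word sample (taken from the well-known R'lyehian phrase, not from the paper's wordlist), the lowercasing step, and the choice to treat the apostrophe as an ordinary character are all assumptions made for this example.

```python
from collections import Counter


def ngram_counts(words, n):
    """Count character n-grams inside each word.

    N-grams never span word boundaries. Words are lowercased and the
    apostrophe is kept as a regular character (both are assumptions).
    """
    counts = Counter()
    for word in words:
        w = word.lower()
        for i in range(len(w) - n + 1):
            counts[w[i:i + n]] += 1
    return counts


def relative_frequencies(counts):
    """Convert raw counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {gram: c / total for gram, c in counts.items()}


# Illustrative sample only; the paper's comprehensive wordlist is not reproduced here.
sample_words = ["ph'nglui", "mglw'nafh", "Cthulhu", "R'lyeh", "wgah'nagl", "fhtagn"]

unigraphs = ngram_counts(sample_words, 1)
digraphs = ngram_counts(sample_words, 2)

# Print the five most frequent digraphs with their relative frequencies.
digraph_freqs = relative_frequencies(digraphs)
for gram, count in digraphs.most_common(5):
    print(f"{gram}\t{count}\t{digraph_freqs[gram]:.3f}")
```

Counting within words rather than across a running text matches a wordlist-based table; a corpus-based count would weight frequent words more heavily.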

We show that R'lyehian is most similar to Celtic languages, which lays the groundwork for our hypothesis that R'lyeh, where Cthulhu lies dreaming, might be a place in Wales.

Our frequency tables will prove a useful resource for future work in the area of the Lovecraftian mythos.

