Publication information

Customers' Opinion Mining from Extensive Amount of Textual Reviews in Relation to Induced Knowledge Growth

Authors

ŽIŽKA Jan, SVOBODA Arnošt

Year of publication 2015
Type Article in a scholarly periodical
Journal / Source Acta Universitatis Agriculturae et Silviculturae Mendelianae Brunensis
Faculty / MU workplace

Faculty of Economics and Administration

Citation
www http://acta.mendelu.cz/63/6/2229/
Doi http://dx.doi.org/10.11118/actaun201563062229
Field Informatics
Keywords text mining; customer opinion analysis; decision trees; decision rules; windowing; large data volumes; machine learning; computational complexity; training-set size
Attached files
Description A shortage of data is not the only data mining problem: having too much data can cause difficulty as well. An experimental investigation of how the number of reviews influences the knowledge mined from text documents demonstrated, above all and not surprisingly, a strong dependence of the computation time on the data volume. As the volume of hotel-service reviews steadily increased, the CPU time of the text mining process grew strongly non-linearly, while the knowledge, expressed as generated semantically relevant words, kept increasing too, although by progressively smaller increments. Among other uses, the revealed relevant words (or phrases composed of them) can serve as significant keywords for information retrieval or for defining more detailed topics hidden in text documents. After the research described above, which aimed at revealing relevant words that represented the reviews, a follow-up series of experiments was started to mine knowledge that would be more understandable to humans: automatically discovering significant phrases composed of relevant words. To find the phrases, a method of analyzing n-grams (here, contiguous sequences of n words) was applied to reviews written in English, Spanish, German, and Russian. Procedures similar to those described in this article were used, with the same decision-tree/rule tool, the same data source, and windows of a constant size of 100,000 reviews. From the semantic point of view, and unlike the 1-grams described in this paper, the best phrases were provided by 3-grams, for example "breakfast very good" (a positive phrase), "no free Internet" (a negative phrase), and so on. Details can be found in Žižka and Dařena (2015).
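
The n-gram notion mentioned in the description (a contiguous sequence of n words) can be illustrated with a minimal Python sketch. This is not the authors' tooling: the function name `word_ngrams`, the whitespace tokenization, and the three sample reviews are assumptions made up for illustration, reusing only the two example phrases quoted above.

```python
from collections import Counter


def word_ngrams(text, n):
    """Return all contiguous n-word sequences in text.

    Assumption: simple lowercasing and whitespace tokenization,
    which is far cruder than a real text mining pipeline.
    """
    tokens = text.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample reviews (not taken from the study's data).
reviews = [
    "Breakfast very good and staff friendly",
    "No free Internet in the room",
    "Breakfast very good",
]

# Count how often each 3-gram occurs across all reviews.
counts = Counter(g for r in reviews for g in word_ngrams(r, 3))
```

Frequency counting is only a stand-in here; in the study, the relevance of words and phrases was determined with a decision-tree/rule tool rather than by raw counts.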
