Publication details
Evaluating Natural Language Processing Tasks with Low Inter-Annotator Agreement: The Case of Corpus Applications
| | |
| --- | --- |
| Authors | |
| Year of publication | 2016 |
| Type | Article in Proceedings |
| Conference | Tenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2016 |
| MU Faculty or unit | |
| Citation | |
| Field | Informatics |
| Keywords | NLP; inter-annotator agreement; low inter-annotator agreement; evaluation; application; application-based evaluation; word sketch; thesaurus; terminology |
| Description | In Low inter-annotator agreement = an ill-defined problem?, we argued that tasks with low inter-annotator agreement are common in natural language processing (NLP) and deserve appropriate attention, and we outlined a preliminary approach to their evaluation. In On evaluation of natural language processing tasks: Is gold standard evaluation methodology a good solution?, we argued for extrinsic, application-based evaluation of NLP tasks and against the gold standard methodology, which is currently almost the only one used in the NLP field. This paper synthesizes the two: for three practical tasks whose inter-annotator agreement is normally so low that they are considered almost unsuitable for any scientific evaluation, we introduce an application-based evaluation scenario which shows not only that it is possible to evaluate them scientifically, but also that this type of evaluation is much more telling than the gold standard approach. |
Related projects: