Creating an Annotated Health Record Dataset in a Limited-Resource Environment

Publication information

Authors	ANETTA Krištof
Year of publication	2023
Type	Article in proceedings
Conference	Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2023
Faculty / MU department	Faculty of Informatics
www	https://nlp.fi.muni.cz/raslan/2023/paper11.pdf
Keywords	Electronic health records; EHR; annotation; named entity recognition; NER; medical concept mining
Description	This paper demonstrates a workflow for creating a dataset of annotated electronic health records in an environment with limited language resources and limited availability of experts. Covering rule-based preannotation, the use of multiple annotators per document and the resulting degree of confidence for each annotation, and possible avenues of data augmentation for training large language models, the paper discusses practical considerations for making the best of the resource-strapped situation shared by many researchers who analyze health records.
Related projects	Use of artificial intelligence techniques for data processing, complex analysis, and visualization of large-scale data
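
The record does not specify how preannotation or per-annotation confidence is implemented. The following is only a minimal sketch of the two ideas under stated assumptions: preannotation is approximated by case-insensitive whole-word lexicon matching, an annotation is an exact (start, end, label) character span, and confidence is the share of annotators who produced that exact span. The names `preannotate`, `annotation_confidence`, and the example labels are hypothetical and not taken from the paper.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical annotation format: (start_offset, end_offset, label) in one document.
Span = Tuple[int, int, str]


def preannotate(text: str, lexicon: Dict[str, str]) -> List[Span]:
    """Rule-based preannotation (assumed): mark whole-word, case-insensitive
    occurrences of each lexicon term with its label."""
    spans: List[Span] = []
    for term, label in lexicon.items():
        pattern = r"\b" + re.escape(term) + r"\b"
        for match in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append((match.start(), match.end(), label))
    return sorted(spans)


def annotation_confidence(annotations_per_annotator: List[List[Span]]) -> Dict[Span, float]:
    """Confidence (assumed definition): fraction of annotators who produced
    exactly the same span with the same label."""
    n_annotators = len(annotations_per_annotator)
    if n_annotators == 0:
        return {}
    counts: Counter = Counter()
    for spans in annotations_per_annotator:
        # Use a set so a span repeated by one annotator counts only once.
        counts.update(set(spans))
    return {span: count / n_annotators for span, count in counts.items()}


if __name__ == "__main__":
    text = "Patient takes warfarin for atrial fibrillation."
    lexicon = {"warfarin": "DRUG", "atrial fibrillation": "DIAGNOSIS"}
    print(preannotate(text, lexicon))

    annotators = [
        [(0, 8, "DRUG"), (15, 27, "DIAGNOSIS")],
        [(0, 8, "DRUG")],
        [(0, 8, "DRUG"), (15, 27, "SYMPTOM")],
    ]
    print(annotation_confidence(annotators))
```

In this sketch, a span marked identically by all three annotators gets confidence 1.0, while the two conflicting labels on the same offsets each get 1/3. Exact-match agreement is the simplest choice; a real workflow might also credit partially overlapping spans.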
