
Creating an Annotated Health Record Dataset in a Limited-Resource Environment.
Authors | |
---|---|
Year of publication | 2023 |
Type | Article in proceedings |
Conference | Proceedings of the Seventeenth Workshop on Recent Advances in Slavonic Natural Languages Processing, RASLAN 2023 |
MU Faculty or unit | |
Citation | |
www | https://nlp.fi.muni.cz/raslan/2023/paper11.pdf |
Keywords | Electronic health records; EHR; annotation; named entity recognition; NER; medical concept mining |
Description | This paper demonstrates a workflow for creating a dataset of annotated electronic health records in an environment where both language resources and expert availability are limited. It covers preannotation with rule-based methods, the redundancy of multiple annotators per document and the resulting degree of confidence for each annotation, and possible avenues of data augmentation for training large language models. Throughout, it discusses the practical considerations of making the best of the resource-strapped situation shared by many researchers who analyze health records. |
Related projects: