Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
Authors | Marek Medveď, Radoslav Sabol, Aleš Horák |
---|---|
Year of publication | 2020 |
Type | Article in proceedings |
Conference | Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020 |
Faculty / MU department | |
Citation | MEDVEĎ, Marek, Radoslav SABOL and Aleš HORÁK. Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering. In Aleš Horák. Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020. Brno: Tribun EU, 2020, pp. 23-34. ISBN 978-80-263-1600-8. |
Keywords | question answering; dataset management; machine learning; optimization |
Description | Question answering strategies today rely almost exclusively on deep neural network computations. Managing a large set of input data (questions, answers, full documents, metadata) in the several forms required as input to the first layer of a selected network architecture can be a non-trivial task. In this paper, we present the details of preparing and evaluating a rich dataset of more than 13,000 question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was used to provide fast responses over a 26 GiB database of input data. We also describe a global hyperparameter optimization process that controls the running of thousands of evaluation experiments to reach a near-optimal setup of the learning process. |