Publication information
Efficient Management and Optimization of Very Large Machine Learning Dataset for Question Answering
Authors | |
---|---|
Year of publication | 2020 |
Type | Article in proceedings |
Conference | Proceedings of the Fourteenth Workshop on Recent Advances in Slavonic Natural Language Processing, RASLAN 2020 |
Faculty / MU department | |
Citation | |
www | |
Keywords | question answering; dataset management; machine learning; optimization |
Description | Question answering strategies nowadays rely almost exclusively on deep neural network computations. Managing a large set of input data (questions, answers, full documents, metadata) in several forms, each suitable as the first layer of a selected network architecture, can be a non-trivial task. In this paper, we present the details and evaluation of preparing a rich dataset of more than 13,000 question-answer pairs with more than 6,500 full documents. We show how a Python-optimized database in a network environment was used to offer fast responses over a 26 GiB database of input data. We also explain a global hyperparameter optimization process that runs thousands of evaluation experiments in a controlled way to reach a near-optimal setup of the learning process. |
Related projects: |