RNDr. Miloš Jakubíček, Ph.D.
Researcher, Centre for Natural Language Processing
Curriculum Vitae
- Person-Related Identification Information
- Miloš Jakubíček (*1986)
- Department
- Natural Language Processing Centre, Faculty of Informatics, Masaryk University
- Employment - Position
- specialist
- Education and Academic Qualifications
- 2017: Ph.D. degree in Computer Science
- 2012: RNDr. degree in Computer Science (Artificial intelligence and natural language processing)
- 2010: Master's degree in Computer Science (Artificial intelligence and natural language processing)
- 2008: Bachelor's degree in Computer Science
- 2005: High school (gymnasium Jihlava)
- Employment
- 2009-: Natural Language Processing Centre, Faculty of Informatics, Masaryk University (specialist)
- 2012-2016: Computational Linguistics Centre, Faculty of Arts, Masaryk University (lecturer)
- 2008-: Lexical Computing (from 2014: CEO)
- Teaching Activities
- At Masaryk University:
- PLIN015 Seminar in Computational Linguistics II (lecture)
- PLIN013 Seminar in Computational Linguistics I (lecture)
- IA161 Syntactic Formalisms and Their Application in Natural Language Parsing (co-lecturing with Juyeon Kang)
- IB001 Introduction to programming (tutorial)
- VB000 Elements of Style (technical help)
- IB047 Introduction to Corpus Linguistics and Computer Lexicography (tutorial)
- Elsewhere:
- 2011 CLARA Training Course on Multilingual Tools and Resources (Bergen, Norway)
- 2016 ENeL Training School on Tools and methods for creating innovative e-dictionaries (Ljubljana, Slovenia)
- 2015, 2016, 2017, 2018, 2019, 2021, 2022, 2023: Lexicom, a Workshop in Lexicography and Lexical Computing (Telč, Czechia; Boulder, USA; Vienna, Austria; Leiden, Netherlands; Cambridge, UK; Mikulov, Czechia; Cambridge, UK; Telč, Czechia; Cambridge, UK)
- Scientific and Research Activities
- parsing of Czech and close free-word-order languages
- corpus linguistics and tools for processing of large corpora
- computer lexicography and lexical semantics
- Internships and stays for the purpose of study or work
- 2009/2010: Saarland University, Germany
- 2009/10/10 – 2010/03/10: Saarland University, Saarbrücken, DEU
- Activities Outside University
- 2002-2010: member of Slunce, a non-profit youth organization
- 2008-: Lexical Computing Ltd., development of the Sketch Engine
- Awards Related to Science and Research
- 2010: PACLIC 2010 best student paper award
- 2008: RASLAN 2008 best paper award
- Major Publications
- BLAHUŠ, Marek, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Jan KRAUS, Marek MEDVEĎ, Vlasta OHLÍDALOVÁ and Vít SUCHOMEL. Rapid Ukrainian-English Dictionary Creation Using Post-Edited Corpus Data. Online. In Marek Medveď, Michal Měchura, Carole Tiberius, Iztok Kosem, Jelena Kallas, Miloš Jakubíček, Simon Krek. Electronic lexicography in the 21st century (eLex 2023): Invisible Lexicography. Proceedings of the eLex 2023 conference. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2023, pp. 613-637. ISSN 2533-5626.
- JAKUBÍČEK, Miloš, Emma ROMANI, Pavel RYCHLÝ and Ondřej HERMAN. Development of HAMOD: a High Agreement Multi-lingual Outlier Detection dataset. In Horák, Rychlý, Rambousek. Recent Advances in Slavonic Natural Language Processing (RASLAN 2021). Brno: Tribun EU, 2021, pp. 177-183. ISBN 978-80-263-1670-1.
- HERMAN, Ondřej, Vojtěch KOVÁŘ, Miloš JAKUBÍČEK and Pavel RYCHLÝ. Word Sense Induction Using Word Sketches. In Martín-Vide C., Purver M., Pollak S. Proceedings of the 7th International Conference on Statistical Language and Speech Processing. Cham: Springer, 2019, pp. 83-91. ISBN 978-3-030-31371-5. Available from: https://dx.doi.org/10.1007/978-3-030-31372-2_7.
- BAISA, Vít, Marek BLAHUŠ, Michal CUKR, Ondřej HERMAN, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Marek MEDVEĎ, Michal MĚCHURA, Pavel RYCHLÝ and Vít SUCHOMEL. Automating dictionary production: a Tagalog-English-Korean dictionary from scratch. Online. In Proceedings of the 6th Biennial Conference on Electronic Lexicography. Brno, Czech Republic: Lexical Computing CZ s.r.o., 2019, pp. 805-818. ISSN 2533-5626.
- KOSEM, Iztok, Miloš JAKUBÍČEK, Jelena KALLAS, Simon KREK, Carole TIBERIUS, Tanara Zingano KUHN, Margarita CORREIA, José Pedro FERREIRA, Maarten JANSEN and Isabel PEREIRA. Electronic lexicography in the 21st century. Proceedings of the eLex 2019 conference. Brno: Lexical Computing CZ s.r.o., 2019. ISSN 2533-5626.
- KOSEM, Iztok, Miloš JAKUBÍČEK, Jelena KALLAS, Simon KREK, Carole TIBERIUS and Vít BAISA. Electronic lexicography in the 21st century. Proceedings of the eLex 2017 conference. Brno: Lexical Computing CZ s.r.o., 2017. ISSN 2533-5626.
- KOSEM, Iztok, Miloš JAKUBÍČEK, Jelena KALLAS and Simon KREK. Electronic lexicography in the 21st century: linking lexical data in the digital age. Proceedings of the eLex 2015 conference. Ljubljana/Brighton: Trojina, Institute for Applied Slovene Studies/Lexical Computing Ltd., 2015. ISBN 978-961-93594-3-3.
- KILGARRIFF, Adam, Miloš JAKUBÍČEK, Jan POMIKÁLEK, Tony Berber SARDINHA and Pete WHITELOCK. PtTenTen: A corpus for Portuguese lexicography. In Tony Berber Sardinha, Telma de Lurdes São Bento Ferreira. Working with Portuguese Corpora. 1st ed. London: Bloomsbury Publishing, 2014, pp. 280-287. Bloomsbury Academic. ISBN 978-1-4411-9050-5.
- KILGARRIFF, Adam, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Pavel RYCHLÝ and Vít SUCHOMEL. Finding Terms in Corpora for Many Languages with the Sketch Engine. Online. In Proceedings of the Demonstrations at the 14th Conference of the European Chapter of the Association for Computational Linguistics. Gothenburg, Sweden: The Association for Computational Linguistics, 2014, pp. 53-56. ISBN 978-1-937284-75-6.
- JAKUBÍČEK, Miloš, Pavel RYCHLÝ and Adam KILGARRIFF. Effective Corpus Virtualization. Online. In Marc Kupietz, Hanno Biber, Harald Lüngen, Piotr Bański, Evelyn Breiteneder, Karlheinz Mörth, Andreas Witt, Jani Takhsha. Challenges in the Management of Large Corpora (CMLC-2). Reykjavik: European Language Resources Association (ELRA), 2014, pp. 7-9. ISBN 978-2-9517408-8-4.
- KILGARRIFF, Adam, Vít BAISA, Jan BUŠTA, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ, Jan MICHELFEIT, Pavel RYCHLÝ and Vít SUCHOMEL. The Sketch Engine: ten years on. Lexicography. Springer Berlin Heidelberg, 2014, vol. 1, no. 1, pp. 7-36. ISSN 2197-4292. Available from: https://dx.doi.org/10.1007/s40607-014-0009-9.
- JAKUBÍČEK, Miloš and Vojtěch KOVÁŘ. Enhancing Czech Parsing with Verb Valency Frames. In CICLing 2013. Greece: Springer Verlag, 2013, pp. 282-293. ISBN 978-3-642-37246-9. Available from: https://dx.doi.org/10.1007/978-3-642-37247-6_23.
- JAKUBÍČEK, Miloš, Pavel ŠMERK and Pavel RYCHLÝ. Fast Construction of a Word-Number Index for Large Data. In A. Horák, P. Rychlý. RASLAN 2013 Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2013, pp. 63-67. ISBN 978-80-263-0520-0.
- JAKUBÍČEK, Miloš and Marek MEDVEĎ. Portable Lexical Analysis for Parsing of Morphologically-Rich Languages. In A. Horák, P. Rychlý. RASLAN 2013 Recent Advances in Slavonic Natural Language Processing. Brno: Tribun EU, 2013, pp. 21-26. ISBN 978-80-263-0520-0.
- MEDVEĎ, Marek, Miloš JAKUBÍČEK and Vojtěch KOVÁŘ. Towards taggers and parsers for Slovak. In Zygmunt Vetulani & Hans Uszkoreit. Human Language Technologies as a Challenge for Computer Science and Linguistics. Proceedings of the 6th Language and Technology Conference. Poznań, Poland: Fundacja Uniwersytetu im. A. Mickiewicza, 2013, pp. 527-530. ISBN 978-83-932640-3-2.
- POMIKÁLEK, Jan, Pavel RYCHLÝ and Miloš JAKUBÍČEK. Building a 70 billion word corpus of English from ClueWeb. In Nicoletta Calzolari (Conference Chair) and Khalid Choukri and Thierry Declerck and Mehmet Ugur Dogan and Bente Maegaard and Joseph Mariani and Jan Odijk and Stelios Piperidis. Proceedings of the Eight International Conference on Language Resources and Evaluation (LREC'12). Istanbul, Turkey: European Language Resources Association (ELRA), 2012, pp. 502-506. ISBN 978-2-9517408-7-7.
- KILGARRIFF, Adam, Jan POMIKÁLEK, Miloš JAKUBÍČEK and Pete WHITELOCK. Setting up for corpus lexicography. In Ruth Vatvedt Fjeld and Julie Matilde Torjusen. Proceedings of the 15th EURALEX International Congress. Oslo, Norway: Department of Linguistics and Scandinavian Studies, University of Oslo, 2012, pp. 606-612. ISBN 978-82-303-2228-4.
- JAKUBÍČEK, Miloš. Rule-Based Parsing of Morphologically Rich Languages. Masaryk University, 2012, 41 pp. Dissertation thesis proposal.
- MEDVEĎ, Marek, Miloš JAKUBÍČEK, Vojtěch KOVÁŘ and Václav NĚMČÍK. Adaptation of Czech Parsers for Slovak. In Aleš Horák, Pavel Rychlý. RASLAN 2012 Recent Advances in Slavonic Natural Language Processing. Brno, Czech Republic: Tribun EU, 2012, pp. 23-30. ISBN 978-80-263-0313-8.
- HORÁK, Aleš, Miloš JAKUBÍČEK and Vojtěch KOVÁŘ. Linguistic Logical Analysis of Direct Speech. In Aleš Horák, Pavel Rychlý. RASLAN 2012 Recent Advances in Slavonic Natural Language Processing. Brno, Czech Republic: Tribun EU, 2012, pp. 51-59. ISBN 978-80-263-0313-8.
- JAKUBÍČEK, Miloš. Effective Parsing Using Competing CFG Rules. In Habernal, Matoušek. Proceedings of Text, Speech and Dialogue 2011. Berlin, Heidelberg: Springer Verlag, 2011, pp. 115-122. ISBN 978-3-642-23537-5.
- JAKUBÍČEK, Miloš and Aleš HORÁK. Punctuation Detection with Full Syntactic Parsing. Research in Computing Science, Special issue: Natural Language Processing and its Applications. Mexico: Instituto Politécnico Nacional, 2010, vol. 46, March 2010, pp. 335-343. ISSN 1870-4069.
- JAKUBÍČEK, Miloš, Pavel RYCHLÝ, Adam KILGARRIFF and Diana MCCARTHY. Fast syntactic searching in very large corpora for many languages. In PACLIC 24 Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation. Tokyo: Waseda University, 2010, pp. 741-747. ISBN 978-4-905166-00-9.
- JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Marek GRÁC. Through Low-Cost Annotation to Reliable Parsing Evaluation. In PACLIC 24 Proceedings of the 24th Pacific Asia Conference on Language, Information and Computation. Tokyo: Waseda University, 2010, pp. 555-562. ISBN 978-4-905166-00-9.
- BUŠTA, Jan and Miloš JAKUBÍČEK. Building of Corpus Based E-learning Materials for Czech. In SCO 2009 : Sharable Content Objects : 6. ročník konference o elektronické podpoře výuky. 1st ed. Brno: Masarykova univerzita, 2009, pp. 144-149. ISBN 978-80-210-4878-2.
- KOVÁŘ, Vojtěch, Miloš JAKUBÍČEK and Jan BUŠTA. Czech Vulgarisms in Text Corpora. In After Half a Century of Slavonic Natural Language Processing. 1st ed. Brno: Tribun EU s.r.o., 2009, pp. 141-145. ISBN 978-80-7399-815-8.
- JAKUBÍČEK, Miloš, Aleš HORÁK and Vojtěch KOVÁŘ. Mining Phrases from Syntactic Analysis. In Text, Speech, Dialogue 2009. 1st ed. Berlin Heidelberg: Springer Verlag, 2009, pp. 124-130. ISBN 978-3-642-04207-2. Available from: https://dx.doi.org/10.1007/978-3-642-04208-9_20.
- KOVÁŘ, Vojtěch, Aleš HORÁK and Miloš JAKUBÍČEK. Syntactic Analysis as Pattern Matching: The SET Parsing System. In Proceedings of 4th Language & Technology Conference. Poznań (Poland): Wydawnictwo Poznańskie, 2009, pp. 100-104. ISBN 978-83-7177-746-2.
- JAKUBÍČEK, Miloš, Vojtěch KOVÁŘ and Aleš HORÁK. Measuring Coverage of a Valency Lexicon using Full Syntactic Analysis. In RASLAN 2009 : Recent Advances in Slavonic Natural Language Processing. 1st ed. Brno: Masaryk University, 2009, pp. 75-79. ISBN 978-80-210-5048-8.
- KOVÁŘ, Vojtěch and Miloš JAKUBÍČEK. Prague Dependency Treebank Annotation Errors: A Preliminary Analysis. In RASLAN 2009 : Recent Advances in Slavonic Natural Language Processing. 1st ed. Brno: Masaryk University, 2009, pp. 101-108. ISBN 978-80-210-5048-8.
- JAKUBÍČEK, Miloš, Jan BUŠTA, Dana HLAVÁČKOVÁ and Karel PALA. Classification of Errors in Text. In RASLAN 2009 : Recent Advances in Slavonic Natural Language Processing. 1st ed. Brno: Masaryk University, 2009, pp. 109-119. ISBN 978-80-210-5048-8.
- KOVÁŘ, Vojtěch and Miloš JAKUBÍČEK. Test Suite for the Czech Parser Synt. In Proceedings of Recent Advances in Slavonic Natural Language Processing 2008. Brno: Masaryk University, 2008, pp. 63-70. ISBN 978-80-210-4741-9.
- KOVÁŘ, Vojtěch, Aleš HORÁK and Miloš JAKUBÍČEK. Power Networks Dialogs - Enhancing Domain-Specific Text Processing Techniques and Resources. In Proceedings of ELNET 2008. Ostrava: Faculty of Electrical Engineering and Computer Science, VŠB - Technical University of Ostrava, 2008, pp. 72-80. ISBN 978-80-248-1875-7.
- JAKUBÍČEK, Miloš. Extraction of Syntactic Structures Based on the Czech Parser Synt. In Proceedings of Recent Advances in Slavonic Natural Language Processing 2008. Brno: Masaryk University, 2008, pp. 56-62. ISBN 978-80-210-4741-9.
- JAKUBÍČEK, Miloš. Extrakce strukturních informací z běžného textu na základě syntaktického analyzátoru [Extraction of structural information from plain text based on a syntactic parser]. Masarykova Univerzita, 2008.
2023/10/22