RNDr. Jan Pomikálek, Ph.D.


Person identification
  • RNDr. Jan Pomikálek, Ph.D., born 9th October 1979 in Prague, married
Workplace
  • Faculty of Informatics, Masaryk University, Botanická 68a, 602 00 Brno, Czech Republic
Employment position
  • Research assistant
Education and academic qualifications
  • 2011: Ph.D. in Informatics, Faculty of Informatics, Masaryk University Brno; thesis: Removing Duplicate and Boilerplate Content from Web Corpora
  • 2008: RNDr. in Informatics, Faculty of Informatics, Masaryk University Brno; thesis: Building parallel corpora from the Web
  • 2004: Mgr. (M.Sc. equivalent) in Informatics, Faculty of Informatics, Masaryk University Brno; thesis: A system for solving crosswords (in Czech)
Professional experience
  • 2008-: Technical manager, Lexical Computing Ltd
  • 2004-: Research assistant, Masaryk University
  • 1998-2004: Freelance web developer and programmer
Teaching activities
  • 2007-2008: Programming in Java
Research interests
  • Natural language processing, text corpora, Web as corpus, cleaning boilerplate from Web pages, removing duplicates and near-duplicates in large text collections.
Major publications
  • POMIKÁLEK, Jan, Pavel RYCHLÝ and Adam KILGARRIFF. Scaling to Billion-plus Word Corpora. Advances in Computational Linguistics, Mexico: Instituto Politécnico Nacional, 41, winter 2009, pp. 3-13, 14 pp. ISSN 1870-4069. 2009.
  • KILGARRIFF, Adam, Siva REDDY and Jan POMIKÁLEK. Corpus Factory. Bangkok, Thailand, 2009.
  • POMIKÁLEK, Jan and Pavel RYCHLÝ. Detecting Co-Derivative Documents in Large Text Collections. In Proceedings of the Sixth International Language Resources and Evaluation (LREC'08). Marrakech, Morocco: European Language Resources Association (ELRA), 2008. pp. 132-135, 3 pp. ISBN 2-9517408-4-0.
  • POMIKÁLEK, Jan and Radim ŘEHŮŘEK. The Influence of Preprocessing Parameters on Text Categorization. International Journal of Applied Science, Engineering and Technology, 4/2007, 1, pp. 430-434, 5 pp. ISSN 1307-4318. 2007.
  • NOVÁČEK, Vít, Pavel SMRŽ and Jan POMIKÁLEK. Text Mining for Semantic Relations as a Support Base of a Scientific Portal Generator. In Proceedings of LREC 2006 - 5th International Conference on Language Resources and Evaluation. Paris: ELRA, 2006. pp. 1338-1343, 6 pp. ISBN 2-9517408-2-4.
  • BARONI, Marco, Adam KILGARRIFF, Jan POMIKÁLEK and Pavel RYCHLÝ. WebBootCaT: instant domain-specific corpora to support human translators. In Proceedings of EAMT 2006 - 11th Annual Conference of the European Association for Machine Translation. Oslo: The Norwegian National LOGON Consortium and The Departments of Computer Science and Linguistics and Nordic Studies at Oslo University (Norway), 2006. pp. 247-252, 252 pp. ISBN 82-7368-294-3.
  • CINKOVÁ, Silvie and Jan POMIKÁLEK. LEMPAS: A Make-Do Lemmatizer for the Swedish PAROLE-Corpus. Prague Bulletin of Mathematical Linguistics, Prague, 2006, 86, pp. 47-53, 68 pp. ISSN 0032-6585. 2006.

Last update: 2011/11/09