Project information
Pattern Recognition-based Statistically Enhanced MT
(PRESEMT)
- Project Identification
- 248307
- Project Period
- 1/2010 - 12/2012
- Investor / Programme / Project type
-
European Union
- 7th Specific RTD Programme
- Cooperation
- MU Faculty or unit
- Faculty of Informatics
- Cooperating Organization
-
Institute for Language and Speech Processing
- Responsible person George Tambouratzis
Norwegian University of Science and Technology
National Technical University of Athens
Lexical Computing Ltd.
This proposal describes PRESEMT, a flexible and adaptable MT system based on a language-independent method whose principles ensure easy portability to new language pairs. The method attempts to overcome well-known problems of other MT approaches, e.g. the compilation of bilingual corpora or the creation of new rules for each language pair. PRESEMT will address the issue of effectively managing multilingual content and is expected to suggest a language-independent, machine-learning-based methodology. The key aspects of PRESEMT are syntactic phrase-based modelling, pattern recognition approaches (such as extended clustering or neural networks) or game theory techniques for developing a language-independent analysis, and evolutionary algorithms for system optimisation. The system is intended to be of a hybrid nature, combining linguistic processing with the positive aspects of corpus-based approaches such as statistical MT (SMT) and example-based MT (EBMT).
Publications
Total number of publications: 14
2012
-
Building a 70 billion word corpus of English from ClueWeb
Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), year: 2012
-
Detecting Spam in Web Corpora
6th Workshop on Recent Advances in Slavonic Natural Language Processing, year: 2012
-
Finding Multiwords of More Than Two Words
Proceedings of the 15th EURALEX International Congress, year: 2012
-
Linguistic Logical Analysis of Direct Speech
RASLAN 2012 Recent Advances in Slavonic Natural Language Processing, year: 2012
2011
-
Analyzing Time-Related Clauses in Transparent Intensional Logic
Proceedings of Recent Advances in Slavonic Natural Language Processing 2011, year: 2011
-
Corpus-based Disambiguation for Machine Translation
Recent Advances in Slavonic Natural Language Processing, year: 2011
-
Effective Parsing Using Competing CFG Rules
Proceedings of Text, Speech and Dialogue 2011, year: 2011
-
chared: Character Encoding Detection with a Known Language
RASLAN 2011, year: 2011
-
Japanese Word Sketches: Advances and Problems
Acta Linguistica Asiatica, year: 2011, volume: 1/2011, edition: 2
-
Practical Web Crawling for Text Corpora
Proceedings of Recent Advances in Slavonic Natural Language Processing, RASLAN 2011, year: 2011