Publication details
Indexing and Searching Mathematics in Digital Libraries -- Architecture, Design and Scalability Issues
Authors | |
---|---|
Year of publication | 2011 |
Type | Article in Proceedings |
Conference | Intelligent Computer Mathematics, Lecture Notes in Computer Science, 2011, Volume 6824/2011 |
MU Faculty or unit | |
Citation | |
Web | DOI |
Doi | http://dx.doi.org/10.1007/978-3-642-22673-1_16 |
Field | Informatics |
Keywords | math indexing and retrieval; mathematical digital libraries; information systems; information retrieval; mathematical content search; document ranking of mathematical papers; math text mining; MIaS; WebMIaS |
Description | This paper surveys approaches and systems for searching mathematical formulae in mathematical corpora and on the web. It presents the design and architecture of our MIaS (Math Indexer and Searcher) system and discusses our design decisions in detail. We propose an approach based on Presentation MathML that uses the similarity of math subformulae, and verify it by implementing a math-aware search engine on top of the state-of-the-art Apache Lucene system. Scalability was tested on 324,000 real scientific documents from the arXiv archive containing 112 million mathematical formulae. More than two billion MathML subformulae were indexed using our Solr-compatible Lucene extension. |
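
The description mentions indexing subformulae of Presentation MathML. As a rough illustration only, and not the MIaS implementation, the following Python sketch enumerates every subtree of a small Presentation MathML expression together with its nesting depth. The function name `subformulae`, the sample expression, and the idea of recording depth alongside each subformula are assumptions made for this example; the paper itself should be consulted for how MIaS actually tokenizes and weights subformulae.

```python
import xml.etree.ElementTree as ET


def subformulae(node, depth=0):
    """Yield (depth, serialized subtree) for every element of a MathML tree.

    Hypothetical helper for illustration; not taken from MIaS.
    """
    yield depth, ET.tostring(node, encoding="unicode").strip()
    for child in node:
        yield from subformulae(child, depth + 1)


# Sample Presentation MathML for a^2 + b (assumed input, no namespace).
mathml = (
    "<math><mrow>"
    "<msup><mi>a</mi><mn>2</mn></msup><mo>+</mo><mi>b</mi>"
    "</mrow></math>"
)

root = ET.fromstring(mathml)
terms = list(subformulae(root))

for depth, term in terms:
    # Each serialized subtree could serve as one indexable token.
    print(depth, term)
```

Each serialized subtree could then be handed to a full-text index as a separate token, so that a query for `a^2` can match a document that only contains the larger formula.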