Project information
semANT - Semantic Document Exploration

Project Identification
DH23P03OVV060
Project Period
3/2023 - 12/2027
Investor / Programme / Project type
Ministry of Culture of the CR
MU Faculty or unit
Faculty of Social Studies
Keywords
digital library; topic identification; semantic document search; content exploration; content visualization
Cooperating Organizations
The Moravian Library Brno
Brno University of Technology

Czech libraries and archives hold a huge number of digitized documents. The options for presenting and searching them online have improved significantly in recent years. A large share of modern printed documents has already been processed by OCR and is therefore fully searchable. Tools for the automatic transcription of old prints and handwritten documents also exist, so their complete transcription is now only a matter of time.
However, the full-text search used in library systems is very basic. It can handle different forms of a word, but not its meaning, so finding documents on a particular topic is very laborious. In contrast, current web search engines work with the meanings of words, which makes it possible to find texts that are relevant to the searched topic even when they do not contain the exact search term.
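The contrast between matching word forms and matching meaning can be illustrated with a minimal sketch. It is not the project's implementation: the three-dimensional vectors in `TOY_VECTORS` are hand-made for illustration only, the function names and the 0.9 threshold are assumptions, and a real system would obtain its vectors from a trained language model.

```python
import math

# Hypothetical toy vectors, hand-made for this illustration only.
# A real system would obtain word or text vectors from a trained model.
TOY_VECTORS = {
    "physician": (0.9, 0.1, 0.0),
    "doctor": (0.85, 0.15, 0.05),
    "harvest": (0.0, 0.2, 0.95),
}


def cosine(u, v):
    """Cosine similarity of two equally long, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def exact_match(query, text):
    """Plain full-text matching: the query word must appear literally."""
    return query in text.lower().split()


def semantic_match(query, text, threshold=0.9):
    """Meaning-based matching: some word in the text must be close to the query."""
    query_vector = TOY_VECTORS[query]
    return any(
        cosine(query_vector, TOY_VECTORS[word]) >= threshold
        for word in text.lower().split()
        if word in TOY_VECTORS
    )


text = "the doctor arrived"
print(exact_match("physician", text))     # the literal word is absent
print(semantic_match("physician", text))  # a word with a close vector is present
```

With these toy vectors, a query for "physician" finds the sentence about a doctor only through the meaning-based comparison, which is the behaviour the paragraph above attributes to web search engines.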
The main goal of this project is therefore to improve the searchability of the full-text representation of digitized documents at the level of text meaning and to improve the possibilities of natural navigation between related documents. We will provide users with semantically enhanced full-text search and with the ability to search by text segments (e.g., paragraphs) while specifying the topic of interest at the same time. The system will work with automatically identified topics but will also allow users to define their own topics based on examples.
The identification of topics will also be used to visualize the frequency of their occurrences and mutual interactions. Thus, it will be possible to track the evolution of topics over time, their continuity and transformation, or their connection to known named entities such as places and persons.
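As a rough sketch of the kind of data that could sit behind such a visualization, the following counts how often each topic occurs per year and how often two topics appear in the same document. The input format (a year paired with a list of topic labels), the function names, and the sample topics are all assumptions made for illustration; the project description does not specify them.

```python
from collections import Counter, defaultdict
from itertools import combinations


def topic_counts_by_year(documents):
    """Count, for each year, the number of documents that mention each topic."""
    counts = defaultdict(Counter)
    for year, topics in documents:
        # set() so that a topic is counted once per document
        counts[year].update(set(topics))
    return {year: dict(counter) for year, counter in sorted(counts.items())}


def topic_co_occurrence(documents):
    """Count how many documents mention each unordered pair of topics."""
    pairs = Counter()
    for _year, topics in documents:
        pairs.update(combinations(sorted(set(topics)), 2))
    return pairs


# Hypothetical sample data: (publication year, identified topics)
sample = [
    (1920, ["railway", "trade"]),
    (1920, ["railway"]),
    (1921, ["trade"]),
]
print(topic_counts_by_year(sample))
print(topic_co_occurrence(sample))
```

Per-year counts of this kind are what a timeline of topic frequency would plot, and the pair counts are one simple way to represent how topics interact.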
The results of the project will be used both by the general public for routine work with library systems and by the scientific community for enhanced text analysis. We also hope that parts of the project will find application in software for the analysis of contemporary media and social networks.

Sustainable Development Goals

Masaryk University is committed to the UN Sustainable Development Goals, which aim to improve the conditions and quality of life on our planet by 2030.

Sustainable Development Goal No. 4 – Quality education
