Publication details

Distributed System for Discovering Similar Documents: From a Relational Database to the Custom-Developed Parallel Solution

Authors

KASPRZAK Jan, BRANDEJS Michal, KŘIPAČ Miroslav, ŠMERK Pavel

Year of publication 2008
Type Article in Proceedings
Conference ICEIS 2008: Proceedings of the Tenth International Conference on Enterprise Information Systems, Vol. DISI - Databases and Information Systems Integration
MU Faculty or unit

Faculty of Informatics

Field Informatics
Keywords University; Plagiarism; Similar Documents; Cluster; Information System; Theses
Description One drawback of e-learning methods such as Web-based submission and evaluation of students' papers and essays is that they make it easier for students to plagiarize the work of others. In this paper we present a computer-based system for discovering similar documents, which has been in use at Masaryk University in Brno since August 2006 and which will also be used in the forthcoming Czech national archive of graduate theses. We focus on practical aspects of the system: achieving near real-time response to newly imported documents, and the computational feasibility of handling large sets of documents on commodity hardware. We also discuss the possibilities and problems of parallelizing the system to run on a distributed cluster of computers.