Publication details

Archiv českého webu v roce 3

Title in English Archive Of The Czech Web In Year 3
Authors

ŽABIČKA Petr

Year of publication 2002
Type Article in Periodical
Magazine / Source Národní knihovna - knihovnická revue
MU Faculty or unit

Faculty of Informatics

Field Documentation, library studies, information management
Keywords Web archiving; resource selection; long-term preservation
Description The Webarchiv project started in 2000 as an R&D project of the National Library. Its main goal was to investigate problems connected with the collection and long-term preservation of electronic information resources. By the end of 2001 the project team had established a testbed for harvesting electronic resources from the web, as well as an infrastructure supporting metadata creation, based on software tools developed with the support of the NEDLIB project. These tools (Dublin Core Metadata Generator and NEDLIB Harvester) proved their quality in this phase and were further developed to suit the project's needs. Although the legal situation concerning web harvesting and archiving by the National Library is not yet clear in the Czech Republic, the project team was able to initiate the first harvest of the .cz domain in April. The harvesting criteria were set very loosely to allow for the broadest coverage of the Czech web space. After three months, the Harvester had collected about 10 million documents (0.25 TB) from about 30,000 of the total of 117,000 second-level domains under the .cz top-level domain. In the near future the whole hardware and software infrastructure will be developed further to allow for faster and even more reliable harvesting. At the same time, the project tools will be enhanced and further integrated into the overall infrastructure of the National Library. Another task for the project team will be to find ways of making the archive accessible to the public. Hopefully, this will be done with the support of the EU's 6th Framework Programme under the European Web Archive project.