Scroll to the bottom of the page to read about the latest project developments and current Target activities related to the Monk system.
Monk is a software system for character recognition. It was developed within a broader research initiative called Continuous Access to Cultural Heritage (CATCH), sponsored by the Netherlands Organization for Scientific Research (NWO), which focuses on developing methods and techniques that use ICT to facilitate access to, and management of, large historical collections. The system, designed by Prof. Schomaker's research group at the Department of Artificial Intelligence (also a partner in the Target consortium), explores how fast-growing computing, processing and storage technologies can improve access to, and search through, large digitized archives. Currently, the Monk system is used by Scratch4All, one of Target's pilot projects managed by Target Holding, to search and store documents from the National Archive of the Netherlands (Nationaal Archief).
Monk specifically targets historical records, such as handwritten documents, for which traditional OCR (Optical Character Recognition) techniques are not applicable. The system relies on efficient word-retrieval and recognition algorithms that can be trained in real time using input from volunteers, who label individual handwritten words. The Scratch4All project has used the Monk system to ingest fifteen books from the National Archive, and this number is expected to grow to forty soon. The long-term ambition is to use Monk technology in massive world heritage archives.
When developing Monk, Prof. Schomaker and his research group envisioned a system that integrates real-time machine learning, interactive web access and uninterrupted expansion with the help of high-performance computing and massive data storage. As such, Monk fit well with the goal of the Target project: to design and build the necessary intelligent technology for large-scale, data- and computation-intensive information systems. At present, the Monk system is running on the Target testbed alongside other Target projects. The environment of cooperation and knowledge sharing fostered by Target provides a platform for steady improvements in the performance of Monk, particularly in the area of scalability. In addition, the development of new web interfaces has made Monk a very user-friendly system, easily accessible to the general public. As a result, the Monk search engine has attracted considerable interest from archival institutions, nationally and internationally, and its significance as a novel system for accessing and searching large handwritten archives is expected to grow.
Latest Project Developments
Currently, 37 collections have been ingested into the Monk system, and its holdings are steadily growing in both size and diversity. Apart from handwritten archives from the Dutch National Archive, the Groningen Archives and the City Archive of Leuven, Belgium, Monk now handles historical records from various international institutions, including, for example, a section of the famous Dead Sea Scrolls. Following the major redesign of the Target testbed, Monk has shown better overall performance and improved reliability of its services.
The Monk team is currently focusing on automating most of the Monk procedures during the training cycles of its algorithms, minimizing the need for human intervention. Concurrently, the team is using the varied image quality of the newly ingested collections to improve Monk’s image processing algorithms, paying particular attention to color handling. For the duration of Phase 3 of the Project Timeline, Monk will continue to ingest handwritten historical collections and refine the quality of its services.
During the first half of 2013, Target Holding worked in close collaboration with the Monk team to deliver some of Monk’s functionality to a larger user community via the project “Scratch4All as a Service”, an extension of the CATCHPlus project “Scratch4All”.
Last modified: December 04, 2014 11:13