Digital Hands ON - ANTONIA KARAISL & NICK WHITE (Rescribe): "Turn up the pipeline: Making high quality OCR accessible"
When: Th 10-03-2022 16:00 - 17:00
Where: Online

Digital Hands ON: the new webinar series of the Groningen Centre for Digital Humanities highlighting new tools and methods.
Historical texts are crucial across Humanities fields, and a variety of digital methods both enhance and expand our possibilities for studying, exploring, editing, and manipulating them. This series brings together scholars and researchers from different domains to present innovative approaches, tools, methods, and research programmes that can help put digital hands on historical records.
Our first speakers are Antonia Karaisl and Nick White of Rescribe.
Abstract
Optical Character Recognition (OCR), the technology that renders scanned pages of text editable and machine-readable, is fairly ubiquitous in our day and age. Since the rise of neural networks in this field, even traditionally complex materials such as historic printed books or medieval manuscripts can be handled successfully. The lack of user-friendly solutions, however, continues to bar access to these sophisticated technologies for scholars with little technological know-how. This talk presents a simple app built to harness the powerful open-source engine Tesseract in a maximally user-friendly way, packaging preprocessing, OCR and postprocessing in a single pipeline.