ImUnipen image data set for writer identification (N=208) - vectorial handwriting converted to usable images

Schomaker, L. (Creator), University of Groningen, 9-Sep-2009



The ImUnipen data set is intended for non-commercial, scientific use,
and is distributed under the auspices of the Unipen Foundation.

Please always refer to the following paper in IEEE PAMI when using
the ImUnipen data set:

Bulacu, M.; Schomaker, L.
Text-Independent Writer Identification and Verification
Using Textural and Allographic Features
Pattern Analysis and Machine Intelligence, IEEE Transactions on
Volume 29, Issue 4, April 2007 Page(s):701 - 717

The ImUnipen data set is derived from the Unipen
data set of on-line (i.e., vectorial, xy) handwriting.
The xy-coordinates and a line-generator algorithm are used
to generate a raster image, as if the data were optically scanned.
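
The conversion from pen coordinates to pixels can be illustrated with a
minimal sketch. The line generator actually used to build ImUnipen is not
specified here; Bresenham's algorithm is assumed as a stand-in, and the
function names, the integer pixel coordinates, and the one-pixel pen width
are illustrative assumptions only.

```python
def draw_line(image, x0, y0, x1, y1):
    """Mark the pixels on the segment (x0, y0)-(x1, y1) with Bresenham's algorithm."""
    dx, dy = abs(x1 - x0), -abs(y1 - y0)
    sx = 1 if x0 < x1 else -1
    sy = 1 if y0 < y1 else -1
    err = dx + dy
    while True:
        image[y0][x0] = 1  # 1 = ink, 0 = white page
        if x0 == x1 and y0 == y1:
            break
        e2 = 2 * err
        if e2 >= dy:  # step along x
            err += dy
            x0 += sx
        if e2 <= dx:  # step along y
            err += dx
            y0 += sy


def rasterize(strokes, width, height):
    """Render pen-down strokes (lists of integer (x, y) points) into a 0/1 grid."""
    image = [[0] * width for _ in range(height)]
    for stroke in strokes:
        if len(stroke) == 1:  # a single dot has no segment to draw
            x, y = stroke[0]
            image[y][x] = 1
        # Connect each pair of consecutive samples within one stroke.
        for (x0, y0), (x1, y1) in zip(stroke, stroke[1:]):
            draw_line(image, x0, y0, x1, y1)
    return image
```

Real on-line data would first need scaling from tablet units to pixel
units, and a wider pen trace, before the result resembles a scanned page.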

Contents: for 208 writers, there are two PNG images per writer, each showing
an artificially constructed table of naturally written words (49 MByte).
These words are pasted onto a white page. For reasons of systematic naming,
we call such a page a Paragraph; see below.

The file names are organized as (example):


meaning: writer number 990221, document 01 (only Doc01 exists),
and the image containing the artificial "paragraph" of isolated words,
either "Par00" or "Par01".

The Par00 and Par01 images are typically used as the query
and best match in a leave-one-out setting for writer identification.
For instance, Par00 is the query, and Par01 is added to the total set
of all other images as the attractor for an identification search.
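
The leave-one-out protocol described above can be sketched as follows.
This is only an illustration: the feature vectors are hypothetical
placeholders for whatever features are extracted from each image, and
Euclidean distance is an assumed stand-in for the distance measure of the
method under test.

```python
import math


def top1_accuracy(samples):
    """Leave-one-out Top-1 identification rate.

    samples: list of (writer_id, feature_vector) pairs, one per image.
    Each image in turn is the query; all other images form the search set.
    """
    hits = 0
    for i, (writer, query) in enumerate(samples):
        # Find the nearest image among all images except the query itself.
        best = min(
            (j for j in range(len(samples)) if j != i),
            key=lambda j: math.dist(query, samples[j][1]),
        )
        # A hit means the nearest image was written by the query's writer.
        hits += samples[best][0] == writer
    return hits / len(samples)
```

With two images per writer, a query counts as correctly identified only
when its nearest neighbour is the single other image by the same writer.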

Word labels are deliberately not given in this data set,
as the goal is to test recognition-free writer identification.

For a description of the regular
Unipen data set, please visit

Lambert Schomaker constructed this set in 2005.

Date made available: 9-Sep-2009
Publisher: University of Groningen
Date of data production: 2005
Access to the dataset: Open

Keywords: images of handwritten text, Unipen on-line handwriting database,
writer identification, rasterized, on-line (vectorial) handwriting data,
benchmarking

ID: 64099230