Lambert Schomaker - The challenge of continuous machine learning in big data

When: Tuesday 12 March 2013, 15:00-16:00

Current research in machine learning and pattern recognition focuses on academic benchmark data sets. Fortunately, such data sets are bigger today than ever before. Large data sets allow for a clean k-fold evaluation, so that training and test data fulfill the i.i.d. requirement: examples are sampled independently from identical probability distributions. On some data sets, such as handwritten digits, classification rates well above 99% are reached. However, many real-world problems are characterized by an a priori lack of knowledge. In our research we focus on continuous learning, from scratch. The experimental paradigm is organized around the development of a search engine for collections of handwritten historical manuscripts. This research has yielded a number of interesting insights concerning the concept of optimality in classification. Furthermore, the Monk system is a working example of '24/7 machine learning' on massive collections with hundreds of millions of image instances and tens of thousands of classes, running at the high-performance computing center of the University of Groningen. The Monk architecture is general: besides image classification, future projects will focus on data mining in genomics and in Raman spectroscopy.


