Learning from Data

Faculty Arts
Year 2019/20
Course code LIX016M05
Course name Learning from Data
Level(s) master
Language of instruction English
Period semester I a

Full course name Learning from Data
Learning objectives The course has a strong practical focus: students are expected to be able to run
machine learning experiments on a given (NLP) problem. They will master key concepts
and terminology of machine learning, understand training and testing procedures, and use existing
tools that support machine learning experiments. More specifically, they will become
accustomed to using existing libraries and software and to preparing data for them. In setting up an
experiment for a given task, they will be able to decide how to represent a problem, choose and
implement features for learning and an appropriate algorithm, and interpret the results critically
by understanding evaluation metrics as well as possible sources of error (overfitting, too little data,
etc.). They will also know how to report appropriately on the experiments they run, as is done
in academic publications.
Description This is a course on how to learn models from (large amounts of) data, with specific attention
to language data and Natural Language Processing (NLP) applications. The course balances
theory and practice by covering conceptual as well as implementation aspects. It is not a
theoretical course on the mathematical aspects of learning, but rather one aimed at equipping
students with the practical ability to run machine learning experiments, building on a solid theoretical
background. Theory is covered during the lectures, which introduce the main issues and
topics related to machine learning for NLP, such as the general setting of a learning experiment,
the main algorithms used in classification, both supervised and unsupervised (Naive Bayes, Decision
Trees, SVM, KNN, linear regression, perceptron, clustering), and the concepts of features
and feature selection. Evaluation issues are also introduced, including metrics and error interpretation,
so that students understand what can go wrong in theory and in practice (overfitting, amount of
training data). Semi-supervised learning techniques such as distant learning, active learning, and
co-training are also discussed. Two full weeks are devoted to introducing Neural Networks
and working with them. Implementation is covered by the weekly assignments, which always relate
to the topics covered in class and are discussed and worked on during the Labs. Students
will learn to use Python-native ML libraries such as NLTK and scikit-learn.
For the Neural Network-related portions of the course, we will use the Keras and Gensim libraries.
A larger final project ensures that both theory and practice are employed to create a
working system for a real (NLP) problem.
Hours per week 4
Teaching method lecture, seminar
Assessment computer assignments, report, weekly assignments
Course type master
Coordinator prof. dr. M. Nissim
Lecturer(s) prof. dr. M. Nissim
Remarks Enrollment in Progress will be possible from 17 June until 30 August. Seminar-group enrollment in Nestor is possible from 14 August, 07:00, until 28 August, 23:59. The Faculty reserves the right to change the curriculum, the number of groups and the timetable.
Included in
Programme Year Period Type
Course units for exchange students 4 semester I a mast
MSc Computing Science: Data Science and Systems Complexity  (Guided choice course units) - semester I a elective
Ma Communicatie- en Informatiewetenschappen  (Information Science) 1 semester I a elective
ReMa Taalwetenschappen / Linguistics  (ReMa Language and Communication Technologies (LCT); Erasmus Mundus) 1 semester I a elective