University of Groningen (founded in 1614, top 100 university)

Lexical acquisition for computational grammars. A unified model

05 November 2012

PhD ceremony: Mr. K.D. Cholakov, 14.30 hrs, Academiegebouw, Broerstraat 5, Groningen

Dissertation: Lexical acquisition for computational grammars. A unified model

Promotor(s): prof. G.J.M. van Noord, prof. J. Nerbonne

Faculty: Arts

Words are the building blocks of many natural language processing systems. The lexical information in such systems is usually encoded in lexicons, which map words to linguistic descriptions. However, lexicons will always be incomplete: natural language is constantly evolving, new words emerge every day, and it is impossible to list every word of a language in a lexicon. Kostadin Cholakov’s thesis describes a novel automated lexical acquisition model.

Cholakov’s model learns the morphosyntactic properties of words that are not listed in the lexicons used by computational grammars of natural language. Two major aspects set the model apart from existing lexical acquisition techniques. First, it acquires the full morphological paradigm of the unknown word. Second, it takes different contexts of the word into account during the acquisition process, which increases the amount and diversity of the linguistic information available for that word.

For each unknown word, a set of linguistic features is constructed automatically. These features serve as input to a statistical classifier, which maps every form in the paradigm of the unknown word to a description in the lexicon of the grammar. The model is tested with computational grammars of Dutch and German, and the results show that it performs at a high level. The model is also applied to learn correct linguistic descriptions for words whose entries in the lexicon of the grammar are wrong or incomplete.

Finally, the work in this thesis goes beyond syntax. The lexical acquisition model is combined with vector-based semantic space techniques to acquire semantic properties of unknown words.

Last modified: 13 March 2020 12.59 a.m.
