Mining for meaning. The extraction of lexico-semantic knowledge from text

24 June 2010

PhD ceremony: Mr. T. van de Cruys, 2.45 p.m., Academiegebouw, Broerstraat 5, Groningen

Thesis: Mining for meaning. The extraction of lexico-semantic knowledge from text

Promotor(s): prof. J. Nerbonne

Faculty: Arts

Words have a particular meaning. Language users have no trouble inferring those meanings, but the task is hard for a computer system. In his dissertation, Tim van de Cruys investigates how a computer might infer the meaning of words automatically from large text collections. The basic approach is to compare the contexts in which words occur (such as the surrounding words, or the syntactic relations in which a word takes part) and to determine how similar those contexts are. Words that occur in similar contexts are taken to be similar in meaning, so this information enables a computer to automatically extract groups of similar words from text.
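The context-comparison idea can be illustrated with a minimal sketch. It is not the method of the dissertation: the toy corpus, the two-word context window, the raw co-occurrence counts, and the cosine measure are all assumptions chosen for illustration, and the function names `context_vectors` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """For every word, count the words occurring within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Toy corpus (an assumption for illustration, not data from the thesis).
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

vectors = context_vectors(corpus)
sim_cat_dog = cosine(vectors["cat"], vectors["dog"])    # many shared contexts
sim_cat_fuel = cosine(vectors["cat"], vectors["fuel"])  # no shared contexts
print(f"cat/dog: {sim_cat_dog:.3f}  cat/fuel: {sim_cat_fuel:.3f}")
```

In this toy setting "cat" and "dog" share most of their contexts and score high, while "cat" and "fuel" share none. A realistic system would use far larger corpora and would typically weight the counts rather than use them raw.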

An important part of the research focuses on dimensionality reduction and its application to language. Large text collections yield a very large number of contexts in which a word occurs. Using a mathematical dimensionality reduction technique, this abundance of individual contexts can be reduced to a limited number of significant dimensions. Characteristic of these dimensions is that they contain 'latent semantics': the value of a word on a particular dimension indicates the score of the word for a particular semantic field (such as economics, transport, food, ...). The research shows that, with a number of simple algorithms, the meaning of words can be extracted automatically from text, which is an important step towards a system that is able to understand what is written in texts.
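As a rough illustration of dimensionality reduction on word-context data, the sketch below applies a truncated singular value decomposition with NumPy. The announcement does not name the technique used in the dissertation, so the choice of SVD is an assumption, as are the invented count matrix, the word and context labels, and the choice of two dimensions.

```python
import numpy as np

# Invented word-by-context counts (an assumption for illustration only).
words = ["bank", "loan", "money", "train", "bus", "station"]
contexts = ["interest", "credit", "pay", "ticket", "platform", "driver"]
counts = np.array([
    [4, 3, 2, 0, 0, 0],
    [3, 4, 2, 0, 0, 0],
    [2, 2, 4, 1, 0, 0],
    [0, 0, 0, 4, 3, 2],
    [0, 0, 1, 3, 1, 4],
    [0, 0, 0, 3, 4, 1],
], dtype=float)

# Truncated SVD: keep only the k strongest dimensions.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
reduced = U[:, :k] * S[:k]  # each row is a word in the k-dimensional space


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


idx = {w: i for i, w in enumerate(words)}
sim_bank_loan = cosine(reduced[idx["bank"]], reduced[idx["loan"]])
sim_bank_train = cosine(reduced[idx["bank"]], reduced[idx["train"]])
print(f"bank/loan: {sim_bank_loan:.3f}  bank/train: {sim_bank_train:.3f}")
```

Six context columns are compressed into two dimensions, which in this toy example separate the finance-related words from the transport-related ones. The signs and exact values of the reduced coordinates depend on the decomposition, so only the relative similarities are meaningful.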

Last modified: 13 March 2020 01.15 a.m.
