Mining for meaning. The extraction of lexico-semantic knowledge from text
PhD ceremony: Mr. T. van de Cruys, 2.45 p.m., Academiegebouw, Broerstraat 5, Groningen
Thesis: Mining for meaning. The extraction of lexico-semantic knowledge from text
Promotor(s): prof. J. Nerbonne
Faculty: Arts
Words have a particular meaning. While language users have no problem inferring those meanings, this is a hard task for a computer system. In his dissertation, Tim van de Cruys investigates how a computer might infer the meaning of words automatically from large text collections. The basic approach is to compare the contexts in which words occur (such as the surrounding words, or the syntactic relations in which a word takes part) and to determine how similar those contexts are. This information enables a computer to automatically extract groups of similar words from text.
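The idea of comparing contexts can be illustrated with a minimal sketch. It is not the method of the dissertation: the toy corpus, the window size of two words, and the use of cosine similarity are all assumptions chosen for illustration, and the function names `context_vectors` and `cosine` are hypothetical.

```python
from collections import Counter, defaultdict
import math


def context_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` positions of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors (Counters)."""
    dot = sum(count * v[ctx] for ctx, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Toy corpus, invented for this example only.
corpus = [
    "the cat drinks milk",
    "the dog drinks water",
    "the cat chases the mouse",
    "the dog chases the ball",
    "the car needs fuel",
]

vectors = context_vectors(corpus)
print("cat vs dog:", cosine(vectors["cat"], vectors["dog"]))
print("cat vs car:", cosine(vectors["cat"], vectors["car"]))
```

In this toy corpus "cat" and "dog" share most of their contexts, so they score higher than "cat" and "car". A realistic system would use a far larger corpus and would typically weight the raw counts.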
An important part of the research focuses on dimensionality reduction and its application to language. Large text collections yield a very large number of contexts in which a word occurs. Using a mathematical dimensionality reduction technique, this abundance of individual contexts can be reduced to a limited number of significant dimensions. Characteristic of these dimensions is that they contain 'latent semantics': the value of a word on a particular dimension indicates the score of the word for a particular semantic field (such as economics, transport, food, ...). The research shows that, with a number of simple algorithms, the meaning of words can be extracted automatically from text. This is an important step towards a system that is able to understand what is written in texts.
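A minimal sketch of such a reduction follows. The announcement does not name the technique used in the dissertation, so this example assumes a truncated singular value decomposition, one standard choice. The words, contexts, and counts are invented, and `reduce_dimensions` is a hypothetical helper.

```python
import numpy as np

words = ["bank", "loan", "money", "train", "bus", "station"]
contexts = ["interest", "credit", "pay", "ticket", "platform", "driver"]

# Hypothetical word-by-context co-occurrence counts (rows: words, columns: contexts).
counts = np.array([
    [4, 3, 2, 0, 0, 0],
    [3, 4, 2, 0, 0, 0],
    [2, 2, 4, 1, 0, 0],
    [0, 0, 1, 4, 3, 2],
    [0, 0, 0, 3, 2, 4],
    [0, 0, 0, 3, 4, 2],
], dtype=float)


def reduce_dimensions(matrix, k):
    """Project each row onto the k strongest dimensions found by SVD."""
    u, s, _ = np.linalg.svd(matrix, full_matrices=False)
    return u[:, :k] * s[:k]


def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Six context dimensions are reduced to two latent dimensions.
reduced = reduce_dimensions(counts, k=2)

for word, row in zip(words, reduced):
    print(f"{word:8s}", np.round(row, 2))
print("bank vs loan: ", cosine(reduced[0], reduced[1]))
print("bank vs train:", cosine(reduced[0], reduced[3]))
```

With these invented counts, the two retained dimensions roughly separate a finance-like group of words from a transport-like group, which is the kind of semantic field the text describes. The signs and ordering of the dimensions are arbitrary, so similarities between words are more informative than the raw coordinates.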
Last modified: 13 March 2020 01.15 a.m.