The anatomy of antonymy: a corpus-driven approach
PhD ceremony: Ms. G.V. Lobanova, 11.00 a.m., Academiegebouw, Broerstraat 5, Groningen
Dissertation: The anatomy of antonymy: a corpus-driven approach
Promotor(s): prof. L.C. Verbrugge
Faculty: Mathematics and Natural Sciences
This dissertation deals with opposites, that is, in Dutch, words like arm – rijk, dag – nacht, openen – sluiten, and other pairs that express some type of contrast. First, we explore pattern-based methods for finding opposites automatically. Second, we analyze automatically found opposites and compare them with opposites extensively studied and classified by theoretical linguists.
Our methodology is based on the assumption that opposites co-occur within a sentence significantly more often than would be expected by chance, and that they can often be found in intrasentential patterns like [tussen <ANT> en <ANT>] ('between <ANT> and <ANT>'). Using small sets of 6, 12 and 18 seed pairs consisting of adjectives, nouns or verbs, we identify the best patterns for finding new pairs of opposites in a 450-million-word newspaper corpus of Dutch. In the first study, we automatically generate strictly textual patterns like [either <ANT> countries or <ANT> countries] that do not contain any syntactic information, but simply capture surface strings. In the second study, we generate surface patterns that contain part-of-speech information about target word pairs, like [the difference between <ANT/Adj> and <ANT/Adj>]. In the third study, we use a parsed corpus to automatically acquire patterns with syntactic dependencies. Such patterns abstract away from the surface structure, capturing, for example, that <ANT1/Noun> is the subject, <ANT2/Noun> is the direct object, and the two are connected by the verb appreciate.
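The first study's idea of inducing textual patterns from seed pairs and then reusing them to find new candidates can be illustrated with a minimal sketch. This is not the algorithm used in the dissertation: the function names, the one-token left context, the `max_gap` limit and the toy sentences are all assumptions made for illustration, and no pattern ranking or significance testing is attempted.

```python
import re
from collections import Counter

ANT = "<ANT>"  # placeholder for a slot filled by one member of a pair


def induce_patterns(sentences, seed_pairs, max_gap=3):
    """Count surface patterns in which both words of a seed pair occur.

    Assumption: a pattern is one token of left context, the first seed
    word, up to `max_gap` intervening tokens, and the second seed word.
    """
    patterns = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for a, b in seed_pairs:
            for first, second in ((a, b), (b, a)):
                if first not in tokens or second not in tokens:
                    continue
                i, j = tokens.index(first), tokens.index(second)
                if 1 <= i < j and j - i - 1 <= max_gap:
                    pattern = tokens[i - 1 : j + 1]
                    pattern[1] = ANT   # slot for the first seed word
                    pattern[-1] = ANT  # slot for the second seed word
                    patterns[" ".join(pattern)] += 1
    return patterns


def apply_pattern(pattern, sentences):
    """Return a Counter of word pairs that fill the slots of `pattern`."""
    parts = [r"(\w+)" if tok == ANT else re.escape(tok) for tok in pattern.split()]
    regex = re.compile(r"\b" + r"\s+".join(parts) + r"\b")
    pairs = Counter()
    for sentence in sentences:
        for match in regex.finditer(sentence.lower()):
            pairs[match.groups()] += 1
    return pairs


if __name__ == "__main__":
    # Toy sentences invented for this sketch, not taken from the corpus.
    corpus = [
        "het verschil tussen arm en rijk groeit",
        "de kloof tussen arm en rijk",
        "een keuze tussen oud en nieuw",
        "het contrast tussen dag en nacht",
    ]
    seeds = [("arm", "rijk")]
    best_pattern, _ = induce_patterns(corpus, seeds).most_common(1)[0]
    candidates = apply_pattern(best_pattern, corpus)
    new_pairs = [pair for pair in candidates if pair not in seeds]
    print(best_pattern, new_pairs)
```

On a real corpus the induced patterns would need to be scored before reuse, for instance by how often they match seed pairs relative to arbitrary word pairs; that step is left out here.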
The best results were achieved with part-of-speech patterns, which identified many typical as well as novel opposites. Textual patterns returned the same high-frequency opposites for the seed sets of all three syntactic categories, and most of these pairs were well-established opposites. Dependency patterns found the fewest opposites per seed set, but they found many novel pairs.
Overall, the best results are achieved by the algorithm that relies on adding the minimum amount of syntactic information, namely only part-of-speech information. Since this method does not require any computationally costly preprocessing steps and can easily be applied to vast amounts of data, part-of-speech patterns offer a promising solution to automatic extraction of opposites.
The results show that the range of automatically found opposites surpasses the limited number of well-established opposites commonly discussed in theoretical approaches to opposites. In particular, pattern-based methods can find not only typical opposites like oud – nieuw and arm – rijk, but also less conventional opposites like nieuw – bestaand, nieuw – tweedehands, nieuw – bekend, and oud – recent, non-typical domain-specific opposites like wit – rood (wine) and Democraat – Republikein (political parties), and context-dependent pairs like migrant – Nederlander (Dutch newspaper texts) and buitenlands – Nederlands (as an analogue of buitenlands – binnenlands in the context of local and international policies). Although such pairs behave in the corpus much like the canonical opposites, non-typical context-dependent opposites have been neglected in theoretical classifications. Our results provide evidence that opposites include a much wider range of pairs than has been previously recognized.
In fact, automatically found opposites, especially domain-specific and context-dependent pairs that are often missing from existing lexical resources, are particularly useful for other natural language processing tasks. This is further confirmed by the fact that, contrary to our assumptions, we found no differences between typical and non-typical opposites in the frequency or the types of patterns in which they were found. This shows that both types are valid opposites that need to be studied in the future.
Last modified: 13 March 2020 01.02 a.m.