PhD ceremony: Ms. G.V. Lobanova, 11:00, Academiegebouw, Broerstraat 5, Groningen
Dissertation: The anatomy of antonymy: a corpus-driven approach
Promotor(s): prof. L.C. Verbrugge
Faculty: Mathematics and Natural Sciences
This dissertation deals with opposites: in Dutch, word pairs like arm – rijk (poor – rich), dag – nacht (day – night), openen – sluiten (open – close), and other pairs that express some type of contrast. First, we explore pattern-based methods for finding opposites automatically. Second, we analyze the automatically found opposites and compare them with the opposites that theoretical linguists have studied and classified extensively.
Our methodology is based on the assumption that opposites co-occur within a sentence significantly more often than would be expected by chance, and that they can often be found in intrasentential patterns like [tussen <ANT> en <ANT>] ('between <ANT> and <ANT>'). Using small sets of 6, 12 and 18 seed pairs consisting of adjectives, nouns or verbs, we identify the best patterns for finding new pairs of opposites in a 450-million-word newspaper corpus of Dutch. In the first study, we automatically generate strictly textual patterns like [either <ANT> countries or <ANT> countries], which contain no syntactic information and simply capture surface strings. In the second study, we generate surface patterns that contain part-of-speech information about the target word pairs, like [the difference between <ANT/Adj> and <ANT/Adj>]. In the third study, we use a parsed corpus to automatically acquire patterns with syntactic dependencies. Such patterns abstract away from the surface structure, capturing, for example, that <ANT1/Noun> is the subject, <ANT2/Noun> is the direct object, and the two are connected by the verb appreciate.
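The first study's idea of deriving textual patterns from seed pairs and reusing them to find new candidates can be illustrated with a minimal sketch. This is not the dissertation's implementation: the function names, the simple regular-expression tokenizer, and the choice to keep exactly one word of left context are all assumptions made for illustration, and the sketch does not score or rank patterns.

```python
import re
from collections import Counter

def tokenize(sentence):
    # Assumption: lowercase word tokens are enough for a toy example.
    return re.findall(r"\w+", sentence.lower())

def induce_patterns(sentences, seed_pairs):
    """Count surface patterns in which both words of a seed pair occur."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for a, b in seed_pairs:
            for x, y in ((a, b), (b, a)):
                if x in tokens and y in tokens:
                    i, j = tokens.index(x), tokens.index(y)
                    if i < j:
                        # Keep one word of left context (an arbitrary choice)
                        # plus everything between the two slots.
                        left = tokens[max(i - 1, 0):i]
                        middle = tokens[i + 1:j]
                        pattern = left + ["<ANT>"] + middle + ["<ANT>"]
                        counts[" ".join(pattern)] += 1
    return counts

def apply_pattern(pattern, sentences):
    """Return a Counter of word pairs that fill the <ANT> slots of a pattern."""
    items = pattern.split()
    found = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for k in range(len(tokens) - len(items) + 1):
            window = tokens[k:k + len(items)]
            if all(p == "<ANT>" or p == t for p, t in zip(items, window)):
                pair = tuple(t for p, t in zip(items, window) if p == "<ANT>")
                found[pair] += 1
    return found

# Toy sentences invented for this example; they are not taken from the corpus.
toy_sentences = [
    "Het verschil tussen arm en rijk groeit.",
    "De kloof tussen dag en nacht is groot.",
    "Hij kiest tussen oud en nieuw.",
]
seeds = [("arm", "rijk"), ("dag", "nacht")]
best_pattern = induce_patterns(toy_sentences, seeds).most_common(1)[0][0]
candidates = [p for p in apply_pattern(best_pattern, toy_sentences) if p not in seeds]
```

In this toy setting the two seed pairs both yield the pattern from the text, and applying it back to the sentences proposes a pair that was not among the seeds. On a real corpus, patterns would need to be ranked and filtered before being trusted.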
The best results were achieved with part-of-speech patterns, which identified many typical as well as novel opposites. Textual patterns found the same most frequent opposites across the seed sets of all three syntactic categories, and the majority of these pairs were well-established opposites. Dependency patterns found the fewest opposites per seed set, but many of the pairs they found were novel.
Overall, the best results were achieved by the algorithm that adds the minimum amount of syntactic information, namely part-of-speech information only. Since this method does not require any computationally costly preprocessing steps and can easily be applied to vast amounts of data, part-of-speech patterns offer a promising solution for the automatic extraction of opposites.
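To make the role of part-of-speech information concrete, the sketch below matches a pattern whose slots are restricted to one word class. It assumes the input has already been tagged and is given as lists of (word, tag) pairs. The tag labels, the slot notation parser, and the function names are hypothetical and are not taken from the dissertation.

```python
from collections import Counter

def slot_tag(item):
    # "<ANT/Adj>" -> "Adj"; a literal word -> None
    if item.startswith("<ANT/") and item.endswith(">"):
        return item[5:-1]
    return None

def match_pos_pattern(pattern, tagged_sentences):
    """Count word pairs that fill the tagged slots of a pattern.

    tagged_sentences is a list of sentences, each a list of (word, tag) pairs.
    """
    items = pattern.split()
    found = Counter()
    for sentence in tagged_sentences:
        for k in range(len(sentence) - len(items) + 1):
            pair = []
            for item, (word, tag) in zip(items, sentence[k:k + len(items)]):
                wanted = slot_tag(item)
                if wanted is None:
                    if word.lower() != item:
                        break  # literal word does not match
                elif tag == wanted:
                    pair.append(word.lower())
                else:
                    break  # slot filled by the wrong word class
            else:
                found[tuple(pair)] += 1
    return found

# Hypothetical pre-tagged input; the tag set is an assumption.
tagged = [
    [("het", "Det"), ("verschil", "Noun"), ("tussen", "Prep"),
     ("oud", "Adj"), ("en", "Conj"), ("nieuw", "Adj")],
    [("tussen", "Prep"), ("Groningen", "Noun"), ("en", "Conj"), ("Assen", "Noun")],
]
pairs = match_pos_pattern("tussen <ANT/Adj> en <ANT/Adj>", tagged)
```

The second toy sentence fits the surface string but is rejected because its slot fillers are nouns. This is the kind of filtering that a purely textual pattern cannot provide.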
The results show that the range of automatically found opposites goes well beyond the limited number of well-established opposites commonly discussed in theoretical approaches to opposites. In particular, pattern-based methods find not only typical opposites like oud – nieuw (old – new) and arm – rijk (poor – rich), but also less conventional opposites like nieuw – bestaand (new – existing), nieuw – tweedehands (new – second-hand), nieuw – bekend (new – familiar) and oud – recent (old – recent); non-typical domain-specific opposites like wit – rood (white – red, for wine) and Democraat – Republikein (Democrat – Republican, for political parties); and context-dependent pairs like migrant – Nederlander (migrant – Dutch national, in Dutch newspaper texts) and buitenlands – Nederlands (foreign – Dutch, as an analogue of buitenlands – binnenlands, foreign – domestic, in the context of local and international policies). Although such pairs behave like the canonical opposites in the corpus, non-typical and context-dependent opposites have been neglected in theoretical classifications. Our results provide evidence that opposites include a much wider range of pairs than has previously been recognized.
In fact, automatically found opposites, especially the domain-specific and context-dependent pairs that are often missing from existing lexical resources, are particularly useful for other natural language processing tasks. Their status as opposites is further supported by the fact that, contrary to our assumptions, we found no differences between typical and non-typical opposites in the frequency or the types of patterns in which they occurred. This shows that both types are valid opposites that deserve further study.