
Families and resemblances

29 November 2010

PhD ceremony: Ms. J. Prokic, 13:15, Academiegebouw, Broerstraat 5, Groningen

Thesis: Families and resemblances

Promotor(s): prof. J. Nerbonne

Faculty: Arts


Dialectometry is a multidisciplinary field that applies quantitative methods to the analysis of dialect data. From the outset, most dialectometric research has focused on including large amounts of data in analyses and on offering researchers alternative views of that data. Later work turned to identifying dialect groups and to developing methods that measure how similar (or different) one variety is to its neighboring varieties. In this thesis, Prokic presents advances in several techniques that allow researchers to measure the differences between language varieties automatically. She tests all methods on Bulgarian dialect pronunciation data.

Part of the research relies on the Levenshtein algorithm to aggregate over the numerous features found in the data and to infer the similarities or distances among groups of dialects. Prokic investigates the application of clustering techniques to the detection of dialect groups and proposes several evaluation techniques for estimating the quality of the automatically obtained groups. To infer the distances between the phones in the data set automatically, she combines the Levenshtein algorithm with a technique called pointwise mutual information. Information on the distances between phones yields better estimates of the distances between strings, and consequently of the distances between language varieties.
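As a rough illustration of the two ingredients named above, the following Python sketch computes a Levenshtein distance between two phone strings, averages it over word pairs to compare two varieties, and derives pointwise mutual information from phone pairs that occupy aligned positions. It is a minimal sketch and not the procedure used in the thesis: the function names, the normalization by the length of the longer word, and the toy input are all assumptions made for illustration.

```python
from collections import Counter
from math import log2


def levenshtein(a, b, sub_cost=lambda x, y: 0.0 if x == y else 1.0, indel=1.0):
    """Edit distance between two phone sequences (strings or lists).

    sub_cost is a hypothetical hook: a phone-distance function could be
    plugged in here instead of the default 0/1 substitution cost.
    """
    prev = [j * indel for j in range(len(b) + 1)]
    for i, x in enumerate(a, start=1):
        curr = [i * indel]
        for j, y in enumerate(b, start=1):
            curr.append(min(
                prev[j] + indel,               # delete x
                curr[j - 1] + indel,           # insert y
                prev[j - 1] + sub_cost(x, y),  # substitute or match
            ))
        prev = curr
    return prev[-1]


def variety_distance(words_a, words_b):
    """Mean length-normalized distance over paired words of two varieties.

    Assumption: both lists hold pronunciations of the same concepts in the
    same order; dividing by the longer word length is one common choice.
    """
    total = 0.0
    for wa, wb in zip(words_a, words_b):
        total += levenshtein(wa, wb) / max(len(wa), len(wb), 1)
    return total / len(words_a)


def pmi(aligned_pairs):
    """Pointwise mutual information for each observed pair of aligned phones."""
    n = len(aligned_pairs)
    pair_counts = Counter(aligned_pairs)
    left = Counter(x for x, _ in aligned_pairs)
    right = Counter(y for _, y in aligned_pairs)
    return {
        (x, y): log2((c / n) / ((left[x] / n) * (right[y] / n)))
        for (x, y), c in pair_counts.items()
    }


# Invented toy input, not taken from the Bulgarian data set.
print(levenshtein("kitten", "sitting"))
print(variety_distance(["abc", "abd"], ["abc", "ab"]))
print(pmi([("a", "a"), ("a", "a"), ("a", "e"), ("e", "e")]))
```

Phone pairs that align more often than chance predicts receive a high PMI score, so one plausible use is to turn those scores into lower substitution costs and pass them to `levenshtein` through `sub_cost`. How the thesis performs that conversion is not described here.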

Prokic also tests an alternative, more historically motivated approach to dialect variation. She employs a method taken from phylogenetics, namely Bayesian inference of phylogeny, which focuses on systematic shared innovations as a signal of common ancestry, and uses it to reexamine the relatedness among the Bulgarian dialect varieties. This method is applied to strings that have been automatically multiple-aligned; she produces and evaluates these alignments using two novel methods.

The results of applying different quantitative techniques to the Bulgarian dialect data show that some of the traditional divisions of this area have to be questioned when only pronunciation data is taken into account. A comparison of the divisions resulting from the geographic and historical approaches shows that these two perspectives give very similar pictures of Bulgarian dialect variation. None of the methods developed is language-specific, nor are they limited to dialect data.



Last modified: 13 March 2020 01.17 a.m.