A Central Asian language survey: Collecting data, measuring relatedness and detecting loans

Mennecier, P., Nerbonne, J., Heyer, E. & Manni, F., 1-Jun-2016, In: Language Dynamics and Change. 6, 1, p. 57-98 (43 p.)

Research output: Contribution to journal › Article › Academic › peer-review




We have documented language varieties (either Turkic or Indo-European) spoken in 23 test sites by 88 informants belonging to the major ethnic groups of Kyrgyzstan, Tajikistan and Uzbekistan (Karakalpaks, Kazakhs, Kyrgyz, Tajiks, Uzbeks, Yagnobis). The recorded linguistic material concerns 176 words of the extended Swadesh list and will be made publicly available with the publication of this paper.

Phonological diversity is measured by the Levenshtein distance and displayed as a consensus bootstrap tree and as multidimensional scaling plots. Linguistic contact is measured as the number of borrowings, from one linguistic family into the other, according to a precision/recall analysis further validated by expert judgment.
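As an illustration of the two measures named above, the following is a minimal Python sketch, not the authors' implementation. It assumes unit edit costs and length normalization for the Levenshtein distance, and treats loan detection as a set of flagged items compared against expert judgments; the abstract does not specify the segment weights, normalization, or detection threshold actually used. The function names and the toy inputs are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two sequences of segments, unit costs (assumed)."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, x in enumerate(a, start=1):
        curr = [i]  # deleting the first i segments of a
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a, b):
    """Levenshtein distance divided by the longer length (one common choice)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def precision_recall(predicted, gold):
    """Precision and recall of predicted loans against expert-labeled loans."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy strings, not data from the survey.
print(levenshtein("kitten", "sitting"))
print(precision_recall({"w1", "w2"}, {"w2", "w3", "w4"}))
```

Averaging `normalized_distance` over all word pairs shared by two sites would give one cell of a site-by-site distance matrix, which is the kind of input that tree-building and multidimensional scaling operate on.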

Concerning Turkic languages, the results of our sample do not support regarding Kazakh and Karakalpak as distinct languages and indicate the existence of several distinct Karakalpak varieties. Kyrgyz and Uzbek, on the other hand, appear quite homogeneous. Among the Indo-Iranian languages, the distinction between Tajik and Yagnobi varieties is very clear-cut.

More generally, the degree of borrowing is higher than average where language families are in contact in one of the many sorts of situations characterizing Central Asia: frequent bilingualism, shifting political boundaries, and ethnic groups living outside the “mother” country.
Original language: English
Pages (from-to): 57-98
Number of pages: 43
Journal: Language Dynamics and Change
Issue number: 1
Publication status: Published - 1-Jun-2016


  • language contact, dialectology, loan words

ID: 37205542