Infrequent forms: noise or not?

Wieling, M. & Montemagni, S., 2016, The Future of Dialects: selected papers from Methods in Dialectology XV. Côté, M-H., Knooihuizen, R. & Nerbonne, J. (eds.). Language Science Press, p. 215-224 (10 p.).

Research output: Chapter in Book/Report/Conference proceeding > Chapter > Academic

In this study we ask whether simplifying the data in dialectometrical studies by
removing infrequent forms helps to uncover the geographical structure in dialect
data. By investigating lexical variation in a large corpus of Tuscan dialect data
via hierarchical bipartite spectral graph partitioning, we are able to identify the
main geographical areas together with their linguistic basis. To assess the
influence of infrequent forms, we conduct two analyses: one that includes only
lexical variants used by at least 0.5% of the informants, and another that includes
all lexical variants in the data. We show that using all data yields a geographical
characterization with a more adequate linguistic basis than using the trimmed data.
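
The trimming step described in the abstract (keeping only variants used by at least 0.5% of the informants) can be illustrated with a minimal Python sketch. This is not the authors' code: the function name `trim_infrequent`, the `(informant, variant)` input layout, and the toy data are assumptions made purely for illustration.

```python
from collections import defaultdict

def trim_infrequent(responses, min_share=0.005):
    """Drop variants used by fewer than `min_share` of all informants.

    `responses` is a sequence of (informant_id, variant) pairs.
    This layout is a hypothetical one, not the corpus's real format.
    """
    responses = list(responses)
    informants = set()
    users = defaultdict(set)  # variant -> informants who used it
    for informant, variant in responses:
        informants.add(informant)
        users[variant].add(informant)

    total = len(informants)
    # Keep a variant only if its share of informants reaches the threshold.
    kept = {v for v, who in users.items() if len(who) / total >= min_share}
    return [(i, v) for i, v in responses if v in kept]

# Toy data (invented): 400 informants all use "common";
# 4 of them (1%) also use "regional"; 1 of them (0.25%) uses "rare".
toy = [(i, "common") for i in range(400)]
toy += [(i, "regional") for i in range(4)]
toy += [(0, "rare")]

trimmed = trim_infrequent(toy)
remaining_variants = sorted({v for _, v in trimmed})
```

In this toy example only "rare" falls below the 0.5% threshold and is removed. The second analysis in the abstract corresponds to skipping this step and using all variants.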
Original language: English
Title of host publication: The Future of Dialects
Subtitle of host publication: selected papers from Methods in Dialectology XV
Editors: Marie-Hélène Côté, Remco Knooihuizen, John Nerbonne
Publisher: Language Science Press
Number of pages: 10
ISBN (Print): 9783946234197
Publication status: Published - 2016

ID: 23254453