Antal van den Bosch - Example-based modeling of syntactic alternations
08 January 2013
Based on corpus data such as children's and child-directed speech data from CHILDES, or any large digital corpus of written text, computational models can be trained to predict certain choices made during speech or writing. Memory-based models are a class of computational models that store examples of alternations and use analogical or similarity-based reasoning over these stored examples to predict which choice will be made given a new, unseen input context. I will discuss experiments performed with memory-based models in two case studies, both of which are work in progress.
First, in joint work with Joan Bresnan, we train models on individual children's data as well as on other children's data and child-directed speech to predict new alternation choices in the English dative construction. Learning curve studies indicate that having child-directed speech in memory leads to better predictions than having only children's data, but if sufficient data is available for a single child, that child's own history of data points is also a good predictor of its next choices.
Second, in joint work with Stef Grondelaers and Dirk Speelman, we model the complex distribution of Dutch existential 'er' (there) in Flemish and Northern Dutch locative inversion constructions. Our data show that using only lexical features produces prediction scores for the Northern Dutch data that are on a par with previously tested regression models containing abstract linguistic features. The fact that the Flemish distribution of 'er' cannot be modelled exclusively on the basis of lexical input reveals deep-rooted differences between two language varieties that seem to be no further apart than British and American English.
Generalizing over these case studies, I discuss issues in comparing lexical versus abstract linguistic features and issues in experimental regimens for testing computational models.
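As an illustration of the memory-based idea described in this abstract (store examples, then predict a new choice from the most similar stored examples), here is a minimal sketch in Python. It is not the software or the feature set used in the experiments: the class name, the three features (verb, recipient type, theme type), the labels, and the toy examples are all hypothetical and invented for this illustration, and the similarity measure is a simple count of matching feature values.

```python
from collections import Counter


def overlap(a, b):
    """Count the feature positions at which two examples agree."""
    return sum(x == y for x, y in zip(a, b))


class MemoryBasedClassifier:
    """Hypothetical k-nearest-neighbour classifier over symbolic features."""

    def __init__(self, k=1):
        self.k = k
        self.memory = []  # list of (features, label) pairs

    def store(self, features, label):
        # "Training" is nothing more than adding the example to memory.
        self.memory.append((tuple(features), label))

    def predict(self, features):
        # Rank stored examples by similarity to the new context,
        # then take a majority vote among the k most similar ones.
        ranked = sorted(
            self.memory,
            key=lambda example: overlap(example[0], features),
            reverse=True,
        )
        votes = Counter(label for _, label in ranked[: self.k])
        return votes.most_common(1)[0][0]


# Invented toy examples: (verb, recipient type, theme type) -> construction
model = MemoryBasedClassifier(k=3)
model.store(("give", "pronoun", "noun"), "double_object")
model.store(("give", "noun", "pronoun"), "prepositional")
model.store(("show", "pronoun", "noun"), "double_object")
model.store(("send", "noun", "noun"), "prepositional")

# An unseen verb: the prediction rests on the two other matching features.
print(model.predict(("bring", "pronoun", "noun")))  # double_object
```

Because nothing is abstracted away at training time, adding more examples to memory (for instance, child-directed speech alongside a child's own utterances) changes predictions directly, which is what makes learning curve comparisons of this kind straightforward to set up.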
Last modified: 10 February 2021 14:56