Antal van den Bosch - Example-based modeling of syntactic alternations
08 January 2013
Based on corpus data, such as children's speech and child-directed speech from CHILDES or any large digital corpus of written text, computational models can be trained to predict certain choices made during speaking or writing. Memory-based models are a class of computational models that store examples of alternations and use analogical or similarity-based reasoning over these stored examples to predict which choice will be made in a new, unseen input context. I will discuss experiments performed with memory-based models in two case studies, both of which are work in progress.
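The core idea of memory-based prediction can be illustrated with a minimal sketch. It assumes a plain k-nearest-neighbour classifier with an overlap distance, meaning the number of mismatching symbolic features. This is not the implementation used in the studies. The feature set (verb, recipient type, theme type), the labels and the toy examples are all hypothetical, chosen only to mimic an English dative alternation. Real memory-based learners usually add refinements such as feature weighting.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical symbolic features for a dative context:
# (verb, recipient NP type, theme NP type); "pron" = pronoun, "noun" = full noun phrase.
Features = Tuple[str, ...]
Example = Tuple[Features, str]  # stored context plus the choice that was observed


def overlap_distance(a: Features, b: Features) -> int:
    """Count the feature positions in which two contexts differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def predict(memory: List[Example], context: Features, k: int = 1) -> str:
    """Predict a choice by majority vote among the k stored examples closest to context."""
    # sorted() is stable, so examples at equal distance keep their memory order.
    ranked = sorted(memory, key=lambda ex: overlap_distance(ex[0], context))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy memory; the examples are illustrative, not corpus data.
MEMORY: List[Example] = [
    (("give", "pron", "noun"), "double_object"),  # e.g. "give him the book"
    (("give", "noun", "pron"), "prepositional"),  # e.g. "give it to the man"
    (("send", "pron", "noun"), "double_object"),
    (("send", "noun", "pron"), "prepositional"),
    (("hand", "pron", "noun"), "double_object"),
]

if __name__ == "__main__":
    # An unseen context: a full-NP recipient with a pronominal theme.
    print(predict(MEMORY, ("hand", "noun", "pron"), k=3))  # prepositional
```

Nothing is abstracted at training time: the stored examples themselves are the model. Learning therefore amounts to adding examples to memory, and a learning curve can be traced by growing the memory step by step.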
First, in joint work with Joan Bresnan, we train models on individual children's data, as well as on other children's data and child-directed speech, to predict new alternation choices in the English dative construction. Learning curve studies indicate that having child-directed speech in memory leads to better predictions than having only children's data. However, if sufficient data are available for a single child, that child's own history of data points is also a good predictor of its next choices.
Second, in joint work with Stef Grondelaers and Dirk Speelman, we model the complex distribution of Dutch existential 'er' ('there') in Flemish and Northern Dutch locative inversion constructions. Our data show that using only lexical features yields prediction scores for the Northern Dutch data that are on a par with those of previously tested regression models containing abstract linguistic features. The fact that the Flemish distribution of 'er' cannot be modelled exclusively on the basis of lexical input reveals deep-rooted differences between two language varieties that seem to be no further apart than British and American English.
Generalizing over these case studies, I discuss issues in comparing lexical versus abstract linguistic features, as well as issues in experimental regimens for testing computational models.
Last modified: 10 February 2021, 2.56 p.m.