University of Groningen, founded in 1614 - top 100 university
Research Graduate School for the Humanities

Mind the metrics: CAF measure reliability and the implications for L2 studies

PhD ceremony: Y. (May) Wu
When: June 05, 2025
Start: 12:45
Supervisor: prof. dr. W.M. (Wander) Lowie
Co-supervisor: R.G.A. (Rasmus) Steinkrauss, Dr
Where: Academy building RUG / Student Information & Administration
Faculty: Arts

This dissertation applies Generalizability (G) Theory to investigate the reliability of Complexity, Accuracy, and Fluency (CAF) measures commonly used to assess second language (L2) performance. It further examines how the reliability of such measures affects L2 development research based on time-series data. In total, the project evaluates five writing and 57 speaking CAF measures in controlled settings to determine their variability. The findings reveal considerable differences in reliability across measures: while some fluency measures are highly reliable, most CAF measures are unreliable. When used to assess L2 development, low-reliability measures introduce substantial variability unrelated to actual language development or to interventions, which poses challenges for Second Language Development (SLD) research that depends on CAF scores to track learner progress over time.

To address these reliability issues, the dissertation suggests collecting multiple samples per data point, particularly when employing low-reliability measures. This could help to capture both any instability in the L2 system and developmental changes. In addition, the dissertation explores causes of the varying reliability levels, such as the specificity of the targeted linguistic features. It also discusses potential ways of improving CAF measure reliability, such as modifying how the measures are calculated.
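The reasoning behind collecting multiple samples per data point can be illustrated with a small sketch. It assumes the simplest G Theory setup, a one-facet design in which learners are crossed with samples, where the relative generalizability coefficient for a mean over n samples is the learner variance divided by the learner variance plus the residual variance over n. The function name and the variance components below are hypothetical values chosen for illustration only; they are not results from the dissertation, whose actual design may involve more facets.

```python
def g_coefficient(var_person: float, var_residual: float, n_samples: int) -> float:
    """Relative G coefficient for a mean over n_samples in a
    one-facet (persons x samples) design. Illustrative helper only."""
    if n_samples < 1:
        raise ValueError("n_samples must be at least 1")
    # Averaging over n samples divides the residual (error) variance by n.
    return var_person / (var_person + var_residual / n_samples)


# Hypothetical variance components for a low-reliability measure:
# true differences between learners are small relative to sample-to-sample noise.
var_person = 1.0
var_residual = 3.0

for n in (1, 2, 4, 8):
    print(n, round(g_coefficient(var_person, var_residual, n), 2))
```

Under these assumed values, a single sample gives a coefficient of 0.25, while averaging eight samples raises it to about 0.73, which shows why the recommendation matters most for low-reliability measures.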

Ultimately, this work provides a reliability reference guide for selecting CAF measures in SLD research, stresses the need to improve CAF reliability, and encourages extending reliability studies to other languages, measures, and learner populations. It also calls for the development of new technological tools and for further validity investigations to enhance the assessment and comparability of CAF measures across research contexts.
