Exploring Machine Learning to Study the Long-Term Transformation of News: Digital newspaper archives, journalism history, and algorithmic transparency

Broersma, M. & Harbers, F., 2018, In: Digital Journalism. 6, 9, p. 1150-1164, 15 p.

Research output: Contribution to journal, Article, Academic, peer-review

The labour-intensive nature of manual content analysis and the problematic accessibility of source material make quantitative analyses of news content still scarce in journalism history. However, the digitization of newspaper archives now allows for innovative digital methods for systematic longitudinal research beyond the scope of incidental case studies. We argue that supervised machine learning offers promising approaches to analyse abundant source material, ground analyses in big data, and map the structural transformation of journalistic discourse longitudinally. By automatically analysing form and style conventions that reflect underlying professional norms and practices, the structure of news coverage can be studied more closely. However, automatically classifying latent and period-specific coding categories is highly complex. The structure of digital newspaper archives (e.g. segmentation, OCR) complicates this even more, while machine learning algorithms are often a black box. This paper shows how making classification processes transparent enables journalism scholars to employ these computational methods in a reliable and valid way. We illustrate this by focusing on the issues we encountered with automatically classifying news genres, an illuminating but particularly complex coding category. Ultimately, such an approach could foster a revision of journalism history, particularly the often hypothesized but understudied shift from opinion-based to fact-centred reporting.
Original language: English
Pages (from-to): 1150-1164
Number of pages: 15
Journal: Digital Journalism
Issue number: 9
Early online date: 11-Oct-2018
Publication status: Published - 2018


  • Journalism history, Machine learning, (automatic) content analysis, digital newspaper archives, digitization, news genres, algorithmic transparency, BIG DATA, TEXT

ID: 65958971