
Detecting Controversy in Dutch News

Groot, D. & Caselli, T., Jan-2019.

Research output: Contribution to conference › Poster › Academic

In this work, we investigate automatic controversy detection in Dutch news using a distantly supervised approach based on entropy.
We collected from Facebook a total of 1859 news articles by six Dutch news providers (NOS, RTL Nieuws, de Volkskrant, het Parool, NRC and de Telegraaf), together with their Facebook users’ reactions (LIKE, LOVE, HAHA, WOW, SAD and ANGRY). We used the reactions as a proxy for controversy, assuming that the higher the entropy of the reactions, the more controversial the news item. A manual inspection of the 10 highest-entropy and 10 lowest-entropy news items in the dataset confirmed the validity of this intuition.
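The entropy proxy can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes Shannon entropy in bits over the six raw reaction counts of a post, and it assumes that posts with no reactions get zero entropy. The abstract states neither the logarithm base nor any normalisation, and the names `reaction_entropy` and `rank_by_entropy` are hypothetical.

```python
import math
from typing import List, Mapping

REACTIONS = ("LIKE", "LOVE", "HAHA", "WOW", "SAD", "ANGRY")


def reaction_entropy(counts: Mapping[str, int], base: float = 2.0) -> float:
    """Shannon entropy of one post's reaction distribution (base 2 is an assumption).

    Reaction types with a zero count contribute nothing (0 * log 0 is taken as 0).
    """
    total = sum(counts.get(r, 0) for r in REACTIONS)
    if total == 0:
        return 0.0  # assumption: posts without reactions are treated as uncontroversial
    h = 0.0
    for r in REACTIONS:
        c = counts.get(r, 0)
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


def rank_by_entropy(posts: Mapping[str, Mapping[str, int]]) -> List[str]:
    """Post ids sorted from highest entropy (most controversial) to lowest."""
    return sorted(posts, key=lambda pid: reaction_entropy(posts[pid]), reverse=True)
```

With this ranking, `ranked[:10]` and `ranked[-10:]` give the top-10 and bottom-10 items that would be inspected manually. A post with only LIKE reactions scores 0, and a post split evenly over all six reactions scores log2(6) ≈ 2.585 bits, the maximum.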
We then developed a linear regression model to predict the controversy of a news item from token and character n-grams. As a baseline, we used a dummy regressor that always predicts the average entropy. We investigated three experimental settings: i.) all-news, a 10-fold cross-validated model on the full corpus; ii.) in-source, a 10-fold cross-validated model on each news source separately; and iii.) across-source, where we trained on one news source and tested on the other five (e.g. train on NOS and test on het Parool). In all experimental settings, the model beat the baseline. In particular, in all-news the model reached MSE = 0.033 (baseline MSE = 0.049); in in-source the average model MSE was 0.036 (baseline MSE = 0.042); and in across-source the average model MSE was 0.052 (baseline MSE = 0.59).
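The three evaluation settings and the mean-predicting baseline can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the authors' implementation. All function names are hypothetical, and several choices are guesses that the abstract does not specify: the n-gram ranges, the unshuffled contiguous folds, and averaging across-source scores over (train source, test source) pairs. Any learner, such as a linear regressor over `ngram_features`, can be plugged in through the `fit` argument. `fit_dummy` plays the role that scikit-learn's `DummyRegressor(strategy="mean")` would.

```python
from collections import Counter
from statistics import mean
from typing import Callable, Dict, List, Sequence, Tuple

# A "fit" function takes training texts and targets and returns a predictor text -> float.
Predictor = Callable[[str], float]
Fit = Callable[[Sequence[str], Sequence[float]], Predictor]
Corpus = Dict[str, Tuple[List[str], List[float]]]  # source name -> (texts, entropies)


def ngram_features(text: str, word_n: Tuple[int, int] = (1, 2),
                   char_n: Tuple[int, int] = (3, 5)) -> Counter:
    """Counts of token and character n-grams (the n ranges are assumptions)."""
    feats: Counter = Counter()
    tokens = text.lower().split()
    for n in range(word_n[0], word_n[1] + 1):
        for i in range(len(tokens) - n + 1):
            feats["w:" + " ".join(tokens[i:i + n])] += 1
    for n in range(char_n[0], char_n[1] + 1):
        for i in range(len(text) - n + 1):
            feats["c:" + text[i:i + n]] += 1
    return feats


def fit_dummy(texts: Sequence[str], targets: Sequence[float]) -> Predictor:
    """Baseline: ignore the text and always predict the mean training entropy."""
    avg = mean(targets)
    return lambda text: avg


def mse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def kfold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split range(n) into k contiguous folds of near-equal size (no shuffling)."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validated_mse(texts: Sequence[str], targets: Sequence[float],
                        fit: Fit, k: int = 10) -> float:
    """Setting i.) all-news: mean held-out MSE over k folds (assumes len(texts) >= k)."""
    scores = []
    for test_idx in kfold_indices(len(texts), k):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(texts)) if i not in held_out]
        predict = fit([texts[i] for i in train_idx], [targets[i] for i in train_idx])
        scores.append(mse([targets[i] for i in test_idx],
                          [predict(texts[i]) for i in test_idx]))
    return mean(scores)


def in_source_mse(corpus: Corpus, fit: Fit, k: int = 10) -> float:
    """Setting ii.) in-source: k-fold CV inside each source, averaged over sources."""
    return mean(cross_validated_mse(x, y, fit, k) for x, y in corpus.values())


def across_source_mse(corpus: Corpus, fit: Fit) -> float:
    """Setting iii.) across-source: train on one source, test on each other source.

    Assumption: the reported average is taken over all (train, test) source pairs.
    """
    scores = []
    for train_src, (train_x, train_y) in corpus.items():
        predict = fit(train_x, train_y)
        for test_src, (test_x, test_y) in corpus.items():
            if test_src != train_src:
                scores.append(mse(test_y, [predict(t) for t in test_x]))
    return mean(scores)
```

For example, `across_source_mse(corpus, fit_dummy)` gives the baseline score for setting iii.). Passing a linear-regression `fit` function built on `ngram_features` instead gives the model score to compare against it. Keeping the learner behind a single `fit` interface means model and baseline are always evaluated on identical splits.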

We further extended the model to predict topics, the number of reactions and their types, forming a complete pipeline.
Original language: English
Publication status: Published - Jan-2019
Event: Computational Linguistics in the Netherlands 29 (CLIN29) - De Oosterpoort, Groningen, Netherlands
Duration: 31-Jan-2019 → 31-Jan-2019
Conference number: 29
http://www.let.rug.nl/clin29

Conference

Conference: Computational Linguistics in the Netherlands 29 (CLIN29)
Abbreviated title: CLIN29
Country: Netherlands
City: Groningen
Period: 31/01/2019 → 31/01/2019
Internet address: http://www.let.rug.nl/clin29

ID: 112792451