News coverage of the coronavirus pandemic has once again underlined that, in recent years, we have increasingly come to live in our own filter bubbles. Since algorithms on social media determine what we see in our newsfeeds, and traditional media are also putting a greater emphasis on personalization, people mostly see information that confirms their view of the world, which in turn could lead to more polarization in society. Tommaso Caselli, Assistant Professor of Computational Semantics at UG, and the Data Science team of the Centre for Information Technology (CIT) want to break our filter bubbles using... an algorithm.
Author: Jorn Lelong
When we talk about filter bubbles, we automatically think of social media. Here, algorithms determine which posts we get to see, based on the pages we, or our friends, like. This way Facebook, Instagram and Twitter create a personalized bubble for each of us, containing news and information that fits our profile.
But this personalization is not limited to social media. Blendle, for example, recommends articles based on interests or previously read articles. In recent years, traditional media have also been experimenting more and more with personalized newsletters in an attempt to retain readers. What consequences does that have for the way we read about the world? It is precisely this question that assistant professor Tommaso Caselli of the University of Groningen wants to try to answer with his project ‘Breaking filter bubbles’. A unique feature of the project is that it is an interdisciplinary collaboration: not only is Marcel Broersma, Professor of Media and Journalistic Culture, involved in the project, but Caselli is also being assisted by the CIT’s Data Science team.
As an Assistant Professor of Computational Linguistics, Caselli has been developing algorithms to analyze linguistic phenomena for some time now. ‘In a previous project as a postdoc at the VU Amsterdam I examined how ten different sources told the same story. I wanted to expand on that project with news reports, but on a large scale.’ And you can take that literally. In this project, data scientist Dimitrios Soudis of the CIT can sink his teeth into a New York Times archive spanning two decades. To make it a little more manageable, for the time being they are limiting themselves to news articles about natural disasters. ‘These are relatively simple stories,’ says Caselli. ‘Later on, we also want to look at crime and political reporting, but first we need to get a clear picture of how news stories work.’ It would be impossible to manually retrieve all the reports about natural disasters from the enormous New York Times archive. So Soudis is using his expertise in Artificial Intelligence (AI) to do it automatically. ‘We go to the NY Times website and search for articles tagged with the category “earthquakes”, for example. We download those articles, and use a mathematical model to calculate which articles in our corpus correspond with them.’
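The article does not specify which mathematical model Soudis uses, but a common approach to 'which articles correspond with a seed set' is to score documents by cosine similarity over TF-IDF vectors. The sketch below illustrates that idea with invented toy articles; the term weighting and corpus are assumptions, not the project's actual pipeline.

```python
# Toy sketch: score corpus articles against a tagged "seed" article using
# TF-IDF weights and cosine similarity. All texts here are invented.
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build a TF-IDF weight dict for each tokenized document."""
    n = len(docs)
    df = Counter()                      # document frequency per word
    for doc in docs:
        df.update(set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = {w: (c / len(doc)) * math.log(n / df[w]) for w, c in tf.items()}
        vectors.append(vec)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse weight dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

corpus = [
    "earthquake magnitude struck city victims rescue".split(),
    "election results party votes parliament".split(),
    "tsunami wave coast victims evacuation rescue".split(),
]
seed = "earthquake tsunami victims rescue magnitude".split()

vectors = tfidf_vectors(corpus + [seed])
seed_vec = vectors[-1]
scores = [cosine(seed_vec, v) for v in vectors[:-1]]
best = max(range(len(scores)), key=scores.__getitem__)  # most similar article
```

In this toy run the earthquake article scores highest against the seed, the election article scores zero, and the tsunami article lands in between, which is the ranking behaviour such a retrieval step relies on.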
That’s the article collection part of the project. But according to Soudis, the real challenge of the project lies in the fact that language is not an exact science. ‘Algorithms still have a hard time understanding language. Just think of how often words like whirlwind, tsunami or landslide are used figuratively. So we have to filter them out.’ Computers are not yet able to understand language at a deeper level, so as a data scientist you have to be creative. Dimitrios Soudis came up with the idea of using frequencies. ‘We are creating a kind of dictionary containing terms related to natural disasters. Then we look at how often certain words appear in the selected articles and, more importantly, we look at the words with which they are used in a sentence. This allows us to unravel syntactic relationships between words.’
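The frequency idea Soudis describes can be sketched as a small co-occurrence counter: tally how often each disaster term appears and which words surround it, so that figurative uses (a 'whirlwind romance') stand out by the company they keep. The term list and sentences below are illustrative, not the project's actual lexicon.

```python
# Toy sketch of the frequency/co-occurrence idea: count disaster terms
# and the words that appear near them within a small window.
from collections import Counter, defaultdict

# Illustrative term list; the real project builds a larger dictionary.
DISASTER_TERMS = {"earthquake", "tsunami", "landslide", "whirlwind"}

def cooccurrences(sentences, window=2):
    counts = Counter()                  # how often each term occurs
    neighbors = defaultdict(Counter)    # which words accompany each term
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, tok in enumerate(tokens):
            if tok in DISASTER_TERMS:
                counts[tok] += 1
                for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
                    if j != i:
                        neighbors[tok][tokens[j]] += 1
    return counts, neighbors

sentences = [
    "a powerful earthquake struck the coastal city",
    "the tsunami destroyed the harbour",
    "their whirlwind romance surprised everyone",
]
counts, neighbors = cooccurrences(sentences)
```

Here 'whirlwind' co-occurs with 'romance' rather than with words like 'struck' or 'destroyed', which is exactly the kind of contextual signal that lets figurative uses be filtered out.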
This method allows them to look at news reports in a new way. Instead of looking at how an event is reported in individual articles, Caselli and Soudis are investigating the underlying patterns that appear in all news reports about natural disasters. ‘Regardless of a journalist’s individual style, news reports are written according to certain templates. When reporting on an earthquake, for example, you often mention the size of the earthquake, its location and the number of victims, or how the emergency services responded. That’s what we’re trying to reconstruct.’
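The template Caselli describes (magnitude, location, victims) can be illustrated with a toy slot-filler over a lead sentence. The patterns and the example sentence below are invented for illustration; the project reconstructs such templates from data rather than writing them by hand.

```python
# Toy illustration of the "news template" idea: hand-written patterns
# that recover the magnitude, location and victim-count slots typical
# of earthquake reports. Not the project's actual method.
import re

PATTERNS = {
    "magnitude": re.compile(r"magnitude[- ](\d+(?:\.\d+)?)", re.I),
    "location": re.compile(r"\bstruck ([A-Z][a-z]+(?: [A-Z][a-z]+)*)"),
    "victims": re.compile(r"killing (?:at least )?(\d+)", re.I),
}

def fill_template(text):
    """Return whichever template slots the patterns can find in the text."""
    slots = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            slots[name] = match.group(1)
    return slots

# Invented lead sentence in the style of a wire report.
lead = ("A magnitude-6.4 earthquake struck Central Italy on Wednesday, "
        "killing at least 120 people.")
slots = fill_template(lead)
```

The point of the project is the reverse direction: instead of writing these patterns by hand, the recurring structure is learned from thousands of reports, so differences between outlets show up as differences in which slots they fill and in what order.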
According to Caselli, once that model is up and running, it’s just a matter of expanding the corpus. ‘You have to start small, but eventually we want to see how other media report on the same event. Which sources do they mention, which information do they provide first or possibly leave out. We can then examine the differences between tabloid and traditional media in our research, as well as social media.’
However, Caselli believes we will never completely break down these filter bubbles. ‘People will keep using and visiting the media they trust. So those filter bubbles are also of your own making. But you have to give people the opportunity to get the complete picture and make their own decisions. So we want to provide an overview: these are the facts, and these are the different perspectives in the media. A Google News 2.0, if you will.’