University of Groningen Library

Open Access Publication in the Spotlight (January) - 'Early Detection of violating Mobile Apps: A data-driven predictive model approach'

Date: 23 January 2023
Author: Open Access Team
Open access publication in the spotlight: January 2023

Each month, the open access team of the University of Groningen Library (UB) puts a recent open access article by UG authors in the spotlight. This publication is highlighted via social media and the library’s newsletter and website.

The article in the spotlight for the month of January 2023 is titled 'Early detection of violating Mobile Apps: A data-driven predictive model approach', written by Fadi Mohsen, Dimka Karastoyanova and George Azzopardi (all from the Information Systems department, Bernoulli Institute, Faculty of Science and Engineering).


Mobile app stores are the key distributors of mobile applications. They regularly apply vetting processes to the deployed apps. Yet, some of these vetting processes might be inadequate or applied late. The late removal of applications might have unpleasant consequences for developers and users alike. Thus, in this work, we propose a data-driven predictive approach that determines whether the respective app will be removed or accepted. It also indicates the features’ relevance that helps the stakeholders in the interpretation. In turn, our approach can support developers in improving their apps and users in downloading the ones that are less likely to be removed. We focus on the Google App store and we compile a new data set of 870,515 applications, 56% of which have been removed from the market. Our proposed approach is a bootstrap aggregating of multiple XGBoost machine learning classifiers. We propose two models: user-centered using 47 features, and developer-centered using 37 features, which are available before publishing an app. We achieve the following Areas Under the ROC Curves (AUCs) on the test set: user-centered 0.792, developer-centered 0.762.
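
For readers who would like a concrete picture of the approach named in the abstract, here is a minimal sketch of bootstrap aggregating (bagging) and of the AUC metric. It is not the authors' implementation. The paper uses XGBoost classifiers as base learners; this sketch swaps in a simple one-split decision stump so that it runs with only the Python standard library. The toy data, function names and parameter values are invented for illustration and do not come from the paper or its data set.

```python
import random

def fit_stump(X, y):
    """Fit a one-split decision stump (a stand-in for XGBoost in this sketch)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [label for row, label in zip(X, y) if row[j] <= t]
            right = [label for row, label in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            p_left = sum(left) / len(left)
            p_right = sum(right) / len(right)
            # Squared error when each side predicts its own removal rate.
            err = (sum((label - p_left) ** 2 for label in left)
                   + sum((label - p_right) ** 2 for label in right))
            if best is None or err < best[0]:
                best = (err, j, t, p_left, p_right)
    if best is None:
        # No usable split (all rows identical): predict the overall rate.
        rate = sum(y) / len(y)
        return (0, float("inf"), rate, rate)
    return best[1:]

def predict_stump(stump, row):
    j, t, p_left, p_right = stump
    return p_left if row[j] <= t else p_right

def fit_bagging(X, y, n_models=25, seed=0):
    """Train one base learner per bootstrap sample (rows drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Average the predicted removal probability over all base learners."""
    return sum(predict_stump(m, row) for m in models) / len(models)

def auc(y_true, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive is scored above a randomly chosen negative (ties count half)."""
    pos = [s for s, label in zip(scores, y_true) if label == 1]
    neg = [s for s, label in zip(scores, y_true) if label == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

# Invented toy data: one numeric feature per app, label 1 = removed.
X_toy = [[0], [1], [2], [3], [10], [11], [12], [13]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]

models = fit_bagging(X_toy, y_toy)
scores = [predict_bagging(models, row) for row in X_toy]
print("AUC on the toy rows:", auc(y_toy, scores))
```

Note that the sketch scores the same rows it was trained on, whereas the AUCs reported in the abstract were measured on a separate test set.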

We asked first and corresponding author Fadi Mohsen a few questions about the article:

What are the most common reasons why apps get removed from app stores?

It is very challenging, if not impossible, to enumerate all the reasons. However, an app generally gets removed because it is malicious or violates certain privacy conditions. If I have to pick the top reason for removing privacy-violating apps, it would be the failure to comply with COPPA (Children’s Online Privacy Protection Act).

You have made the underlying data for this article openly available through DataverseNL. Why did you decide to do this?

Our ultimate aim is to trigger further research in this field. Thus, making the data sets and the paper publicly available maximizes the chance of that happening. 

Which stakeholders do you think will benefit the most from the open availability of the dataset? 

Open access to our research outcomes (data sets and papers) will allow more researchers to extend and improve our work. This in turn leads to solutions that protect the privacy of mobile app users and inspire their confidence.

Could you reflect on your experiences with open access and open science in general?

Before joining the University of Groningen, I had never published any open access articles. I was, however, sharing my research data sets and source code via my personal website. I am happy about the opportunity and support I get from UG to publish my research open access, including the data sets and source code. I find the process very smooth and well-managed.

Useful links:

DataverseNL is a publicly accessible data repository that is managed by the UG’s Digital Competence Centre. It allows you to deposit and share research data openly or under restricted access and helps make your research data FAIR (Findable, Accessible, Interoperable, Reusable).

The UG’s Digital Competence Centre can assist you with archiving and publishing your data in DataverseNL. For answers to the most common questions regarding DataverseNL, check out their DataverseNL FAQ.

This guide explains what you can do to make your research data FAIR.


Mohsen, F., Karastoyanova, D., & Azzopardi, G. (2022). Early detection of violating Mobile Apps: A data-driven predictive model approach. Systems and Soft Computing, 4, 200045. 

If you would like us to highlight your open access publication here, please get in touch with us.

About the author

Open Access Team
The Open Access team of the University of Groningen Library
