Understanding software through automated classification: a taxonomic perspective
PhD ceremony: Mr C. (Cezar) Sas
When: June 03, 2025
Start: 11:00
Supervisors: Prof. A. (Andrea) Capiluppi, Prof. P. (Paris) Avgeriou
Where: Academy building RUG
Faculty: Science and Engineering

Open-source software has grown massively, with platforms like GitHub hosting millions of projects. While this explosion of code has accelerated innovation, it also makes it difficult to find, understand, and reuse existing software, especially for newcomers facing large and complex repositories.
In his thesis, Cezar Sas addresses these challenges by introducing a modern, developer-informed system for organizing and classifying software. Existing methods often rely on simple tags or README files, which are inconsistent and fail to reflect the way developers actually work. Instead, Sas proposes a hierarchical taxonomy built from real developer terminology and GitHub topics, offering a clearer and more structured way to describe software.
To ensure relevance and scalability, the system uses large language models and other automated techniques to create and refine categories. A key innovation is the shift from classifying only entire projects to supporting multiple levels of detail, such as files and packages, which improves precision and uncovers topics that are often missed at the project level.
The result is a practical, scalable framework that enhances software discoverability and reuse. By aligning better with how developers understand and build software, this work helps bridge the gap between growing open-source ecosystems and the tools needed to navigate them effectively.