University of Groningenfounded in 1614  -  top 100 university

# Bachelor Projects

## Bachelor Projects Academic Year 2023-2024

#### Overview

(Note that some projects admit more than one student.)

##### Description:

Graphs are the backbone of computer science, and they are everywhere! They are used to model a wide variety of real-world problems, including social networks, transportation networks, and communication networks. Graphs have many parameters that describe their complexity and structure, and these parameters can be used to gain insights into the underlying problem. The simplest such parameters are the number of vertices and the number of edges, but there are more involved parameters like treewidth. This parameter describes how significantly a graph is different from a tree (more details can be found here https://en.wikipedia.org/wiki/Treewidth). There are many other parameters like pathwidth, cliquewidth, twinwidth, vertex cover, neighborhood diversity, and so on.  All these parameters can be used to gain a deeper understanding of how hard a particular problem is on a given graph.

The goal of the proposed project is to survey and study relations between different parameters describing graphs. For example, it is well known that the pathwidth parameter is never smaller than a treewidth. It is expected that this kind of relations will be surveyed or discovered in the project. Also you will be exploring different graph parameters from the viewpoint of algorithms and discrete mathematics. The project is an exciting opportunity to dive deep into the world of graphs and algorithms.

If you want to know more about treewidth you can watch the following video https://www.youtube.com/watch?v=kEnDGTwSDXY (understanding all material in the video is NOT necessary)

This project is intended for students interested in graphs.

##### Description:

Subset sum, knapsack, and partition are well-known problems in computer science. These problems involve a list of numbers as input. If the input contains only integer numbers, these problems are weakly NP-complete and admit pseudo polynomial algorithms. However, if the input contains rational numbers given by the denominator and numerator, these problems become strongly NP-complete.

The goal of this project is to investigate whether other “number theoretical” NP-complete problems have similar behavior. One such problem is the problem of covering the input array by the smallest number of arithmetic progressions. It is known that this problem is weakly NP-complete if the input contains only integer numbers. The proof of this result and other details can be found in the paper “Covering a set with arithmetic progressions is NP-complete” .

For more information on this topic, you may find the following literature useful: 2.

This project is intended for students interested in algorithms, NP-completeness and introductory number theory.

##### Description:

Parameterized complexity is a promising approach to tackle computationally hard problems. So far, fixed-parameter tractable algorithms have been designed mainly for graph problems. The goal of this project is to study the possibility of the application of fixed-parameter tractable machinery to the construction of efficient algorithms for computationally hard problems from basic number theory.

The project starts with a study of basic notions and definitions of fixed-parameter tractable algorithms. The final goal of the project is to survey  NP-complete problems involving integer numbers that might have small parameters (some characteristics) in real-world instances. For example, in the CAP problem, an input consists of a set of integer numbers, and the goal is to cover all these numbers using the smallest number of arithmetic progressions that do not include other integers. It is natural to assume that, in real life, we are interested in answers only if the number of arithmetic progressions is small. Under such parameterization, CAP admits an efficient algorithm (the problem has a fixed-parameter tractable algorithm), even though the general case of CAP is NP-complete.

During the project, your goal is to identify problems from basic number theory that are interesting from the point of view of fixed-parameter tractable algorithms.

Literature:

1. Basics of parameterized algorithms: Chapters 1 and 3 in the book "Parameterized Algorithms", freely available here https://www.mimuw.edu.pl/~malcin/book/parameterized-algorithms.pdf

2. List of NP-complete problems https://en.wikipedia.org/wiki/List_of_NP-complete_problems

##### Description:

The Partial Minimum Satisfiability problem involves a CNF formula F, where some of the clauses are marked as hard clauses. The goal is to find a satisfying assignment that satisfies all hard clauses and the smallest number of other clauses. The length of the formula is defined as the number of literals in the formula. The objective of this project is to construct an algorithm for the Partial Minimum Satisfiability problem, with the running time measured in terms of the length of the formula or in terms of the overall number of variables and clauses. The algorithm will most likely be a branch-and-bound algorithm. You can find details about branching algorithms in Chapter 2 of the book “Exact Exponential Algorithms,” which is available for free at https://folk.uib.no/nmiff/BookEA/BookEA.pdf.

This project will introduce you to exact exponential algorithms and basic techniques for designing such algorithms. You will also have the opportunity to construct your own algorithm for the Partial Minimum Satisfiability problem. Good luck!

Literature:

[1] Introduction to the world of exact exponential algorithms. https://folk.uib.no/nmiff/BookEA/BookEA.pdf

[2] Description of the Partial Minimum Satisfiability problem and its applications. https://www.sciencedirect.com/science/article/pii/S0004370212000616

##### Description:

Suppose the vertices of a graph G are colored. We define a subset of vertices as a connected tropical set if it generates a connected graph that contains vertices of all colors from the original graph. A subset of vertices is a Minimal Connected Tropical Set if it is a Connected Tropical Set and it does not contain a smaller set that is also a Connected Tropical Set.

A graph is called an interval graph if we can assign intervals on the line to all vertices such that two vertices are adjacent if and only if corresponding intervals intersect.

The objective of this project is to investigate the number of minimal connected tropical sets in interval graphs. We aim to find the exact values of the number of minimal connected tropical sets in small interval graphs using computational resources. We also try to generalize our findings to interval graphs of arbitrary size by providing lower bounds or upper bounds on their value.

Below you can find literature that is relevant to this project:

1. Enumerating Minimal Tropical Connected Set. https://link.springer.com/chapter/10.1007/978-3-319-51963-0_17

2. Enumeration of Minimal Tropical Connected Sets. https://link.springer.com/chapter/10.1007/978-3-031-30448-4_10

##### Description:

This project is about implementing a type system for a programming language. A type system determines which programs in a programming language are valid, and which ones are not.  A type system for example can reject a program that adds a number and a string together, thus catching a type error. And more advanced type system can catch more errors. The process of enforcing such a typing disciplined is called type checking: given a program, annotated with types, a compiler checks whether to accept the program or reject it.

The natural question for implementors of compilers is then how to implement a type system, without requiring too much input from the user. That is, given a partially annotated program, can we infer the types? This is referred to as type inference.

One class of algorithms for implementing type systems are based on bidirectional type checking, which combines type checking and type inference into a straightforward algorithm.

The goal of this project is to study bidirectional type checking and implementing one of the algorithms in your favorite programming language.

If you enjoyed the courses on functional programming or the elective course on compiler construction, then this project might be for you!

##### Description:

The aim of this project is to build a formally verified correct software, and to familiarize yourself with the Coq system. Coq is a proof assistant and a programming language, which allows the user not only to write and execute functional programs, but also to prove properties about them.

In the first part of this programming project the student will first familiarize themselves with the basics of Coq. In the second part, the student will pick an algorithm or a data structure – for example, a queue, a look-up table, a binary tree, etc – and they will implement and verify it.

Required skills: Knowledge of functional programming and mathematical induction

Here are some more links that contextualize the project.

##### Description:

The aim of this project is to implement a concurrent data structure (queue, stack, etc) in multicore OCaml, learning new programming paradigms along the way.

OCaml is a strongly-typed functional programming language (with support for object-oriented and imperative programming), and the recent multicore extension provides an interesting framework for parallel and functional programming.

This project is intended for students interested in different programming paradigms. A basic familiarity with functional programming and concepts of parallel programming is required.

##### Description:

Process calculi are small formalisms intended as programming models; they express concurrency in the form of processes that exchange messages between them and execute simultaneously. There are multiple process calculi that are Turing complete, that is, they are expressive enough to correctly represent Turing machines.

Mainstream programming languages (such as C++ and Java) use types (and type systems) to rule out errors derived from nonsensical program fragments. In the same spirit, we can combine process calculi with type systems: this is useful to rule out processes containing incorrect or nonsensical behaviors (such as deadlocks). We then talk of typed process calculi, which are typically less expressive than untyped process calculi precisely because incorrect processes cannot be specified in them.

The focus of this project is to consider a specific class of process calculi whose associated type systems are very flexible. Our hypothesis is that those typed process calculi should be able to identify correct processes and, at the same time, be able to correctly represent Turing machines. Indeed, these typed process calculi can represent infinite-state systems, by resorting to flexible forms of recursion in which unlimited process duplication can be expressed.

See https://www.jperez.nl/teaching/projects for general orientation and introductory papers.

##### Description:

Process calculi are small formalisms intended as programming models; they express concurrency in the form of processes that exchange messages between them and execute simultaneously.

This project considers the pi-calculus, a well-known process calculus that has been implemented in Maude, an automated tool for specifying programming languages and executing programs written in them. Recently, we have implemented on top of Maude a tool for specifying and verifying processes that implement binary protocols (protocols between exactly two processes).

While useful, many communication programs implement multiparty protocols (protocols between two or more processes). The goal of this project is to extend our tool in Maude, in order to enable the specification and analysis of processes implementing multiparty protocols.

See https://www.jperez.nl/teaching/projects for general orientation and introductory papers.

##### Description:

Process calculi are small formalisms intended as programming models; they express concurrency in the form of multiple processes that execute simultaneously and exchange messages between them.

This project considers the pi-calculus, a well-known process calculus. The pi-calculus is a very expressive model of computation. In this project, the interest is in the ability of expressing complex distributed protocols, in particular the leader election problem. Roughly speaking, this is the situation in which n distributed processes, with identical implementations, are able to break this symmetry and select one of them as their leader for conducting a protocol.

A well-known result indicates the ability to correctly express the leader election problem is at the heart of the expressive power of the pi-calculus. This prior result holds for the untyped pi-calculus; the goal in this project is to explore whether this result can be adapted to a pi-calculus equipped with a type system. The answer is not obvious, because typed process calculi are typically less expressive than untyped process calculi.

See https://www.jperez.nl/teaching/projects for general orientation and introductory papers.

##### Description:

The notion of protocol is everywhere in Computer Science. A well-known example are the security protocols that ensure that distributed communications can take place securely.

This project concerns two other classes of protocols that are also relevant in practice:

• The first class of protocols, which we will call session protocols, describe a series of communication actions between different agents, as in an online transaction. The goal with session protocols is to ensure that agents complete their respective protocol without communication errors or getting stuck.

• The second class of protocols, which we will call gossip protocols, are similar but serve a different purpose: the goal is to disseminate a secret message, in such a way that all protocol participants eventually become aware of this private information.

Session and gossip protocols have been widely investigated in the literature, although separately and with different methods and techniques. On the one hand, gossip protocols have been studied using (dynamic) epistemic logics, and their associated model checking techniques for verification. On the other hand, session protocols have been studied using process calculi and type systems, some of which are based on logics.

The goal of this project is to survey existing approaches and explore whether the techniques used for analyzing session protocols can be used to analyze gossip protocols (and vice versa).

See https://www.jperez.nl/teaching/projects for general orientation and introductory papers.

For gossip protocols, the following course by Hans van Ditmarsch and Malvin Gattinger can be useful:

##### Description:

Process calculi are small formalisms intended as programming models; they express concurrency in the form of multiple processes that exchange messages between them and execute simultaneously. There exist multiple process calculi, with different features and expressivity. The psi-calculus, proposed by Bengtson et al, is intended to be a flexible framework, designed in such a way that it generalizes several other process calculi. Indeed, existing process calculi are as specific instances of the psi-calculus.

Just as mainstream programming languages use types (and type systems) to rule out errors derived from nonsensical program fragments, we can combine process calculi with type systems: this is useful to rule out processes containing incorrect or nonsensical behaviors (such as deadlocks). We then talk of typed process calculi, which are typically less expressive than untyped process calculi precisely because incorrect processes cannot be specified in them.

This project concerns the psi-calculus and type systems, and has two goals. The first goal is to survey the state of the art on the combination of the psi-calculus with type systems. The work of Huttel et al (referenced below) is a good starting point. The second goal is to explore possible extensions of existing type systems for the psi-calculus, considering important correctness properties such as deadlock-freedom.

See https://www.jperez.nl/teaching/projects for general orientation and introductory papers.

##### Description:

In real-world decision-making, multiple actors often have to make decisions on multiple issues at the same time. In the presence of multiple issues and multiple actors with varying preferences, there is rarely one outcome that is optimal for all actors. Moreover, preferences over outcomes for an issue A may depend on the outcomes of other issues B and C. For single actors, such conditional preferences have been formalized with conditional preference networks (CP-nets) [1]. Multi-actor extensions and other generalizations have been studied in [2,3].

The aim of this project is to create a web-based tool that will support real-world case studies in complex decision-making using multi-actor CP-nets. The tool should (1) provide an interface for obtaining the CP-nets of users (actors), and (2) compute a number of aggregation metrics that can help identify outcomes that optimize various aspects of the group’s preferences .

Re (1): In the worst case, a CP-net specification requires a number of queries that is exponential in the number of issues, but in most (real-world) cases, the specification is much smaller due to few dependencies. Therefore the user input process should consist of a number of steps that allow to minimize the number of queries to the user. Moreover, the CP-net specifications should be stored in a database, and allow for some basic visualization of CP-nets.

It is part of the project to identify a suitable programming language and the setup for the web tool.

Moreover, students can get access to descriptions of issues and CP-nets from a real-world case study.

[1] Boutilier C, Brafman RI, Domshlak C, Hoos HH, Poole D (2004) CP-nets: A tool for representing and reasoning with conditional ceteris paribus preference statements. J. Artif. Intell. Res. 21:135–191, URL https://doi.org/10.1613/jair.1234

[2] Rossi, F., Venable, K. B., and Walsh, T. (2004). mCP nets: Representing and reasoning with preferences of multiple agents. In AAAI, volume 4, pages 729–734.

[3] Haret, A., Novaro, A., & Grandi, U. (2018). Preference Aggregation with Incomplete CP-Nets. In 16th International Conference on Principles of Knowledge Representation and Reasoning (KR 2018) (pp. 308-317). AAAI Press, Palo Alto, California.

##### Description:

Let N be the set of natural numbers {0, 1, 2, .. }. Consider the binary relation <= on N^2 = { (x,y) | x \in N and y \in N } defined

(a1, b1) <= (a2, b2) iff a1 <= a2 and b1 <= b2

For example, (2, 2) <= (3, 3) but (2,2) and (3, 1) are incomparable under <=. It is easy to see that <= is reflexive and transitive i.e., it is a quasi-order.

A {bad sequence} is a sequence a0, a1, a2, .. of elements from N^2 with the property that there is no increasing pair i.e. there is no i,j with i < j such that ai <= aj. An example of a bad sequence is (2,2), (3,1), (1,1), (0,1). However, (2,2), (1,3),(3,2) is not a bad sequence since it contains an increasing pair (2,2) <= (3,2).

It is easy to see that a bad sequence on N is finite since it would have to be a strictly descending sequence (e.g. 7, 5, 3, 2, 1, 0). The existence of incomparable elements in N^2 under <= makes the situation more interesting. Nevertheless:

LEMMA. Every bad sequence on N^2 with respect to <= is finite.

(Can you prove it? What is an example of a set and ordering where infinite bad sequences exist?)

A quasi-ordering <= such that every bad sequence is finite is called a {well-quasi-ordering} (wqo).

What is the maximum length of a bad sequence on N^2? It does not exist, since arbitrarily large jumps can be made e.g., (2,2), (1, 100), (1, 99), .. and hence arbitrarily long bad sequences can be found. However, if we restrict the size of a jump, e.g. || a_{I+1} || <= 2.|| a_i || where || a || is the maximum value over the coordinates then it can be shown that a maximum length for bad sequences exists as a function of the starting element (can you prove it? can you compute its length under this doubling control?).

RELATIONSHIP TO TCS: Suppose you want to show that an algorithm terminates. In many cases, the successive configurations of the algorithm can be represented as a bad sequence of some wqo. In such cases, the finiteness of the bad sequence guarantees termination, and the length of the bad sequence provides an upper bound.

In 2011, Figueira et al. gave an elementary and self-contained proof of sharp upper bounds for controlled bad sequences over N^k. The aim of this project is to survey the application of wqos in theoretical computer science and to survey their upper bound argument.

This topic is suited to a student who is interested in the mathematical aspects of computer science. The starting reference is the following.

Diego Figueira, Santiago Figueira, Sylvain Schmitz, Philippe Schnoebelen: Ackermannian and Primitive-Recursive Bounds with Dickson's Lemma. ACM/IEEE Symposium on Logic in Computer Science LICS 2011: 269-278

##### Description:

Gödel's incompleteness theorems from the 1930s are a landmark result in mathematical logic. These theorems establish that there are true statements of Peano arithmetic that are not provable in Peano arithmetic. However, the statement witnessing this claim---in Gödel's second incompleteness theorem, it is the statement asserting the consistency of Peano arithmetic---is not one that would arise in ordinary mathematical investigation. This motivated the search for more natural witnesses of incompleteness. The Paris-Harrington theorem fulfils this wish. It states that the strengthened finite Ramsey theorem from combinatorics is not provable in Peano arithmetic.

Specifically, the strengthened finite Ramsey theorem states that:

For any positive integers n, k, m, such that m >= n, one can find N with the following property: if we color each of the n-element subsets of S = {1, 2, 3,..., N} with one of k colors, then we can find a subset Y of S with at least m elements, such that all n-element subsets of Y have the same color, and the number of elements of Y is at least the smallest element of Y.

Let us consider an example. Set n=1, k=3, m=100. We must find some N such that if we colour each of the elements of {1, ... N} with exactly one from {blue, red, green} then there is a subset Y with |Y|>=100 such that every element of Y receives the same colour. At first glance, N=298 seems to do the trick. After all, even if we coloured the first 3x99=297 numbers so that exactly 99 receive the same colour, number 298 will ensure that some colour is witnessed by some 100 numbers, so choose Y as the set with |Y| consisting of 100 numbers with the same colour. However, the condition "the number of elements of Y is at least the smallest element of Y" might not be satisfied with this N. After all, there is nothing preventing 1 \in Y. On the other hand, if we choose N=398 then the above argument tells us that there is a subset Y in {101, .., 398} such that |Y|=100 and every number in Y receives the same colour. Since the smallest element of Y is >= 101 we have satisfied the latter condition as well. (Is there a smaller choice for N that works?).

The minimum N that satisfies the strengthened finite Ramsey theorem is a computable function in n, k, m that grows extremely fast. Faster, it turns out, than any computable total function that is provably total in Peano arithmetic. This is the source of the Paris-Harrington theorem.

The first aim of this project is to survey this result using the book "An Introduction to Ramsey Theory" (Matthew Katz and Jan Reimann) as the primary reference. This will take the student on a survey of a number of beautiful topics in combinatorics and formal logic and more. The second aim is to create resources (slides, and the thesis itself) for a 8 week course on this topic.

Although the book does not assume any specialised background, mathematical maturity and an interest in formal logic will be very beneficial.

##### Description:

This project is about surveying the theory of boolean circuits (a non-uniform model of computation where a different algorithm may be employed for each input size) and its role in computational complexity

A boolean circuit is a mathematical abstraction of the logic circuits that are built from logic gates like AND, OR, NOT. There are many reasons for studying boolean circuits in the context of computational complexity.

Firstly, they provide a natural model for non-uniform computation because a different algorithm can be provided for each input size. In a little more detail, a circuit family is an infinite list (C0,C1,C2,..) of circuits such that Cn has n input variables and a single output variable. We say that the circuit family {accepts} the language L if w \in L iff Cn(w)=1 where |w|=n. Contrast this with the familiar (uniform) computational models where a single Turing machine (algorithm) is supplied that must solve the problem for inputs of all sizes. Secondly, they may provide a means of separating familiar complexity classes such as P and NP. Thirdly, boolean circuits are an elegant formalism.

Here are some basic observations about the theory. The most natural complexity class is P/poly which is defined as the set of languages that can be decided using circuit families such that the number of nodes in each Cn (its size) is bounded by a fixed polynomial in n. This class is large in the sense that it properly contains P and also undecidable languages. This means that if we can find a language in NP that is not in P/poly then P =/= NP. We are also interested in subclasses of P/poly that take into account not only the size of the circuits in the family but also their depth (the length of the longest path from input to output). It is clear that every boolean function can be represented by a boolean circuit. It can be shown that asymptotically in the number of variables, almost all boolean functions require a large boolean circuit. An interesting application of group theory can be found in Barrington's theorem which establishes a result about boolean circuits using the nonsolvability of the symmetric group S5.

The aim of this project is to write a survey on boolean circuits and its complexity classes presenting a curated version of the theory including the main results and proofs, and to make slides that could be used for an introductory course. The references below are provided as a starting point. The student is expected to discover the literature beyond this.

• Introduction to the Theory of Computation. Michael Sipser. 3rd Edition. Chapter 10.5

• Computational Complexity - A modern approach. Sanjeev Arora and Boaz Barak. Chapter 6.