NWO's Open Science Fund 2021 - an interview with Dr. Martijn van Leusen (GIA/Faculty of Arts) about his project on FAIR open data for archaeological field surveys
Date: 22 November 2021
On 27 October 2021, NWO announced the results of the first round of the Open Science Fund, a funding instrument to stimulate and reward open science. In total, 26 projects were awarded funding, including one project at the UG and one at the UMCG.
We spoke to Dr. Martijn van Leusen (Groningen Institute of Archaeology, Faculty of Arts) about his NWO-funded project and how it will contribute to advancing open science in the field of landscape archaeology.
Project title: Using semantic modeling to create FAIR open data for archaeological field survey: a showcase and toolkit (SEMAFORA)
Main applicant: dr. P.M. van Leusen (GIA, Faculty of Arts)
Abstract: Field surveys have, since about 1970, been the main method by which archaeologists discover and record findspots and individual finds at the earth’s surface. Whilst for the Mediterranean area alone the documented finds already run in the millions, the lack of documentation standards effectively prevents researchers and heritage managers from conducting large-scale analyses. This project seeks to build and showcase a software toolkit that will allow them to share and query this fundamental and irreplaceable resource in a distributed, online form, taking advantage of existing work in so-called ‘semantic’ data modelling in the cultural heritage sector.
What is semantic data modeling (SDM)?
SDM means making the meaning of data explicit by describing its semantic structure. To reuse any type of dataset, a lot of implicit knowledge is required; to reuse other people’s data, this knowledge has to be made explicit by a semantic data description. If you want to make other people’s datasets queryable, you need software that understands the meaning and structure of the data and is able to pick up those bits that are necessary to answer the query.
For example, in my own research I map long stretches of landscape (primarily in Italy) to record all of the archaeology that is present on the surface. We typically record the 'size' of the archaeological sites discovered during surveys, but 'size' can be defined and measured in many different ways. So if we want to compare site sizes across different survey projects, the concept 'site size' must be fully defined for each of those projects.
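The 'site size' problem can be sketched in a few lines of Python. This is only an illustration, not the project's data model: the class names, the two measurement definitions, and the `comparable` helper are all hypothetical. The point it shows is that a size value becomes safely comparable only once the definition behind it is recorded alongside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SizeDefinition:
    """Explicit description of what a project means by 'site size' (hypothetical)."""
    measured_as: str  # how the size was determined in the field
    unit: str         # unit of the recorded value


@dataclass(frozen=True)
class SiteSize:
    value: float
    definition: SizeDefinition


# Two invented definitions that different survey projects might use.
SCATTER_AREA = SizeDefinition("area of the surface artefact scatter", "m2")
MAX_EXTENT = SizeDefinition("maximum extent of the surface scatter", "m")


def comparable(a: SiteSize, b: SiteSize) -> bool:
    """Two sizes can be compared directly only if they share one definition."""
    return a.definition == b.definition


site_1 = SiteSize(1200.0, SCATTER_AREA)  # project A
site_2 = SiteSize(800.0, SCATTER_AREA)   # project B, same definition
site_3 = SiteSize(45.0, MAX_EXTENT)      # project C, different definition

print(comparable(site_1, site_2))
print(comparable(site_1, site_3))
```

Without the attached definition, the three numbers would look interchangeable even though the third measures something different.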
What’s the main goal of this one-year project?
Our key goal is to demonstrate to researchers in our discipline (non-invasive archaeological landscape studies) that semantically modelling archaeological survey data is possible and desirable. We aim to build a software toolkit to enable ‘mapping’ (see glossary below) of archaeological survey datasets by the data owners themselves.
In our project we will use the CIDOC CRM standard (see glossary below) for semantic modelling, which we will extend with the concepts needed to describe survey data, and from which the data owner then selects the concepts describing their own dataset. We provide both general and context-sensitive guidance to help them do this. Once the data owner has done that, they can be sure that what they call pottery type x in their database is exactly the same as what somebody else calls pottery type x in their database. This makes the databases queryable without actually standardizing their contents.
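The idea of querying databases without standardizing their contents can be illustrated with a minimal Python sketch. Everything in it is assumed for the example: the two small datasets, their field names and codes, and the shared concept labels (`pottery_type`, `sherd_count`, `black_gloss`) are invented and are not actual CIDOC CRM classes or part of the SEMAFORA toolkit. Each data owner supplies a mapping from their own fields and values to the shared concepts, and a query is then expressed in terms of those concepts only.

```python
# Each dataset keeps its own field names and codes; the owner adds a mapping
# to shared concepts (all labels here are hypothetical).
dataset_a = {
    "records": [
        {"aardewerk": "BG", "n": 12},
        {"aardewerk": "TS", "n": 3},
    ],
    "mapping": {
        "fields": {"aardewerk": "pottery_type", "n": "sherd_count"},
        "values": {"pottery_type": {"BG": "black_gloss", "TS": "terra_sigillata"}},
    },
}

dataset_b = {
    "records": [
        {"ware": "black gloss", "count": 5},
        {"ware": "coarse ware", "count": 20},
    ],
    "mapping": {
        "fields": {"ware": "pottery_type", "count": "sherd_count"},
        "values": {"pottery_type": {"black gloss": "black_gloss",
                                    "coarse ware": "coarse_ware"}},
    },
}


def translate(record, mapping):
    """Express one record in shared concepts; the stored record is untouched."""
    translated = {}
    for field, value in record.items():
        concept = mapping["fields"].get(field)
        if concept is None:
            continue  # field not mapped by the data owner
        value_map = mapping["values"].get(concept, {})
        translated[concept] = value_map.get(value, value)
    return translated


def query(datasets, concept, value):
    """Return translated records from all datasets matching concept == value."""
    hits = []
    for dataset in datasets:
        for record in dataset["records"]:
            translated = translate(record, dataset["mapping"])
            if translated.get(concept) == value:
                hits.append(translated)
    return hits


hits = query([dataset_a, dataset_b], "pottery_type", "black_gloss")
total = sum(hit["sherd_count"] for hit in hits)
print(total)  # 12 + 5 = 17
```

Note that neither dataset is rewritten: the original field names and codes stay as the owner recorded them, and only the mapping is added.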
Our goal in this project will be to determine a minimum set of concepts. These cover categories such as pottery type, archaeological site type, and field visibility, but also research activity types such as collecting and documentation procedures, as well as the actors involved in these activities (i.e. team leader, team members, administrative personnel). We estimate we will require around 30 concepts to fully describe any archaeological survey dataset.
Once a dataset has been ‘mapped’ using our software, the dataset can be stored in a FAIR way in a repository. The advantage of our system is that it allows researchers to describe and archive their datasets according to a very high standard and make them futureproof, i.e. the datasets will remain understandable indefinitely.
If all archaeologists had already been describing their survey data in a standardized way, we wouldn't need this project. The basic problem is that there's no standard at the moment. That’s why we hope that more researchers will adopt this software. Once more datasets are described according to this system, they will form a super-dataset that researchers can use to ask new research questions. It becomes a new scientific resource. What is more, because there has been no such standard in the past, there are now hundreds if not thousands of 'legacy' datasets that have to be retroactively 'mapped' with our proposed system - a huge task but one that potentially brings great scientific and heritage management benefits. Although we would hope that new survey projects will adopt a higher documentation standard (preferably based on our semantic model), we are here mainly concerned with saving and valorising these legacy data.
Who's part of the project team?
The project team consists of two academic team members affiliated with the UG, Dr. Martijn van Leusen (Associate professor of Landscape Archaeology, main applicant) and Dr. Tymon de Haas (Assistant professor of Classical and Mediterranean Archaeology), and two commercial partners, Takin.solutions and Delving.eu.
The academic team members will supply the disciplinary expertise for the creation of the semantic model and the user manuals for the toolkit. They will also liaise with international colleagues to stimulate the wide adoption of the software toolkit in the discipline.
Takin.solutions is an SME (small to medium-sized enterprise) based in Bulgaria, led by Dr. George Bruseker and Dr. Denitsa Nenova, who are experts in semantic modelling, the CIDOC CRM standard and its application to archaeological datasets. Takin.solutions will help build the semantic model and assist with the CIDOC CRM harmonization process.
Delving.eu is a small business based in The Hague, led by Sjoerd Siebinga. Delving will build the software tool, and host the FAIR data point demonstrator for 5 years following the project delivery.
Glossary
Mapping = the data owner specifies their own data structure in terms of the semantic model (ontological model).
CIDOC CRM (Conceptual Reference Model) = an ontology for concepts and information in cultural heritage and museum data. It is the international standard for cultural heritage data modeling.