Reusing (finding) secondary data

Reusing existing data is a great way to save time and resources. It strengthens the transparency and reproducibility of scientific research, fosters collaboration across disciplines, and offers ethical benefits by reducing the burden on participants and environments. By choosing to build on existing datasets, you support open science and help accelerate scientific discovery.
Finding and reusing research data in 7 steps
1. Define your research question

Before you start your research, determine what question you want to answer. You may already be familiar with the topic, or you may first need to read around the literature or conduct a literature review. Once you have your research question, it will be much clearer what type of data you need, including its depth, detail and quality.
2. Determine your search terms

To find relevant data, you need to translate your research question into a set of search terms. First, select the key terms from your research question. Then, examine key papers in your research field and consult handbooks or dictionaries to add more terms. Include synonyms and spelling variants wherever possible, and consider both the singular and plural forms of words.
3. Think about where to search for data

1. Attached to publications
In many publications, authors provide a data availability statement that links to a data repository where the data are stored and managed, or to another location or supplementary materials where the data can be found. Sometimes the data are “available upon reasonable request”, and you need to contact the authors to get access. Increasingly, funders and journals require authors to make their data accessible by publishing it, where possible.
2. In data repositories
A data repository is a storage system where research data is collected, managed, and shared. It allows researchers to archive and publish datasets (openly or with restrictions) and make them available for reuse. Data repositories can be general or topic-specific. At the DCC, we manage and support publishing in DataverseNL.
Some general data repositories are Zenodo, Open Science Framework, Dryad (life sciences) and DANS. A catalogue of repositories can be found at Re3data. To check whether a more specific data repository exists for your topic, you can, for example, search the web for your topic plus “data repository”.
3. Using a (data-related) search engine
Similar to literature search engines, data-related search engines help you find datasets based on keywords. Examples include Web of Science (make sure to search in the “Data Citation Index”), B2FIND, the EU data catalogue, DataCite Commons, Google Dataset Search and the EU open data portal. Note that these websites often do not host the data themselves. Instead, they collect metadata from different data hosting platforms and repositories and make it searchable in one place.
4. Data journals
Data journals focus specifically on datasets: each “paper” describes a dataset. Some examples are Scientific Data (Nature), Data in Brief and Data Science Journal.
5. Licensed data resources
Some datasets are licensed, meaning they are governed by an organisation and can only be used if you, your institution or your workplace has access. Fortunately, the UG has purchased licences for many datasets, which are available free of charge to university students, staff or specific faculties.
How to find licensed datasets:
- If you know the name of the dataset or its provider:
  - Start your search on the University of Groningen Library’s e-resources page.
- If you’re exploring relevant datasets:
  - Consult a Library Guide in your discipline. Many guides include dedicated sections on (secondary) data, statistics, databases, and others.

These strategies will help you locate the licensed datasets you are looking for and provide information on procedures to gain access to the data.
4. Search for data

To search for data, enter your search terms into the search box. Ideally, combine the search terms into a search string using the Boolean operators AND, OR and NOT. To search for an exact phrase (e.g. “social media”), put the words between double quotation marks. Use wildcard symbols such as *, # or ? to broaden your search and capture variations in spelling.
Note that the possibilities for using Boolean operators and wildcard symbols differ per platform. Always consult the help pages for the data repository or search engine you are using.
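
As an illustration, the combination of search terms described above could be sketched in Python as follows. This is a minimal sketch, not a tool offered by any repository: the function name `build_search_string` and the example terms are assumptions made for this illustration, and the resulting syntax may need adjusting for the platform you use.

```python
def build_search_string(concepts, exclude=()):
    """Join synonyms with OR, join concept groups with AND, append NOT terms.

    `concepts` is a list of synonym lists, one list per key concept.
    This helper is hypothetical and only illustrates the general pattern.
    """
    def quote(term):
        # Multi-word phrases go between double quotation marks.
        return f'"{term}"' if " " in term else term

    groups = []
    for synonyms in concepts:
        joined = " OR ".join(quote(term) for term in synonyms)
        # Parentheses keep the OR group together when combined with AND.
        groups.append(f"({joined})" if len(synonyms) > 1 else joined)

    query = " AND ".join(groups)
    for term in exclude:
        query += f" NOT {quote(term)}"
    return query


# Example terms are made up for illustration.
query = build_search_string(
    concepts=[["social media", "social network"], ["adolescen*", "teenager*"]],
    exclude=["advertising"],
)
print(query)
# ("social media" OR "social network") AND (adolescen* OR teenager*) NOT advertising
```

Keeping each concept as its own list of synonyms makes it easy to add spelling variants later without rewriting the whole search string.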
5. Check if you have access and what the terms of use are

How you get access to a dataset depends on how open the data is and which licence or terms and conditions apply to it. When a dataset is openly available, anyone can use it at no cost, although you may have to give attribution and other usage conditions may apply, such as those of Creative Commons licences.
A dataset may also be restricted, meaning it has not been made openly available, for reasons such as privacy and personal data, third-party rights, or size and costs. In this case, you may have to register, provide proof of affiliation or agree to specific terms before you can download the data, even if it does not directly cost money.
As mentioned above, if a dataset is licensed, it is governed by an institution or organisation and will typically require payment, often through university subscriptions.
Make sure to check the terms of use of data and access conditions before proceeding to properly understand how you can use, share or publish the data!
6. Evaluate the quality of your dataset

Always evaluate the quality of your dataset by examining its relevance and reliability. Questions you could ask include: Will this dataset help me answer my research question? Who published this dataset, and are they an authority on the subject? Has the quality of the publication been assessed? How comprehensive is this dataset? Could the dataset contain biased information?
The so-called CRAAP test is a useful tool for evaluating information:
- Currency - What is the timeliness of the data?
- Relevance - How important is the data for your research needs?
- Authority - What do you know about the source of the data?
- Accuracy - How reliable, truthful and correct is the data?
- Purpose - What is the reason this data exists?
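
If you evaluate several datasets, it can help to record your answers to the CRAAP questions in a structured way. The following is a minimal sketch of such a checklist in Python. The class name `CraapReview`, its methods and the example notes are assumptions made for this illustration and are not part of the CRAAP test itself.

```python
from dataclasses import dataclass, field

# The five criteria and their guiding questions, as listed above.
CRAAP_QUESTIONS = {
    "Currency": "What is the timeliness of the data?",
    "Relevance": "How important is the data for your research needs?",
    "Authority": "What do you know about the source of the data?",
    "Accuracy": "How reliable, truthful and correct is the data?",
    "Purpose": "What is the reason this data exists?",
}


@dataclass
class CraapReview:
    """Hypothetical helper that stores one note per CRAAP criterion."""
    dataset: str
    notes: dict = field(default_factory=dict)

    def record(self, criterion, note):
        # Reject anything that is not one of the five criteria.
        if criterion not in CRAAP_QUESTIONS:
            raise KeyError(f"Unknown criterion: {criterion}")
        self.notes[criterion] = note

    def open_questions(self):
        # Criteria that have not been assessed yet, in CRAAP order.
        return [c for c in CRAAP_QUESTIONS if c not in self.notes]


review = CraapReview("Example dataset")
review.record("Currency", "Collected in 2023, no later updates.")
review.record("Authority", "Published by a national statistics office.")
print(review.open_questions())
# ['Relevance', 'Accuracy', 'Purpose']
```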
7. Properly cite your dataset

Just like other sources, datasets need to be properly cited. How to do so depends on the bibliography standard you are using or are required to use. Many repositories provide a formatted citation that you can adapt to your bibliography standard.
The most important aspects to include are:
- Authors or creators
- Publication date (year)
- Title of the dataset
- Publisher or repository (where is the dataset hosted?)
- Persistent identifier (e.g., DOI, or the URL)
Example:
Smith, J. 2025. Example dataset for citing. DataverseNL. DOI: 10.075934.
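
To show how these elements fit together, the following minimal Python sketch assembles a citation in the same order as the example above. The function name `format_dataset_citation` and its parameters are assumptions made for this illustration. The output follows the pattern of the example only, so adapt it to the bibliography standard you are required to use.

```python
def format_dataset_citation(creators, year, title, publisher, identifier,
                            id_label="DOI"):
    """Assemble a dataset citation from its five core elements.

    Hypothetical helper: it mirrors the order of the example citation
    (creators, year, title, publisher, identifier), not a formal style.
    """
    # Join multiple creators with "; " and avoid a double full stop
    # when the last name ends in an initial such as "J.".
    names = "; ".join(creators).rstrip(".")
    return f"{names}. {year}. {title}. {publisher}. {id_label}: {identifier}."


citation = format_dataset_citation(
    creators=["Smith, J."],
    year=2025,
    title="Example dataset for citing",
    publisher="DataverseNL",
    identifier="10.075934",  # identifier copied from the example above
)
print(citation)
# Smith, J. 2025. Example dataset for citing. DataverseNL. DOI: 10.075934.
```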
