What is research data?

Research data is any information that has been collected, observed, generated or created to validate original research findings. It can take the form of images, video, audio, data cubes, text or tables. Although usually digital, research data also includes non-digital formats such as laboratory notebooks. Analysis code, scripts and software, simulation and experiment results, e-lab journals, transcripts, calibration data and raw data are all considered research data. Physical objects such as samples, collections, artifacts and devices are not considered research data and fall outside the scope of research data management; however, researchers should adhere to their research group’s storage policy for physical objects.

Research software: Research software includes all software code (scripts, applications, models, tools, algorithms, etc.) that is used to produce research results, as well as software that is itself a research output.

Raw data: Raw data is the first information collected or generated as part of the research. It has not gone through any type of transformation or correction. Calibration data can also be considered raw data, depending on how it is generated. Raw data is part of the research output.

For example, if a researcher analyzes samples in the laboratory using detectors or instruments to obtain measurements of any type, the first result the instrument produces at the end of the analysis is the raw data. It is very important that the researcher keeps this data together with the version that has been converted to a more familiar file format. Likewise, if the researcher works with simulations, the first results created, without any transformation, are the raw data.

Processed data: Processed data is data that has gone through one or more transformation, calibration and analysis steps. The method used to create the processed data is also data.

Metadata: Metadata is information about the data generated or collected during research. It describes, for example, the instrument, detector or software used to generate and/or analyze the data, the date on which the data was generated or collected, the person(s) who generated and/or collected it, any parameters involved in creating and/or collecting it, the objective of the experiment/observation/simulation/test, and the format and size of the data file.

Adding metadata to research data and output is important because metadata makes the data searchable and findable in a database or storage environment. Each piece of information provided in the metadata, as listed above, can serve as a keyword when someone needs to find and re-use the data.
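One possible way to keep metadata with a data file is a small "sidecar" file stored next to it. The sketch below is only an illustration: the `.meta.json` suffix, the function names and the field names in the usage note are assumptions, not a prescribed standard. It writes the metadata as JSON and then searches a folder for a keyword.

```python
import json
from pathlib import Path


def write_metadata_sidecar(data_file, metadata):
    """Write `metadata` as JSON next to `data_file` and return the sidecar path.

    The ".meta.json" suffix is a hypothetical convention chosen for this sketch.
    """
    data_path = Path(data_file)
    record = dict(metadata)
    # Format and size are derived from the file itself (size only if it exists).
    record["format"] = data_path.suffix.lstrip(".").upper()
    if data_path.exists():
        record["size_bytes"] = data_path.stat().st_size
    sidecar = data_path.with_name(data_path.name + ".meta.json")
    sidecar.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return sidecar


def find_by_keyword(folder, keyword):
    """Return the sidecar files in `folder` whose metadata values mention `keyword`."""
    hits = []
    for sidecar in sorted(Path(folder).glob("*.meta.json")):
        record = json.loads(sidecar.read_text(encoding="utf-8"))
        text = " ".join(str(value) for value in record.values()).lower()
        if keyword.lower() in text:
            hits.append(sidecar)
    return hits
```

For instance, `write_metadata_sidecar("samp_005.csv", {"instrument": "detector A", "collected_by": "J. Doe", "objective": "calibration run"})` would create `samp_005.csv.meta.json`, and `find_by_keyword(".", "calibration")` would then return that file. The instrument, person and objective values here are made up for illustration.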

Is my data 'sensitive data'?

If the data collected or generated during the research falls under one of the descriptions below, your research data is considered sensitive.

Research data containing personally identifying information, 'personal data' and special categories of data as defined in European data protection legislation.
Commercially sensitive data, including data generated or used under a restrictive commercial research funding agreement.
Data relating to species of plants or animals where the release of data may adversely affect rare or endangered species.
Data likely to harm an individual or community or have a significant negative public impact if released.

Any research involving experiments on or observations of human subjects must be submitted to the ethics committee for consultation. Dutch national standards regarding this matter can be found here. The Faculty of Science and Engineering (FSE) cooperates with the Faculty of Arts where ethics are concerned. For more information, contact a member of the ethics board or the DCC via dcc@rug.nl. It is the responsibility of the researcher to ensure that all precautionary measures are taken before publishing research output that results from sensitive data. For more information, please contact the DCC.

Data types, formats and file naming?

Depending on the methodology, tools and instruments used to collect and generate data, a researcher will end up with many data files in various formats and of various types.

It is important to know and decide on the format and type of the data beforehand in order to choose the right software and programming language to work with it. Such software, applications or programming languages may be used for plotting or for any type of analysis (statistical, image, etc.).

Factors to be considered are:

Data formats that can be recognised and processed by easily accessible software and programming languages, today and in the future.
Discipline-specific norms.
Whether the software is compatible with the existing hardware.
Whether there is funding for new software.
Which formats will be the easiest to annotate with metadata so that the researcher and others can interpret them in the future.

However, especially in experimental research, data generated by commercial software and instruments often comes in formats that are not compatible with mainstream open-source applications. In such cases, researchers need to convert the data to an easily readable format and store both versions.

Recommended formats for preservation are:

Textual data: XML, TXT, HTML, DOC, DOCX, ODT, PDF/A (Archival PDF), MD, CSS
Tabular data (including spreadsheets): CSV, ODS, XLS, XLSX
Databases: XML, CSV, SQL, MDB, SIARD
Images: TIFF, PNG, JPEG, FITS, SVG
Audio: WAV, MP3
Video: MPG, MPEG, AVI, MKV
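As a rough aid, the list above can be turned into a lookup that reports which recommended categories a file extension falls under. This is a minimal sketch: the dictionary layout and the function name `preservation_categories` are assumptions for illustration, and the check looks only at the extension, so it cannot tell an archival PDF/A from an ordinary PDF.

```python
from pathlib import Path

# Extensions copied from the recommended-format list above.
# "pdf" stands in for PDF/A; the extension alone cannot confirm archival PDF.
RECOMMENDED_FORMATS = {
    "textual": {"xml", "txt", "html", "doc", "docx", "odt", "pdf", "md", "css"},
    "tabular": {"csv", "ods", "xls", "xlsx"},
    "database": {"xml", "csv", "sql", "mdb", "siard"},
    "image": {"tiff", "png", "jpeg", "fits", "svg"},
    "audio": {"wav", "mp3"},
    "video": {"mpg", "mpeg", "avi", "mkv"},
}


def preservation_categories(filename):
    """Return the sorted list of recommended categories matching the file extension."""
    extension = Path(filename).suffix.lstrip(".").lower()
    return sorted(
        category
        for category, extensions in RECOMMENDED_FORMATS.items()
        if extension in extensions
    )
```

An empty result, for example for a vendor-specific instrument file, suggests that a converted copy in one of the listed formats should be stored alongside the original.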

Best practices for file naming

Don’t use: . - & % # * ! @ $ ^ ~ ‘ ` { } [ ] ? > <

Good examples: Proj_v01.csv, samp_005.csv, spec_number.csv, YYYYMMDD.csv
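The naming advice above could be checked automatically before files are stored. The following is a minimal sketch, assuming the characters to avoid apply to the name before the extension (the good examples all use a dot before `csv`); the function name `forbidden_characters` is hypothetical.

```python
from pathlib import Path

# Characters from the "Don't use" list above; "\u2018" is the curly quote in that list.
FORBIDDEN = set(".-&%#*!@$^~`{}[]?><") | {"\u2018"}


def forbidden_characters(filename):
    """Return the sorted characters to avoid that occur in the name before the extension."""
    # Assumption: the final ".ext" is allowed, so only the stem is checked.
    stem = Path(filename).stem
    return sorted(set(stem) & FORBIDDEN)
```

With this sketch, `forbidden_characters("Proj_v01.csv")` returns an empty list, while `forbidden_characters("my.data-v1.csv")` returns `['-', '.']`.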

Data size

It is essential that a researcher knows, or can estimate, both the size of the individual data files and the total size of the data during and at the end of the research project, in order to determine the best storage, collaboration and data transfer options and tools.

For example, a researcher might be dealing with only a few data files, each on the order of a few tens of GB, with a total size of less than 1 TB, which is not considered big data. However, if each data file is 15 to 50 GB, uploading them to a digital environment and sharing them might be a problem. Another case is when each researcher in a group handles more than 1 TB of data, which requires extra consideration of storage and data transfer options.
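To get a first estimate of both numbers, a short script can walk a project folder and report the total size and the largest single file. This is a sketch only: the function names are illustrative, and sizes are shown in decimal units (1 GB = 10^9 bytes), which is an assumption about how the figures above are meant.

```python
from pathlib import Path


def summarize_sizes(folder):
    """Return (total size in bytes, path of the largest file) for all files under `folder`."""
    sizes = {
        path: path.stat().st_size
        for path in Path(folder).rglob("*")
        if path.is_file()
    }
    total = sum(sizes.values())
    largest = max(sizes, key=sizes.get) if sizes else None
    return total, largest


def format_size(n_bytes):
    """Format a byte count using decimal units (assumed here: 1 GB = 10**9 bytes)."""
    value = float(n_bytes)
    for unit in ("B", "kB", "MB", "GB"):
        if value < 1000:
            return f"{value:.1f} {unit}"
        value /= 1000
    return f"{value:.1f} TB"
```

Calling `summarize_sizes("my_project")` on a hypothetical project folder and passing the total to `format_size` gives a figure that can be compared with the 1 TB and 15 to 50 GB thresholds mentioned above.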

Last modified: 17 March 2026 11.44 a.m.
