Research Software Metadata

When publishing research software, our main concern is to make it findable by as many people as possible. This can be achieved by providing correct metadata.

Software metadata should at least describe where to find a specific version of the software, how to cite it, who the authors are, what its inputs and outputs are, and what its dependencies are.
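The minimum items listed above can be treated as a simple checklist. The following Python sketch is only an illustration: the class name SoftwareMetadata, its field names, the helper missing_fields, and the example URL are all hypothetical and not part of any registry's schema.

```python
from dataclasses import dataclass, field, fields
from typing import List


@dataclass
class SoftwareMetadata:
    """Hypothetical record covering the minimum items listed above."""
    version_url: str = ""  # where to find this specific version
    citation: str = ""     # how to cite the software
    authors: List[str] = field(default_factory=list)
    inputs: List[str] = field(default_factory=list)
    outputs: List[str] = field(default_factory=list)
    dependencies: List[str] = field(default_factory=list)


def missing_fields(meta: SoftwareMetadata) -> List[str]:
    """Return the names of all fields that are still empty.

    Note: an empty list is reported as missing, so software that truly
    has no dependencies would need a different convention.
    """
    return [f.name for f in fields(meta) if not getattr(meta, f.name)]


# Placeholder values, for illustration only.
meta = SoftwareMetadata(
    version_url="https://example.org/my-tool/releases/1.0.0",
    authors=["A. Researcher"],
    dependencies=["numpy"],
)
print(missing_fields(meta))  # ['citation', 'inputs', 'outputs']
```

A check like this could be run before each release to spot metadata gaps early.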

Research software must be unambiguously identifiable when people look for it using common search strategies. Such strategies include keyword searches in general-purpose search engines like Google, but also in specialized registries (e.g. the eScience Research Software Directory) and repositories (e.g. GitHub, Zenodo).

Findability can be improved by registering the software in a relevant registry, along with metadata that provides contextual information about the software. Registries typically render metadata in a web-findable way and can provide a DOI. Some registries and repositories allow annotating software using domain-agnostic or domain-specific controlled vocabularies, further increasing findability via search engines.

For example, Zenodo tries to extract relevant keywords from the submitted repository content and also lets the user add keywords. If there is a citation file (CITATION.cff) in the root folder of the GitHub repository, Zenodo tries to parse it to create metadata. This is why it is very important to have a complete README and a citation file: both provide relevant keywords that increase the findability of the research software. Alternatively, a separate .zenodo.json file can be created to explicitly tell Zenodo to use its content as metadata.
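As a rough illustration of the .zenodo.json route, the Python sketch below assembles a small metadata dictionary and serializes it as JSON. The keys used here (title, description, upload_type, creators, keywords, version) follow Zenodo's deposit metadata as commonly documented, but they should be checked against the current Zenodo documentation. The helper names build_zenodo_metadata and write_zenodo_json and all example values are hypothetical.

```python
import json
from pathlib import Path


def build_zenodo_metadata(title, description, creators, keywords, version):
    """Assemble a dictionary in the assumed shape of a .zenodo.json file."""
    return {
        "title": title,
        "description": description,
        "upload_type": "software",
        "creators": [{"name": name} for name in creators],
        # Remove duplicate keywords and sort them for a stable file.
        "keywords": sorted(set(keywords)),
        "version": version,
    }


def write_zenodo_json(metadata, repo_root="."):
    """Write the metadata to .zenodo.json in the given repository root."""
    path = Path(repo_root) / ".zenodo.json"
    path.write_text(json.dumps(metadata, indent=2) + "\n", encoding="utf-8")
    return path


# Placeholder values, for illustration only.
metadata = build_zenodo_metadata(
    title="my-tool",
    description="Example research software.",
    creators=["Researcher, A."],
    keywords=["metadata", "findability", "metadata"],
    version="1.0.0",
)
print(json.dumps(metadata, indent=2))
```

Calling write_zenodo_json(metadata, "path/to/repository") would place the file in the repository root, next to the README and CITATION.cff.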

Last modified: 13 December 2022 11.48 a.m.