Research Open Science Open Research Award

Winner 2021 - Repeatable, reproducible and open bioinformatics for biologists

Hannah Dugdale (FSE), Per Palsbøll (FSE), Sebastian Lequime (FSE), Jurjan van der Zee (FSE)

Open Research objectives

Make scientific research more transparent, repeatable, reproducible and freely accessible by teaching FAIR, open science practices, in particular the use of software:

  • To encourage repeatable and reproducible code
  • For version control
  • To archive code and data to make them citeable and freely available online.

Practices

Students were taught how to structure a bioinformatics project, organise code within their bioinformatics project, code in R and RStudio, use version control through Git/GitHub, archive their code/data through multiple platforms, make their code/data citeable (e.g. Zenodo), and make their code/data accessible on GitHub.
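
As an illustration of what structuring a project can look like in practice, the following is a minimal sketch in Python rather than R. The folder names (`data/raw`, `data/processed`, `scripts`, `results`, `docs`) and the function `scaffold_project` are hypothetical choices made for this example and are not taken from the course material.

```python
from pathlib import Path
import tempfile

# Hypothetical layout: these folder names are illustrative assumptions,
# not the structure prescribed by the course.
PROJECT_DIRS = ["data/raw", "data/processed", "scripts", "results", "docs"]


def scaffold_project(root: Path) -> list[Path]:
    """Create the project folders and a README under root; return the folders."""
    created = []
    for rel in PROJECT_DIRS:
        folder = root / rel
        folder.mkdir(parents=True, exist_ok=True)
        # Git does not track empty folders, so add a placeholder file.
        (folder / ".gitkeep").touch()
        created.append(folder)
    readme = root / "README.md"
    if not readme.exists():
        readme.write_text(
            "# Project title\n\n"
            "Describe the data sources, the scripts and how to rerun the analysis.\n"
        )
    return created


if __name__ == "__main__":
    # Demonstrate in a temporary directory so nothing is left behind.
    with tempfile.TemporaryDirectory() as tmp:
        for folder in scaffold_project(Path(tmp) / "my_project"):
            print(folder.relative_to(tmp))
```

Keeping raw data separate from processed data and scripts makes it easier to rerun an analysis from scratch. A folder created this way could then be placed under version control by running `git init` inside it.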

Introduction

The “Practical Bioinformatics for Biologists” WMBY008-05 course for MSc and PhD students was designed to teach students about open scientific research. Students were taught about the importance of a data management plan, the FAIR principles for scientific data management and stewardship, why science needs to be reproducible and open, how to incorporate open methods into research, and how to facilitate open science for peer review and paper publication.

Motivation

There is a reproducibility crisis in science: 90% of scientists surveyed by Nature agreed that there is one (Baker 2016), and only 36% of psychology studies were found to be replicable (Open Science Collaboration 2015). The problem is highlighted by high-profile academic fraud cases driven by the publish-or-perish phenomenon. Repeatable research benefits students when they add new data, return to a project after a break, respond to reviewer requests or need to detect errors. Reproducible research benefits collaborators, reviewers and editors, anyone taking over a project, and other scientists in the future. Open science makes datasets and code publicly available in digital format with no or minimal restrictions, enabling repeatability, reproducibility, error detection, peer review, collaboration, credibility and reusability. Teaching students about repeatable, reproducible and open research, and providing them with the tools to do open research, is vital for the progress of science.

Lessons learned

The course was taught online due to the pandemic, but sessions were recorded, giving students the chance to re-watch them, which they reported made the course material more accessible. Working through the practical examples while screen sharing helped the students learn and engage with the open science tools. An introduction to R and RStudio is challenging when students have differing levels of experience, but having the option of break-out groups with demonstrators on hand helped resolve questions from students with no previous experience of R.

URLs, references and further information

  • Baker M. 2016. Is there a reproducibility crisis? Nature 533(7604):452-454
  • Open Science Collaboration. 2015. Estimating the reproducibility of psychological science. Science 349(6251):aac4716
Last modified: 16 September 2022 08.30 a.m.