Research Open Science Open Research Award

Open Tools and Educational Resources for Microbiome Data Science

Sudarshan Shetty, Department of Medical Microbiology and Infection Prevention, UMCG; Leo Lahti, Turku Data Science Group, University of Turku, Turku, Finland

Open Research objectives

Making the outputs of research, including publications, data, software and other research materials freely accessible.


High-throughput sequencing technologies have transformed microbiome research into a data-driven field. To empower and train biologists, we, a microbiologist (myself) and a biostatistician (Leo), decided to join forces to develop open tools and educational resources for analyzing multi-faceted microbiome data [1-4]. The continuous cycle of development and end-user feedback that followed has led to the creation of a universe of R-based tools that serve the microbiome community.

The microbiome R package is freely available as an open-source, citable tool as part of the Bioconductor ecosystem [1]. It has been widely used by the microbiome community (>190 citations, Google Scholar). The tools have facilitated the investigation of diverse microbial ecosystems, ranging from the human microbiome to the plant rhizosphere microbiome. Researchers have used the microbiome R package to investigate the impact of the gut microbiota in very early life [5], to unravel the gut microbiome composition of colorectal cancer patients [6], and to understand the importance of Mexican wild cotton genotypes for the rhizosphere microbiome [7].

Several open-source tools are available for microbiome research, but they are sometimes difficult to find on the web. For the convenience of researchers, we maintain a curated list of R-based tools and tutorials [3]. We use social media (e.g., Twitter) to reach out to and actively engage with the research community. The analysis code from my own research on bacterial comparative proteomics is available online as an open and reproducible tutorial on this weblink. In this tutorial, I give details of additional analyses that were not included in the main research article but that I thought would be helpful for the wider research community.


Our tools are based on the widely used R statistical language, and the code for the tools and resources is hosted on GitHub. Hosting the code openly on GitHub allows us to collaborate with and engage the community in active development, and allows users to provide specific feedback that can be discussed and implemented in our tools. The supporting utilities tool and educational resources are also openly available, which has been instrumental in reaching a diverse set of researchers across the globe [2-4]. As part of our aim to support skill development, we actively organize hands-on training through international workshops and make the source code freely and openly available as online books [4]. Workshops allow us to interact with end users and understand their needs, which helps us make our tools user-friendly.

Lessons learned

Initially, many inexperienced R users reported difficulties installing R tools. To overcome this, we created an automated script that installed not only our tools but also a wide array of other open-source R packages for microbiome analysis. Once our tool was made available in the Bioconductor repository, the installation process was further simplified. We also made an effort to write comprehensive tutorials on the functionality provided by our tools, with a focus on inexperienced users.

By using open practices and providing resources for the microbiome research community, we were able to build our scientific network. Researchers have developed new open-source tools that use the microbiome R package internally for common data analytics [8]. Our tools and resources are accessible to anyone in the world, as evidenced by the citations of our tools.

To encourage open science, open-source tools in isolation are not sufficient. There needs to be a concerted effort focused on improving hands-on experience for researchers. We put as much effort, if not more, into creating open tutorials, resources and workshops to facilitate the adoption of open practices. By continuously engaging with the research community, we have built an open-source ecosystem of tools and resources for microbiome data analytics that has helped the microbiome research community.


  1. Lahti, L. and S.A. Shetty, Tools for microbiome analysis in R. 2018.
  2. Shetty, S. and L. Lahti, microbiomeutilities: An R package for utilities to guide in-depth marker gene amplicon data analysis (Version 0.99.00). 2018.
  3. Shetty, S.A. and L. Lahti. Tools Microbiome Analysis: A list of R environment based tools. 2018-Present; Available from:
  4. Shetty, S.A., et al. OPEN & REPRODUCIBLE MICROBIOME DATA ANALYSIS SPRING SCHOOL. 2018-Present; Available from:
  5. Shao, Y., et al., Stunted microbiota and opportunistic pathogen colonization in caesarean-section birth. Nature, 2019. 574(7776): p. 117-121.
  6. Sarhadi, V., et al., Gut microbiota and host gene mutations in colorectal cancer patients and controls of Iranian and Finnish origin. Anticancer Research, 2020. 40(3): p. 1325-1334.
  7. Hernández-Terán, A., et al., Host genotype explains rhizospheric microbial community composition: the case of wild cotton metapopulations (Gossypium hirsutum L.) in Mexico. FEMS Microbiology Ecology, 2020. 96(8): p. fiaa109.
  8. Lin, H. and S.D. Peddada, Analysis of compositions of microbiomes with bias correction. Nature Communications, 2020. 11(1): p. 1-11.


Last modified: 16 March 2022 11.23 a.m.