16S rRNA sequencing of samples from universal stool bank donors



Universal stool banks provide stool to physicians for use in treating recurrent Clostridioides difficile infection via fecal microbiota transplantation. Stool donors providing the material are rigorously screened for diseases and disorders with a potential microbiome etiology, and they are likely healthier than the controls in most microbiome datasets. 16S rRNA sequencing was performed on samples from a selection of stool donors at a large stool bank, OpenBiome, to characterize their gut microbial community and to compare samples across different timepoints and sequencing runs.

Data description

16S rRNA sequencing was performed on 200 samples derived from 170 unique stool donations from 86 unique donors. Samples were sequenced on 11 different sequencing runs. We are making these data available because rigorously screened, likely very healthy stool donors may be useful for characterizing and understanding microbial community differences across populations and may help shed light on how the microbiome community contributes to health and disease.


Universal stool banks provide rigorously screened stool to physicians who treat patients with recurrent Clostridioides difficile infection using fecal microbiota transplantation under US Food and Drug Administration enforcement discretion [1, 2], and they also provide stool for research purposes. These stool banks centralize donor screening and material preparation, which increases the quality and accessibility of fecal microbiota transplantation as a therapy. Rigorous screening of donors is required to prevent transmission of pathogens or microbiome-mediated diseases from the donor to the recipient.

Table 1 Overview of data files/datasets

The dataset described below is sourced from stool donors at a large, non-profit stool bank (OpenBiome, Cambridge, MA). The bank uses a rigorous screening process [1] that includes (i) an online pre-screen survey, where candidates are excluded based on common criteria including body mass index, logistical constraints, and recent antimicrobial use; (ii) an in-person clinical assessment and interview, where candidates are excluded for reasons such as medication use, infectious disease risk factors, and potentially microbiome-mediated indications such as psychiatric illness; and (iii) a battery of laboratory tests to confirm health status. On average, 3% of candidates are accepted as donors [3].

This dataset complements and extends previously published sequencing from a subset of the bank’s donors [4, 5]. It will be important for understanding how microbiome communities vary across populations and contribute to health and disease. We are making it available to the scientific community for use on its own or as a healthy control population in studies of disease.

Data description

As a result of the extensive screening, this population is likely healthier than other sequenced healthy populations, such as those of the Human Microbiome Project or the American Gut Project [6, 7]. The criteria used by these large projects capture different portions of the healthy population but do not screen out as many participants as universal stool banks do. A full comparison of these criteria is included in Data File 1. The 86 stool donors who provided these samples are 71% male and 29% female. Their average age is 27.7 years, and their average body mass index is 23.1 kg/m². A full table of available donor health data is in Data File 2.

This dataset consists of 200 samples that have been characterized using 16S rRNA sequencing. These samples come from 170 unique donations from 86 individual donors and were sequenced on 11 sequencing runs. Donations from 48 donors were sequenced more than once. Some samples were included as replicates on the same run or on different sequencing runs: 11 donations from 9 donors were sequenced more than once on the same run, and 15 donations from 10 donors were sequenced more than once on different runs.
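The replicate counts above can be recomputed from a per-sample metadata table. The following is a minimal sketch, assuming each sample is a record with the hypothetical fields `donation_id`, `donor_id`, and `run_id`; the actual column names in Data File 4 may differ.

```python
from collections import defaultdict


def summarize_replicates(records):
    """Count samples, donations, donors, runs, and replicated donations.

    `records` is a list of dicts with the hypothetical keys
    'donation_id', 'donor_id', and 'run_id' (one dict per sample).
    """
    runs_by_donation = defaultdict(list)
    donor_of_donation = {}
    for rec in records:
        runs_by_donation[rec["donation_id"]].append(rec["run_id"])
        donor_of_donation[rec["donation_id"]] = rec["donor_id"]

    same_run = set()     # donations sequenced more than once on one run
    across_runs = set()  # donations sequenced on more than one run
    for donation, runs in runs_by_donation.items():
        if len(runs) != len(set(runs)):
            same_run.add(donation)
        if len(set(runs)) > 1:
            across_runs.add(donation)

    return {
        "samples": len(records),
        "donations": len(runs_by_donation),
        "donors": len(set(donor_of_donation.values())),
        "runs": len({rec["run_id"] for rec in records}),
        "donations_replicated_same_run": len(same_run),
        "donations_replicated_across_runs": len(across_runs),
    }


# Toy metadata (not real data): 5 samples, 3 donations, 2 donors, 2 runs.
toy = [
    {"donation_id": "d1", "donor_id": "A", "run_id": "R1"},
    {"donation_id": "d1", "donor_id": "A", "run_id": "R1"},
    {"donation_id": "d2", "donor_id": "A", "run_id": "R1"},
    {"donation_id": "d2", "donor_id": "A", "run_id": "R2"},
    {"donation_id": "d3", "donor_id": "B", "run_id": "R2"},
]
print(summarize_replicates(toy))
```

A donation can fall into both replicate categories if it was sequenced twice on one run and again on another, so the two replicate counts are not mutually exclusive in this sketch.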

The samples were sequenced by the University of Michigan DNA Sequencing Core on an Illumina MiSeq. The resulting fastq files (Data set 1) were processed using QIIME 2 (version 2020.8) [8] to create an OTU (operational taxonomic unit) table (Data File 3). Briefly, forward and reverse reads were demultiplexed, joined (using vsearch join-pairs with default settings), quality filtered (using quality-filter q-score with default parameters), and denoised using Deblur (using deblur denoise-16S with a trim length of 253 bp and a minimum requirement of 1 read per sequence) [9]. Taxonomies were assigned to unique sequences using a naïve Bayesian classifier [10] trained on the 99% OTUs in the Greengenes database (version 13_8, using feature-classifier classify-sklearn) [11,12,13]. Beta diversity was computed using the Jensen-Shannon divergence (using diversity beta with 1 pseudocount). Data File 4 is a metadata file describing these samples. Three samples did not have any denoised reads and were discarded from downstream analysis.
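To make the beta diversity metric concrete, the following is a minimal sketch of the Jensen-Shannon divergence between two count vectors. It is not the QIIME 2 implementation: it assumes that the pseudocount is added to every count before normalization and that logarithms are base 2 (so the value lies between 0 and 1). Both choices are assumptions made for illustration.

```python
import math


def jensen_shannon_divergence(counts_a, counts_b, pseudocount=1.0):
    """Jensen-Shannon divergence (base 2) between two count vectors.

    Assumption: `pseudocount` is added to every entry before the counts
    are converted to relative abundances.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("count vectors must have the same length")
    a = [c + pseudocount for c in counts_a]
    b = [c + pseudocount for c in counts_b]
    total_a, total_b = sum(a), sum(b)
    if total_a <= 0 or total_b <= 0:
        raise ValueError("each vector needs a positive total")
    p = [x / total_a for x in a]
    q = [x / total_b for x in b]
    # Mixture distribution M = (P + Q) / 2.
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; terms with x_i = 0 contribute 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy count vectors (not real data) over the same three sequences.
print(jensen_shannon_divergence([120, 30, 0], [80, 0, 45]))
```

Note that some libraries report the Jensen-Shannon distance, which is the square root of this divergence, so values should be compared only between implementations that use the same convention.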

To confirm that the community composition of each donor remains consistent between sequencing runs, we examined the beta diversity between samples from the same donor but different runs, from the same run but different donors, and from the same donor and run. Samples from the same donor but different runs were more similar to one another than samples from different donors sequenced on the same run (medians of 0.608 vs. 0.612, p = 0.03, Mann–Whitney U test; Data File 5). Furthermore, donor explained more of the observed variation in beta diversity than sequencing run (R² 0.72 vs. 0.02, PERMANOVA by marginal effects; Data File 6), indicating that donor microbiota composition remains stable over time and across the biological and technical replicates in this dataset.
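The comparison described above can be sketched as follows. This is an illustration under stated assumptions, not the analysis code behind Data Files 5 and 6: the input layout (`samples` mapping a sample ID to a `(donor, run)` tuple, `distance` keyed by a frozenset of two sample IDs) is hypothetical, the Mann–Whitney function returns only the U statistic without a p-value, and the R² shown is the simple one-way PERMANOVA value for a single factor rather than the marginal-effects value reported in the text.

```python
from collections import defaultdict
from itertools import combinations
from statistics import median


def categorize_pairs(samples, distance):
    """Split pairwise distances into the three comparison groups."""
    groups = {
        "same_donor_diff_run": [],
        "diff_donor_same_run": [],
        "same_donor_same_run": [],
    }
    for a, b in combinations(sorted(samples), 2):
        (donor_a, run_a), (donor_b, run_b) = samples[a], samples[b]
        d = distance[frozenset((a, b))]
        if donor_a == donor_b and run_a != run_b:
            groups["same_donor_diff_run"].append(d)
        elif donor_a != donor_b and run_a == run_b:
            groups["diff_donor_same_run"].append(d)
        elif donor_a == donor_b and run_a == run_b:
            groups["same_donor_same_run"].append(d)
        # Pairs differing in both donor and run are not used here.
    return groups


def mann_whitney_u(x, y):
    """U statistic for x: pairs with x_i > y_j count 1, ties count 0.5."""
    return sum(
        1.0 if xi > yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y
    )


def one_way_r2(samples, distance, key):
    """One-way PERMANOVA R^2 = 1 - SS_within / SS_total.

    `key` selects the grouping factor: 0 for donor, 1 for run.
    """
    ids = sorted(samples)
    ss_total = sum(
        distance[frozenset(p)] ** 2 for p in combinations(ids, 2)
    ) / len(ids)
    members = defaultdict(list)
    for s in ids:
        members[samples[s][key]].append(s)
    ss_within = sum(
        sum(distance[frozenset(p)] ** 2 for p in combinations(g, 2)) / len(g)
        for g in members.values()
    )
    return 1.0 - ss_within / ss_total


# Toy data (not real data): four samples, two donors, two runs.
samples = {"s1": ("D1", "R1"), "s2": ("D1", "R2"),
           "s3": ("D2", "R1"), "s4": ("D1", "R1")}
distance = {frozenset(k): v for k, v in {
    ("s1", "s2"): 0.2, ("s1", "s3"): 0.7, ("s1", "s4"): 0.1,
    ("s2", "s3"): 0.8, ("s2", "s4"): 0.3, ("s3", "s4"): 0.6}.items()}

groups = categorize_pairs(samples, distance)
print({name: median(vals) for name, vals in groups.items() if vals})
print(mann_whitney_u(groups["same_donor_diff_run"],
                     groups["diff_donor_same_run"]))
print(one_way_r2(samples, distance, key=0), one_way_r2(samples, distance, key=1))
```

For a p-value, an established routine such as `scipy.stats.mannwhitneyu` would be the natural choice. Note also that pairwise distances sharing a sample are not independent, which this sketch does not address.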


Although this is a unique and high-quality dataset, no comparator samples from other populations were sequenced along with these samples, so we cannot compare the bank’s stool donor population with the healthy community in general or with any specific disease state. Furthermore, only a subset of samples was sequenced multiple times; a more robust dataset would have additional biological and technical replicates.

Availability of data and materials

The data described in this Data Note can be freely and openly accessed on the European Nucleotide Archive under accession [14] and on the Zenodo repository under [15]. See Table 1 for details and links to the data.



OTU: Operational taxonomic unit


  1. Chen J, Zaman A, Ramakrishna B, Olesen SW. Stool banking for fecal microbiota transplantation: methods and operations at a large stool bank. medRxiv. 2020.

  2. Quality & Safety. OpenBiome. Accessed 15 Sep 2020.

  3. Kassam Z, Dubois N, Ramakrishna B, Ling K, Qazi T, Smith M, et al. Donor screening for fecal microbiota transplantation. N Engl J Med. 2019;381:2070–2.

  4. Poyet M, Groussin M, Gibbons SM, Avila-Pacheco J, Jiang X, Kearney SM, et al. A library of human gut bacterial isolates paired with longitudinal multiomics data enables mechanistic microbiome research. Nat Med. 2019;25:1442–52.

  5. Santiago M, Eysenbach L, Allegretti J, Aroniadis O, Brandt LJ, Fischer M, et al. Microbiome predictors of dysbiosis and VRE decolonization in patients with recurrent C. difficile infections in a multi-center retrospective study. AIMS Microbiol. 2019;5:1–18.

  6. Huttenhower C, Gevers D, Knight R, Abubucker S, Badger JH, Chinwalla AT, Creasy HH, Earl AM, FitzGerald MG, Fulton RS, Giglio MG. Structure, function and diversity of the healthy human microbiome. Nature. 2012;486:207–14.

  7. McDonald D, Hyde E, Debelius JW, Morton JT, Gonzalez A, Ackermann G, et al. American gut: an open platform for citizen science microbiome research. mSystems. 2018.

  8. Bolyen E, Rideout JR, Dillon MR, Bokulich NA, Abnet CC, Al-Ghalith GA, et al. Reproducible, interactive, scalable and extensible microbiome data science using QIIME 2. Nat Biotechnol. 2019;37:852–7.

  9. Amir A, McDonald D, Navas-Molina JA, Kopylova E, Morton JT, Xu ZZ, et al. Deblur rapidly resolves single-nucleotide community sequence patterns. mSystems. 2017.

  10. Wang Q, Garrity GM, Tiedje JM, Cole JR. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261–7.

  11. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, Keller K, et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol. 2006;72:5069–72.

  12. Bokulich NA, Kaehler BD, Rideout JR, Dillon M, Bolyen E, Knight R, et al. Optimizing taxonomic classification of marker-gene amplicon sequences with QIIME 2’s q2-feature-classifier plugin. Microbiome. 2018;6:90.

  13. Bokulich N, Robeson M, Kaehler B, Dillon M. bokulich-lab/RESCRIPt: 2020.6.1. Zenodo; 2020.

  14. Olesen S. openbiome/donors-16s v1.0. Zenodo. 2020.

  15. European Nucleotide Archive. 2020.


We thank the OpenBiome team for collecting and sequencing these samples and Jonathan Watson for organizing the data.


This study was funded by OpenBiome.

Author information



MS and SWO conceived of the manuscript. SWO processed the data and created the initial OTU table. MS further processed the OTU table. MS drafted the manuscript. MS and SWO edited the manuscript. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Scott W. Olesen.

Ethics declarations

Ethics approval and consent to participate

Samples were collected from donors enrolled in the OpenBiome donor program. The donor program operates under the New England IRB (reference number 120160907). Written informed consent was obtained from participants. The study was submitted to and approved by OpenBiome’s Research Review Panel.

Consent for publication

Written informed consent was obtained from participants.

Competing interests

MS and SWO are employed as consultants by OpenBiome. MS has shares in Finch Therapeutics, Inc.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Santiago, M., Olesen, S.W. 16S rRNA sequencing of samples from universal stool bank donors. BMC Res Notes 14, 108 (2021).
