16S rRNA sequencing of samples from universal stool bank donors

Objectives: Universal stool banks provide stool to physicians for use in treating recurrent Clostridioides difficile infection via fecal microbiota transplantation. Stool donors providing the material are rigorously screened for diseases and disorders with a potential microbiome etiology, and they are likely healthier than the controls in most microbiome datasets. 16S rRNA sequencing was performed on samples from a selection of stool donors at a large stool bank, OpenBiome, to characterize their gut microbial community and to compare samples across different timepoints and sequencing runs.
Data description: 16S rRNA sequencing was performed on 200 samples derived from 170 unique stool donations from 86 unique donors. Samples were sequenced on 11 different sequencing runs. We are making these data available because data from rigorously screened, likely very healthy stool donors may be useful for characterizing and understanding microbial community differences across different populations and will help shed light on how the microbiome community promotes health and disease.


Objective
Universal stool banks provide rigorously screened stool to physicians who treat patients with recurrent Clostridioides difficile infection using fecal microbiota transplantation under US Food and Drug Administration enforcement discretion [1,2], as well as for research purposes. These stool banks provide centralized donor screening and material preparation, which increases the quality and accessibility of fecal microbiota transplantation as a therapy. Rigorous screening of donors is required to prevent transmission of pathogens or other microbiome-mediated diseases from the donor to the recipient.
The dataset described below is sourced from stool donors from a large, non-profit stool bank (OpenBiome, Cambridge, MA). The bank uses a rigorous screening process [1] that includes (i) an online pre-screen survey where candidates are excluded based on common criteria including body mass index, logistical constraints, and recent antimicrobial use; (ii) an in-person clinical assessment and interview where candidates are excluded for reasons such as medication use, infectious disease risk factors, and potentially microbiome-mediated indications such as psychiatric illness; and (iii) a battery of laboratory tests to confirm health status. As a result, an average of 3% of candidates are accepted as donors [3]. This dataset will complement and extend previously published sequencing from a subset of the bank's donors [4,5]. This dataset will be important for understanding how microbiome communities vary across different populations and contribute to health and disease. We are making it available to the scientific community for use on its own or as a healthy control comparison population in studies of disease.

Data description
This dataset consists of 200 samples that have been characterized using 16S rRNA sequencing. These samples come from 170 unique donations from 86 individual donors and were sequenced on 11 sequencing runs. Donations from 48 donors were sequenced more than once. Some of these samples have been included as replicates on the same or on different sequencing runs. Eleven donations from 9 donors were sequenced more than once on the same run, and 15 donations from 10 donors were sequenced more than once on different runs.
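The replicate structure described above (donations sequenced more than once on the same run versus on different runs) can be tallied from a sample-level metadata table. The following is a minimal sketch under stated assumptions: the row layout (sample, donation, donor, run), the identifiers, and the function name summarize_replicates are all hypothetical and are not taken from the published metadata file.

```python
from collections import defaultdict

# Hypothetical metadata rows: (sample_id, donation_id, donor_id, run_id).
# These names and values are illustrative only, not the dataset's real columns.
samples = [
    ("s1", "donA_d1", "donA", "run1"),
    ("s2", "donA_d1", "donA", "run1"),  # same donation, same run
    ("s3", "donA_d2", "donA", "run2"),
    ("s4", "donB_d1", "donB", "run1"),
    ("s5", "donB_d1", "donB", "run3"),  # same donation, different run
    ("s6", "donC_d1", "donC", "run2"),
]


def summarize_replicates(rows):
    """Count samples, donations, and donors, and flag replicated donations."""
    runs_per_donation = defaultdict(list)
    donor_of = {}
    for _sample, donation, donor, run in rows:
        runs_per_donation[donation].append(run)
        donor_of[donation] = donor

    same_run, different_run = set(), set()
    for donation, runs in runs_per_donation.items():
        if len(runs) != len(set(runs)):
            same_run.add(donation)       # some run holds two or more samples
        if len(set(runs)) > 1:
            different_run.add(donation)  # samples spread over several runs

    return {
        "samples": len(rows),
        "donations": len(runs_per_donation),
        "donors": len(set(donor_of.values())),
        "same_run_donations": sorted(same_run),
        "different_run_donations": sorted(different_run),
    }


summary = summarize_replicates(samples)
print(summary)
```

A donation can fall into both categories if it was sequenced twice on one run and again on another, which is why the two sets are built independently.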

BMC Research Notes
The samples were sequenced by the University of Michigan DNA Sequencing Core on an Illumina MiSeq. The resulting fastq files (Data set 1) were processed using QIIME 2 (version 2020.8) [8] to create an OTU (operational taxonomic unit) table (Data file 3). Briefly, forward and reverse reads were demultiplexed, joined (using vsearch join-pairs with default settings), quality filtered (using quality-filter q-score with default parameters), and denoised using Deblur (using deblur denoise-16S with a trim length of 253 bp and a minimum requirement of 1 read per sequence) [9]. Taxonomies were assigned to unique sequences using a naïve Bayesian classifier [10] trained on the 99% OTUs in the Greengenes database (version 13_8, using feature-classifier classify-sklearn) [11-13]. Beta diversity was computed using the Jensen-Shannon divergence (using diversity beta with 1 pseudocount). Data file 4 is a metadata file describing these samples. Three samples did not have any denoised reads and were discarded from downstream analysis.
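To illustrate the beta diversity metric named above, the following is a minimal sketch of the Jensen-Shannon divergence between two count vectors with a pseudocount of 1 added to every feature before normalization. It is not the QIIME 2 implementation. The function name, the toy counts, and the choice of a base-2 logarithm (which bounds the value between 0 and 1) are assumptions made for illustration; whether the reported values are the divergence or its square root is not stated in the text.

```python
import math


def jensen_shannon_divergence(counts_a, counts_b, pseudocount=1.0):
    """Jensen-Shannon divergence (log base 2) between two count vectors.

    The pseudocount is added to every feature in both samples before the
    counts are converted to relative abundances.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both samples must cover the same features")

    a = [c + pseudocount for c in counts_a]
    b = [c + pseudocount for c in counts_b]
    total_a, total_b = sum(a), sum(b)
    p = [x / total_a for x in a]
    q = [x / total_b for x in b]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # midpoint distribution

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability terms contribute 0.
        return sum(xi * math.log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Toy counts for two hypothetical samples over four features.
sample_1 = [120, 30, 0, 50]
sample_2 = [10, 90, 75, 25]
print(jensen_shannon_divergence(sample_1, sample_2))
```

Identical samples give a divergence of 0, and samples with no shared features approach 1 as sequencing depth grows relative to the pseudocount.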
To confirm that the community composition of each donor remains consistent between sequencing runs, we examined the beta diversity between samples from the same donor but different runs, from the same run but different donors, and from the same donor and run. Samples from the same donor but different runs were more similar to one another than samples from different donors sequenced on the same run (medians of 0.608 vs. 0.612, p = 0.03, Mann-Whitney U test; Data file 5). Furthermore, donors explained more of the observed beta diversity than sequencing runs (R² of 0.72 vs. 0.02, PERMANOVA by marginal effects; Data file 6), confirming that donor microbiota composition remains stable over time and across the biological and technical replicates in this dataset.
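The comparison above rests on sorting pairwise distances into three groups according to whether the two samples share a donor, a run, or both. The following is a minimal sketch of that grouping, followed by group medians and a Mann-Whitney U statistic. The sample labels, distances, and function names are hypothetical. Only the U statistic is computed here; a p-value would require a statistics library such as SciPy (scipy.stats.mannwhitneyu).

```python
from itertools import combinations
from statistics import median

# Hypothetical samples: sample_id -> (donor_id, run_id).
sample_info = {
    "s1": ("donA", "run1"),
    "s2": ("donA", "run2"),
    "s3": ("donB", "run1"),
    "s4": ("donB", "run1"),
}

# Hypothetical pairwise beta diversity values, keyed by unordered sample pair.
distance = {
    frozenset(("s1", "s2")): 0.20,
    frozenset(("s1", "s3")): 0.60,
    frozenset(("s1", "s4")): 0.70,
    frozenset(("s2", "s3")): 0.65,
    frozenset(("s2", "s4")): 0.70,
    frozenset(("s3", "s4")): 0.10,
}


def classify_pairs(info, dist):
    """Sort pairwise distances by shared donor and shared run."""
    groups = {
        "same_donor_diff_run": [],
        "diff_donor_same_run": [],
        "same_donor_same_run": [],
    }
    for a, b in combinations(sorted(info), 2):
        same_donor = info[a][0] == info[b][0]
        same_run = info[a][1] == info[b][1]
        d = dist[frozenset((a, b))]
        if same_donor and same_run:
            groups["same_donor_same_run"].append(d)
        elif same_donor:
            groups["same_donor_diff_run"].append(d)
        elif same_run:
            groups["diff_donor_same_run"].append(d)
        # Pairs sharing neither donor nor run are not used in this comparison.
    return groups


def mann_whitney_u(x, y):
    """Number of (xi, yj) pairs with xi < yj; ties count as 0.5."""
    return sum(
        1.0 if xi < yj else 0.5 if xi == yj else 0.0 for xi in x for yj in y
    )


groups = classify_pairs(sample_info, distance)
for name, values in groups.items():
    print(name, median(values))
u_stat = mann_whitney_u(
    groups["same_donor_diff_run"], groups["diff_donor_same_run"]
)
print("U =", u_stat)
```

With this convention, U equals the product of the two group sizes when every same-donor distance is smaller than every same-run distance, and half that product when the two groups are indistinguishable.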

Limitations
Although this is a unique and high-quality dataset, no comparator samples from other populations were sequenced along with these samples, so we cannot compare the bank's stool donor population with the healthy community in general or with any specific disease state. Furthermore, only a subset of samples was sequenced multiple times; a more robust dataset would have additional biological and technical replicates.