Using metagenomic analyses to estimate the consequences of enrichment bias for pathogen detection
© Pettengill et al.; licensee BioMed Central Ltd. 2012
Received: 22 March 2012
Accepted: 10 July 2012
Published: 27 July 2012
Enriching environmental samples to increase the probability of detection has been standard practice throughout the history of microbiology. However, by its very nature, the process of enrichment creates a biased sample that may have unintended consequences for surveillance or resolving a pathogenic outbreak. With the advent of next-generation sequencing and metagenomic approaches, the possibility now exists to quantify enrichment bias at an unprecedented taxonomic breadth.
We investigated differences in taxonomic profiles of three enriched and unenriched tomato phyllosphere samples taken from three different tomato fields (n = 18). 16S rRNA gene metagenomes were created for each of the 18 samples using 454/Roche’s pyrosequencing platform, resulting in a total of 165,259 sequences. Significantly different taxonomic profiles and abundances at a number of taxonomic levels were observed between the two treatments. Although as many as 28 putative Salmonella sequences were detected in enriched samples, there was no significant difference in the abundance of Salmonella between enriched and unenriched treatments.
Our results illustrate that the process of enriching greatly alters the taxonomic profile of an environmental sample beyond that of the target organism. We also found evidence suggesting that enrichment may not increase the probability of detecting a target. In conclusion, our results further emphasize the need to develop metagenomics as a validated culture-independent method for pathogen detection.
Keywords: Enrichment bias, Metagenomics, Pathogen, Taxonomy
Enrichment procedures are often used to increase the probability of detecting a particular pathogen. However, owing to numerous factors, including competition, differences in relative growth rates, growth inhibitors, and the presence of bacteriophages, enrichment results in a biased sample (i.e., enrichment bias). Given the objective of enriching, this bias is expected. However, our limited understanding of the non-target effects that occur during enrichment makes it difficult to rely on specific enrichment methods as the best diagnostic approaches. Enrichment may have inherent and currently poorly understood consequences for the resident microflora of a particular food or environmental sample, which may prove detrimental to the resolution of public health disease outbreaks.
Historically, quantifying enrichment bias was accomplished by determining the differences in relative abundance of a target organism (e.g., Salmonella spp., Shigella spp., Listeria spp., or Escherichia spp.) among different enrichment treatments [1, 3, 4]. With the advent of next-generation sequencing and metagenomic methods, we can now describe the differences in incidence and potential abundance beyond the target organisms to that of nearly all bacterial lineages within a sample; our ability to do so will continue to improve with the increasing sequencing depth provided by next-generation sequencing methods and continually expanding reference databases. Metagenomic approaches also provide insight into the ecological and functional dynamics associated with environments that host human pathogens, which in turn may increase our ability to predict where a specific pathogen may arise. Although metagenomics has been used extensively to describe microbial communities, its utility for quantifying enrichment bias for public safety investigative purposes has yet to be fully explored.
In this study, we employed next-generation sequencing and a 16S rRNA metagenomic approach to evaluate the ways in which enrichment changes the taxonomic profile of a sample. We also investigated the effects of such a practice on the ability to detect the specific organism targeted by the enrichment procedure (i.e., the first step in the Bacteriological Analytical Manual (BAM) protocol for detection of Salmonella employed by the United States Food and Drug Administration (USFDA)). Differences in taxonomic profiles were characterized among 18 samples comprising 3 replicates each of enriched (using universal pre-enrichment broth, UPB) and non-enriched tomato phyllosphere samples from three different sites surrounding Immokalee, Florida, USA, an area to which outbreak-causing strains of Salmonella have been traced. We acknowledge that other culture-independent methods exist for pathogen detection (e.g., quantitative PCR [6, 7]); however, they are not well suited to quantifying enrichment bias and are not evaluated here.
Table 1 Sampling locations, the number of cpDNA sequences, the number of sequences, Chao’s index, and two estimates of the number of Salmonella sequences for each of the 18 replicates
Sampling locations (latitude, longitude):
26° 27′ 42″ N, 081° 26′ 16″ W
26° 22′ 05″ N, 081° 15′ 59″ W
26° 17′ 12″ N, 081° 20′ 21″ W
Focusing on the Enterobacteriaceae, which includes Salmonella, we found evidence for significant differences (p < 0.05) in the abundance of five genera between the cultured and uncultured treatments (Enterobacter, Klebsiella, Escherichia, Citrobacter, and Cronobacter).
Our results, which are among the first quantifying enrichment bias using a metagenomic approach, illustrate that the procedure of enriching a sample results in a drastically different taxonomic profile beyond that of the abundance of the target organism. This may not be of concern when there is certainty regarding the cause of an outbreak, but if the organism responsible is unknown (so-called ‘orphan’ microbes associated with diseases of previously unknown cause), then our results suggest that enriching could greatly hinder the ability to identify those involved. For example, our results illustrate that enrichment using UPB significantly decreased the number of Actinobacteria, a taxonomic group that contains a number of human pathogens (e.g., Tropheryma whipplei). If a member of that group were responsible for an outbreak, then our results suggest that the use of UPB may confound our ability to identify the causative agent.
Sampling and enrichment
Tomato samples were collected in May of 2011 from three different locations surrounding Immokalee, Florida, USA (Table 1). All samples were kept separate and brought back to the laboratory for processing. Below we briefly describe the enrichment protocol. For more detailed instructions see http://www.fda.gov/Food/ScienceResearch/LaboratoryMethods/BacteriologicalAnalyticalManualBAM/ucm070149.htm#Isol. Universal Pre-enrichment Broth (UPB) was added to samples of tomatoes at approximately 1.0 times the weight of the tomatoes, which was then incubated for 60 min at room temperature before being incubated at 35°C for 24 h.
DNA extraction and PCR amplification
DNA from uncultured samples was extracted from a wash of tomatoes and leaves. The resulting wash was sonicated for 5 min before centrifugation to generate a pellet from which DNA was extracted. DNA from cultured samples was extracted from approximately 1 ml of overnight culture that was also spun down to create a pellet. Total DNA was extracted using the Promega Wizard DNA Purification Kit according to the manufacturer’s specifications.
16S fragments (V1–V3) were amplified for Roche pyrosequencing (454) using Roche Fusion Primer A, key, MIDs (multiplex identifiers) 27 through 44, and 27F: 5′ CGT ATC GCC TCC CTC GCG CCA TCAG (10 base pair MID) AGA GTT TGA TCC TGG CTC AG 3′, together with Roche Fusion Primer B, key, no MID, and 533R: 5′ CTA TGC GCC TTG CCA GCC CGC TCAG TTA CCG CGG CTG CTG GCA C 3′. PCR amplicons under 300 bases were removed using AMPure XP from Agencourt at a ratio of 60 μl of AMPure beads to 100 μl of PCR product. We used the above primers because they 1) created amplicons of suitable length for sequencing on the 454 machine, 2) have been used in previous studies detecting Salmonella, and 3) have been validated in our lab, where they were used to successfully amplify pure cultures of Salmonella ssp. Newport.
It is important to acknowledge that our results are based on analysis of 16S ribosomal DNA sequences obtained via PCR-dependent methods, which can be considered an enrichment process that introduces its own potential biases. However, we have assumed that whatever bias may have been introduced through targeted sequencing was equal between the treatments and, thus, did not affect our conclusions regarding the effects on taxonomic profiles between the enriched and unenriched treatments. We also acknowledge that extraction procedures may represent another source of bias that can affect the taxonomic profile of a sample. Additional studies are necessary to determine whether extraction bias would affect the two treatments, enriched and unenriched, differently.
Emulsion PCR and sequencing
Amplicons were diluted to 10^7 molecules per μl and pooled to generate a mixture containing an equimolar representation of each independent replicate for subsequent emulsion PCR. Emulsion PCR was done using the Roche Lib-A MV kit according to the manufacturer’s specifications.
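The dilution to a fixed number of molecules per μl is a unit conversion from a measured mass concentration. The following is a minimal sketch of that arithmetic, not the procedure used in this study: the average mass of 660 g/mol per double-stranded base pair is a common approximation, and the 5 ng/μl concentration and 550 bp amplicon length are hypothetical values chosen only for illustration.

```python
AVOGADRO = 6.022e23          # molecules per mole
BP_MASS_G_PER_MOL = 660.0    # assumed average mass of one double-stranded base pair


def molecules_per_ul(conc_ng_per_ul, amplicon_length_bp):
    """Convert a dsDNA concentration (ng/ul) to molecules per ul."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    grams_per_mole = amplicon_length_bp * BP_MASS_G_PER_MOL
    return grams_per_ul / grams_per_mole * AVOGADRO


def dilution_factor(conc_ng_per_ul, amplicon_length_bp, target=1e7):
    """Fold dilution needed to reach the target number of molecules per ul."""
    return molecules_per_ul(conc_ng_per_ul, amplicon_length_bp) / target


# Hypothetical library: 5 ng/ul of a 550 bp amplicon (illustrative values only).
stock = molecules_per_ul(5.0, 550)
fold = dilution_factor(5.0, 550)
print(f"{stock:.3e} molecules/ul, dilute {fold:.0f}-fold")
```

Pooling equal volumes of replicates that have each been diluted to the same molecule count gives the equimolar mixture described above.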
Approximately 800,000 enriched beads were loaded into a one-quarter region of the Roche Titanium FLX pico-titer plate for sequencing on the Titanium FLX platform according to the manufacturer’s specifications. Sequencing read numbers were parsed in house with an adapted script to include MIDs beyond the 14 MIDs that Roche software automatically recognizes. Chimeric and chloroplast sequences were removed using the program CloVR. Specifically, 12,913 sequences were detected as being chimeric, and 5,501 sequences were identified as being chloroplast DNA by the RDP classifier. Interestingly, cultured samples had fewer sequences identified as chloroplast compared to uncultured replicates; however, this result was not significant (t = −2.0793, df = 8, p = 0.0712).
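Assigning reads to replicates by MID amounts to matching a short barcode at the start of each read. The sketch below illustrates the idea only; it is not the in-house script referred to above. It assumes each read still begins with the TCAG key followed by a 10-base MID, and the MID sequences and sample names in `MIDS` are hypothetical placeholders.

```python
from collections import defaultdict

KEY = "TCAG"  # library key that precedes the MID in the fusion primers

# Hypothetical 10-base MIDs and sample labels; the actual MIDs are not listed in the text.
MIDS = {
    "ACGAGTGCGT": "sample_A",
    "ACGCTCGACA": "sample_B",
}


def demultiplex(reads, mids=MIDS, key=KEY, mid_len=10):
    """Group reads by MID; reads without the key or a known MID go to 'unassigned'."""
    bins = defaultdict(list)
    for read in reads:
        if not read.startswith(key):
            bins["unassigned"].append(read)
            continue
        mid = read[len(key):len(key) + mid_len]
        sample = mids.get(mid)
        if sample is None:
            bins["unassigned"].append(read)
        else:
            # Keep only the sequence downstream of key + MID.
            bins[sample].append(read[len(key) + mid_len:])
    return dict(bins)


reads = [
    "TCAG" + "ACGAGTGCGT" + "AGAGTTTGATCCTGGCTCAGGGGG",
    "TCAG" + "ACGCTCGACA" + "AGAGTTTGATCCTGGCTCAGAAAA",
    "TCAG" + "TTTTTTTTTT" + "AGAGTTTGATCCTGGCTCAG",  # unknown MID
]
bins = demultiplex(reads)
```

Exact matching is the simplest choice; a production script would usually also tolerate sequencing errors within the MID.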
Sequences were uploaded into MG-RAST v3.1.2, where they are also publicly available (Table 1). All analyses within MG-RAST were conducted using the following parameter settings: the RDP annotation source, maximum e-value = 1.0 × 10^−5, minimum identity cutoff = 98%, minimum alignment length cutoff = 150 bp. We constructed rarefaction plots to estimate the limits of detection of our sequencing efforts (i.e., how well we were able to detect the taxonomic diversity within each sample).
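A rarefaction curve plots the number of taxa expected in random subsamples of increasing size; a curve that plateaus suggests that further sequencing would reveal few additional taxa. The sketch below uses the analytical expectation for sampling without replacement, E[S_n] = Σ_i (1 − C(N − N_i, n) / C(N, n)), which is one standard way to compute such a curve and not necessarily how MG-RAST computes it. The taxon counts are hypothetical.

```python
from math import comb


def expected_richness(counts, n):
    """Expected number of taxa in a random subsample of n sequences drawn without replacement."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("subsample size must be between 0 and the total count")
    denom = comb(total, n)
    # comb(total - c, n) is 0 when fewer than n sequences lie outside the taxon.
    return sum(1 - comb(total - c, n) / denom for c in counts if c > 0)


def rarefaction_curve(counts, step):
    """Return (depth, expected richness) pairs from 0 up to the full sample."""
    total = sum(counts)
    depths = list(range(0, total, step)) + [total]
    return [(n, expected_richness(counts, n)) for n in depths]


# Hypothetical sequence counts per taxon in one replicate.
counts = [50, 30, 15, 4, 1]
curve = rarefaction_curve(counts, step=25)
for depth, richness in curve:
    print(depth, round(richness, 2))
```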
As a first step to identify whether the cultured and uncultured replicates had different taxonomic profiles, we conducted a principal coordinates analysis (PCoA) on the normalized abundance counts of taxa within each replicate using the Bray-Curtis dissimilarity index. We also estimated Chao’s alpha diversity metric for each replicate using QIIME v1.4.0. Significance testing of the normalized abundances determined by MG-RAST and of Chao’s diversity index was conducted by grouping samples into two treatments (each with 9 replicates) and using Welch’s two-sample t-test as implemented in the stats package in R. Using MG-RAST, we also identified the groups at different taxonomic levels that were responsible for the observed differences based on the PCoA. This was accomplished by comparing normalized abundances of a given taxonomic group between the different treatments, with significance testing again done using a t-test.
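The three quantities named above have compact definitions, sketched here with the Python standard library for illustration; this is not the MG-RAST, QIIME, or R code used in the study. `chao1` uses the bias-corrected form S_obs + F1(F1 − 1) / (2(F2 + 1)), which is an assumption about the variant, and `welch_t` returns only the statistic and the Welch-Satterthwaite degrees of freedom, since the p-value additionally requires the t distribution (available, for example, through `t.test` in R). All input values are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two abundance vectors of equal length."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from integer taxon counts."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


def welch_t(a, b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical values for illustration only.
bc = bray_curtis([10, 0, 5], [5, 5, 5])
richness = chao1([10, 1, 1, 2, 5])
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
```

A PCoA would then be an eigendecomposition of the double-centered matrix of pairwise Bray-Curtis dissimilarities, which is omitted here because it needs a linear algebra library.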
Given our emphasis on the probability of detecting Salmonella, we determined the number of sequences within each replicate that were identified as such using two different platforms. The first was MG-RAST, within which the number of putative Salmonella sequences was determined based on the best-hit classification and lowest common ancestor approaches. The second platform we used was NBC (naïve Bayes classifier), which assigns sequences to species through a Bayesian framework with all bacterial genomes within NCBI serving as the reference database. Because of the limited taxonomic breadth of the database used by NBC, we then used BLASTN and the ‘nr’ database to further evaluate the taxonomic assignment of putative Salmonella sequences from the NBC analyses.
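The two MG-RAST approaches can give different counts for the same read: best-hit classification takes the taxon of the single top-scoring hit, whereas the lowest common ancestor (LCA) approach reports only the deepest rank on which all retained hits agree. The sketch below illustrates that difference under simplifying assumptions; the function names, the simplified five-rank lineages, and the example hits are hypothetical and do not reproduce MG-RAST's implementation.

```python
def best_hit(lineages):
    """Return the lineage of the top-scoring hit (assumed to be listed first)."""
    return list(lineages[0]) if lineages else []


def lowest_common_ancestor(lineages):
    """Return the deepest taxonomic prefix shared by all hit lineages."""
    if not lineages:
        return []
    shared = []
    for ranks in zip(*lineages):
        if all(r == ranks[0] for r in ranks):
            shared.append(ranks[0])
        else:
            break
    return shared


# Hypothetical hits for one read, ordered from best to worst score.
# Lineages are simplified to domain, phylum, class, family, genus.
hits = [
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae", "Salmonella"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae", "Escherichia"],
    ["Bacteria", "Proteobacteria", "Gammaproteobacteria", "Enterobacteriaceae", "Salmonella"],
]

by_best_hit = best_hit(hits)[-1]
by_lca = lowest_common_ancestor(hits)[-1]
print(by_best_hit, by_lca)
```

In this example the read counts as Salmonella under best-hit classification but is resolved only to Enterobacteriaceae under LCA, which is why the two approaches can yield different estimates of the number of Salmonella sequences.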
We are grateful to Cong Li for assistance with the sequencing of the samples. Funding for this research was provided by the U.S. Food and Drug Administration, the Army Research Office, and the Oak Ridge Institute for Science and Education fellowship awarded to JBP.
1. Muniesa M, Blanch AR, Lucena F, Jofre J: Bacteriophages may bias outcome of bacterial enrichment cultures. Appl Environ Microbiol. 2005, 71 (8): 4269-4275. 10.1128/AEM.71.8.4269-4275.2005.
2. Dunbar J, White S, Forney L: Genetic diversity through the looking glass: effect of enrichment bias. Appl Environ Microbiol. 1997, 63 (4): 1326-1331.
3. Singer RS, Mayer AE, Hanson TE, Isaacson RE: Do microbial interactions and cultivation media decrease the accuracy of Salmonella surveillance systems and outbreak investigations?. J Food Protect. 2009, 72 (4): 707-713.
4. Davies PR, Turkson PK, Funk JA, Nichols MA, Ladely SR, Fedorka-Cray PJ: Comparison of methods for isolating Salmonella bacteria from faeces of naturally infected pigs. J Appl Microbiol. 2000, 89 (1): 169-177. 10.1046/j.1365-2672.2000.01101.x.
5. Nakamura S, Maeda N, Miron IM, Yoh M, Izutsu K, Kataoka C, Honda T, Yasunaga T, Nakaya T, Kawai J, et al.: Metagenomic diagnosis of bacterial infections. Emerg Infect Dis. 2008, 14 (11): 1784-1786. 10.3201/eid1411.080589.
6. Malorny B, Paccassoni E, Fach P, Bunge C, Martin A, Helmuth R: Diagnostic real-time PCR for detection of Salmonella in food. Appl Environ Microbiol. 2004, 70 (12): 7046-7052. 10.1128/AEM.70.12.7046-7052.2004.
7. Hadjinicolaou AV, Demetriou VL, Emmanuel MA, Kakoyiannis CK, Kostrikis LG: Molecular beacon-based real-time PCR detection of primary isolates of Salmonella typhimurium and Salmonella enteritidis in environmental and clinical samples. BMC Microbiol. 2009, 9: 97. 10.1186/1471-2180-9-97.
8. Mortimer PP: Five postulates for resolving outbreaks of infectious disease. J Med Microbiol. 2003, 52 (Pt 6): 447-451.
9. Jacobson AP, Gill VS, Irvin KA, Wang H, Hammack TS: Evaluation of methods to prepare samples of leafy green vegetables for preenrichment with the bacteriological analytical manual Salmonella culture method. J Food Protect. 2012, 75 (2): 400-404. 10.4315/0362-028X.JFP-11-196.
10. Weisburg WG, Barns SM, Pelletier DA, Lane DJ: 16S ribosomal DNA amplification for phylogenetic study. J Bacteriol. 1991, 173 (2): 697-703.
11. Carrigg C, Rice O, Kavanagh S, Collins G, O’Flaherty V: DNA extraction method affects microbial community profiles from soils and sediment. Appl Microbiol Biotechnol. 2007, 77 (4): 955-964. 10.1007/s00253-007-1219-y.
12. Angiuoli SV, Matalka M, Gussman G, Galens K, Vangala M, Riley DR, Arze C, White JR, White O, Fricke WF: CloVR: a virtual machine for automated and portable sequence analysis from the desktop using cloud computing. BMC Bioinforma. 2011, 12: 356. 10.1186/1471-2105-12-356.
13. Glass EM, Wilkening J, Wilke A, Antonopoulos D, Meyer F: Using the metagenomics RAST server (MG-RAST) for analyzing shotgun metagenomes. Cold Spring Harb Protoc. 2010, 2010 (1): pdb.prot5368.
14. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, Costello EK, Fierer N, Pena AG, Goodrich JK, Gordon JI, et al.: QIIME allows analysis of high-throughput community sequencing data. Nat Method. 2010, 7 (5): 335-336. 10.1038/nmeth.f.303.
15. R: a language and environment for statistical computing. 2008, R Foundation for Statistical Computing, Vienna, Austria.
16. Rosen GL, Reichenberger ER, Rosenfeld AM: NBC: the naïve classification tool webserver for taxonomic classification of metagenomic reads. Bioinformatics. 2011, 27: 127-129. 10.1093/bioinformatics/btq619.
17. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215 (3): 403-410.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.