The effect of DNA extraction methodology on gut microbiota research applications
© The Author(s) 2016
Received: 12 May 2016
Accepted: 19 July 2016
Published: 26 July 2016
The effect that traditional and modern DNA extraction methods have on applications used to study the role of gut microbiota in health and disease is a topic of current interest. Genomic DNA was extracted from three faecal samples and one probiotic capsule using three popular methods: a chaotropic (CHAO) method, phenol/chloroform (PHEC) extraction and a proprietary kit (QIAG). Each method was evaluated for DNA yield and quality, for microbiota composition determined by quantitative PCR and by deep sequencing of the 16S rRNA gene, and for its performance in the sequencing analysis pipeline.
The CHAO method yielded the highest and the QIAG kit the lowest amount of double-stranded DNA, but the purity of isolated nucleic acids was better for the latter. The CHAO method yielded a higher concentration of bacterial taxa per mass (g) of faeces. Sequencing coverage was higher with the CHAO method, but a higher proportion of the initial sequencing reads was retained for assignment to operational taxonomic units (OTUs) with the QIAG kit than with the other methods. The QIAG kit appeared to give longer trimmed reads and shorter regions of poor quality than the other two methods. A distinct separation of α-diversity indices between DNA extraction methods was not observed. When compositional dissimilarities between samples were explored, a strong separation was observed according to sample type. The effect of the extraction method was either marginal (Bray–Curtis distance) or absent (unweighted UniFrac distance). Taxon membership and abundance in each sample were independent of the DNA extraction method used.
We have benchmarked several DNA extraction methods commonly used in gut microbiota research, and the differences between them depended on the intended downstream application. Caution should be exercised when the intention is to pool and analyse samples or data from studies which have used different DNA extraction methods.
Keywords: Metagenomics, DNA extraction, Benchmarking, Diversity, PCR, 16S rRNA gene
The introduction of molecular biology techniques and deep sequencing in microbiology has revolutionised our interest in, and advanced our understanding of, the role of gut microbiota in health and disease. Isolation and purification of bacterial genomic DNA from gut mucosal and luminal contents is a crucial initial step to ensure a high yield and quality of isolated nucleic acids, and unbiased representation of microbial communities. Over the past decade, several proprietary DNA extraction kits have been developed and have become commercially available with the intention of replacing the more laborious and time-consuming original approaches [1, 2]. There is good evidence that different DNA extraction kits generate different results in terms of the amount and quality of extracted DNA, the presence of PCR inhibitors, and the apparent bacterial community composition [1–5]. The effects that various DNA extraction methods may have on traditional downstream methods (e.g. quantitative PCR) compared with modern next-generation sequencing approaches have not been extensively explored [1, 2]. Here, we performed a benchmarking study and explored the effect of three popular faecal DNA extraction methods on the yield and quality of isolated DNA, as well as on microbiota composition determined by a typical sequencing analysis pipeline, using traditional molecular microbiology techniques and high-throughput next-generation sequencing.
Effect on yield and purity of isolated DNA
Effect on quantitative estimates of major bacterial taxa
Independently of faecal sample or bacterial taxon explored, the same mass of faecal sample gave a higher concentration (per g of original faeces) of 16S rRNA amplicon copies with the CHAO method in qPCR analysis (Fig. 1b). For the probiotic capsule, the PHEC method produced the highest concentration of amplicon copies for all taxa identified.
Effect on microbial community determined through 16S rRNA amplicon sequencing
Effect on 16S rRNA amplicon sequencing based community composition
Table 1 Summary of findings reported in this study
Chaotropic method (CHAO)
- Highest DNA yield
- Most fractional loss of reads after different steps of bioinformatics pipeline
- High coverage in sequencing
- Takes longest to perform
- Lowest loss of read trimming
Phenol Chloroform (PHEC)
- In the middle in terms of performance compared with the other methods
- Highest dsDNA in probiotic sample
- Uses toxic reagents
- Quicker than CHAO
QIAamp DNA stool minikit (QIAG)
- Highest read yield post OTUs assignment
- Lowest DNA yield
- Highest quality DNA
- Can be automated
From a bioinformatics analysis perspective, an important determinant of a suitable DNA extraction method is its yield of high-quality sequencing reads. A method which produces a high number of reads per sample, with a high percentage of those reads mapping to OTUs following bioinformatics analysis, is desirable. Such a method will be more useful for statistical analysis and more cost-effective, because fewer samples will need to be re-sequenced (a library size cut-off is often applied to filter out samples with low read counts). In this respect, the QIAG kit was superior to the other two approaches.
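The two quantities discussed above, the fraction of initial reads that end up assigned to OTUs and a library size cut-off, can be illustrated with a minimal Python sketch. The sample names, read counts and the cut-off value are hypothetical and are not taken from this study; the function names are likewise illustrative.

```python
def retained_fraction(initial_reads, otu_assigned_reads):
    """Fraction of the initial reads that were assigned to OTUs."""
    if initial_reads <= 0:
        raise ValueError("initial_reads must be positive")
    return otu_assigned_reads / initial_reads


def passes_library_cutoff(otu_assigned_reads, min_library_size):
    """True if the sample has enough assigned reads to be kept for analysis."""
    return otu_assigned_reads >= min_library_size


# Hypothetical read counts for three samples (illustration only).
samples = {
    "sample_A": {"initial": 100000, "assigned": 80000},
    "sample_B": {"initial": 120000, "assigned": 60000},
    "sample_C": {"initial": 5000, "assigned": 3000},
}
MIN_LIBRARY_SIZE = 5000  # hypothetical cut-off, chosen for the example

summary = {
    name: {
        "retained": retained_fraction(c["initial"], c["assigned"]),
        "kept": passes_library_cutoff(c["assigned"], MIN_LIBRARY_SIZE),
    }
    for name, c in samples.items()
}

for name, row in summary.items():
    print(f"{name}: retained={row['retained']:.2f} kept={row['kept']}")
```

A method with a high retained fraction loses fewer samples to the cut-off for a given sequencing depth, which is the sense in which read retention affects cost.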
The main limitation of this study is the small number of samples, which precluded formal statistical analysis for some outcome measures. However, even with this small number of samples, the majority of the results were consistent and in the same direction. Moreover, the results of this study should be interpreted with relevance to faecal specimens, and the performance of these methods in other matrices (e.g. soil or plants) needs to be explored.
As research in gut microbiology moves from small-scale to large multicentre international studies, practical, quick and cost-effective methods will be preferred. In this instance, the CHAO method takes up to 2 days to process 15 samples at an average cost of less than £1 per sample in consumables. The PHEC extraction can be completed in less than 6 h at the same average cost as the CHAO method. The QIAG kit takes only 3 h for the same number of samples, at an approximate cost of £4 per sample, and can be incorporated within automated DNA extractors.
In conclusion, we have shown that no single DNA extraction method is superior for all downstream approaches; the method of choice depends on the intended type of analysis, practicality and cost (Table 1). Nonetheless, we stress the importance of using the same DNA extraction method when comparing group differences within a study, and caution should be exercised when the intention is to analyse biobanked samples or pool data from studies which have used different DNA extraction methods.
Faecal samples were obtained from a healthy adult, a healthy child, and a child with Crohn’s disease who participated in ongoing research. A proprietary probiotic preparation containing 8 different strains of lactic-acid bacteria, VSL#3® (sigma-tau Ethifarma b.v. NL), was also analysed.
DNA extraction methods
Extraction of 200 mg stool was carried out in duplicate for each method (technical replicates) as described below. Three different DNA extraction protocols were used: (a) a modified commercial kit (QIAamp® DNA stool mini kit, Qiagen), (b) a phenol–chloroform extraction method and (c) a method using a combination of chemical, enzymatic and physical steps.
QIAamp® DNA stool mini kit
The kit was applied according to the manufacturer’s instructions with modifications. These included destruction of bacterial cells with bead beating (Tungsten Carbide Beads 3 mm Cat No. 699997, QIAGEN) for 3 min at 4.5 m/s with a FastPrep-24 (MP Biomedicals).
The phenol–chloroform extraction was based on the protocol by Reichardt et al. This involved mechanical lysis of bacterial cells with zirconium beads (0.1 mm, Biospec Products) in sterile PBS, the use of saturated acid phenol, separation of nucleic acids with chloroform/isoamyl alcohol (24:1), centrifugation, and precipitation of nucleic acids with isopropanol in the presence of 3 M sodium chloride.
The chaotropic (CHAO) method was a modified version of the protocol by Godon et al., as described previously. Briefly, faeces were suspended in a buffer containing a salt solution and incubated for 1 h at 70 °C. Sterile silica beads (0.1 mm, Biospec Products) were used for bacterial cell lysis in a FastPrep-24 bead beater (MP Biomedicals). Then 15 mg polyvinylpyrrolidone was added and the suspension was centrifuged at 15,000×g at 4 °C for 3 min. The supernatant was recovered, and the pellet was washed with 450 μL TENP buffer, centrifuged again and washed two more times. The pooled supernatants were centrifuged at 20,000×g at 4 °C for 10 min. Nucleic acids were precipitated with isopropanol. Following 10 min incubation at room temperature, the mixture was centrifuged at 15,000×g at 4 °C for 5 min and the supernatant was discarded. The pellet was resuspended in 225 μL of 0.1 M phosphate buffer (pH 8) and 25 μL of 5 M potassium acetate and left at 4 °C overnight. Then 5 μL RNAse (10 mg/mL) was added and the mixture was incubated at 37 °C for 45 min. DNA was precipitated using 50 μL of 3 M sodium acetate and 1 mL ice-cold 100 % ethanol. After incubation at −20 °C for 1 h, the DNA pellet was washed three times with 70 % ethanol, dried and stored at −20 °C in TE buffer.
Yield and purity of isolated nucleic acids
Bacterial genomic double-stranded DNA yield was measured with the Qubit® fluorometer 2.0 using the high sensitivity assay kit (ThermoFisher, Q32851), and the purity of nucleic acids was assessed using the 260:280 absorbance ratio (NanoDrop® ND-1000).
Quantitative real time PCR
The concentration of 16S rRNA gene copies of major dominant and subdominant bacterial taxon groups (Clostridium leptum, Clostridium coccoides, Bifidobacterium genus, Lactobacillus, Escherichia coli, Enterococcus) was measured in triplicate using quantitative real-time PCR analysis on a 7500 Real-Time PCR System (Applied Biosystems) using TaqMan Gene Expression and the same primers and probes as described previously. The 16S rRNA gene copy number for each sample was expressed per gram of dry faecal material, taking into account any dilution of the template DNA in the qPCR reaction. Non-template controls were included in each run.
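The back-calculation from a qPCR result to copies per gram of faecal material can be sketched as follows. This is an illustrative calculation under stated assumptions, not the exact procedure used in this study: the parameter names (template volume, dilution factor, elution volume, dry mass) and all numeric values are hypothetical, and the sketch assumes the mean of the technical triplicates is used.

```python
import math
from statistics import mean


def copies_per_gram(copies_per_reaction, template_volume_ul, dilution_factor,
                    elution_volume_ul, dry_mass_g):
    """Scale copies measured in one qPCR well up to copies per gram of sample.

    All arguments are assumptions made for this sketch:
    copies_per_reaction -- copies read off the standard curve for one well
    template_volume_ul  -- volume of (diluted) template added to the well
    dilution_factor     -- fold dilution of the extract before qPCR (1 = none)
    elution_volume_ul   -- total volume of the DNA extract
    dry_mass_g          -- dry mass of faeces that went into the extraction
    """
    if min(template_volume_ul, dilution_factor, elution_volume_ul, dry_mass_g) <= 0:
        raise ValueError("volumes, dilution factor and mass must be positive")
    copies_per_ul_diluted = copies_per_reaction / template_volume_ul
    copies_per_ul_extract = copies_per_ul_diluted * dilution_factor
    total_copies = copies_per_ul_extract * elution_volume_ul
    return total_copies / dry_mass_g


# Hypothetical triplicate wells for one taxon in one sample.
triplicate = [1.0e5, 1.1e5, 0.9e5]
result = copies_per_gram(mean(triplicate), template_volume_ul=2.0,
                         dilution_factor=10, elution_volume_ul=200.0,
                         dry_mass_g=0.2)
print(f"{result:.3e} copies per g")
```

Because every factor enters multiplicatively, an error in the recorded dilution factor or elution volume propagates directly into the reported concentration, which is one reason yields from different extraction protocols are hard to compare unless these values are tracked per method.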
16S rRNA gene sequencing
16S rRNA gene sequencing was performed on the MiSeq (Illumina) platform using 2 × 250 bp paired-end reads. The V4 region was amplified using fusion Golay adaptors barcoded on the reverse strand as described previously. The forward 16S rRNA primer sequence 515f (GTGCCAGCMGCCGCGGTAA) was used. The reverse primers, barcodes and adaptors were identical to those described previously. Amplicons were purified with AMPure XP DNA purification beads (Beckman Coulter, Danvers, MA, USA) according to the manufacturer’s instructions, and eluted in 25 μL of proprietary elution buffer (Qiagen, 19086, UK). Amplicon concentration was quantified using the KAPA SYBR® FAST qPCR Kit (Kapa Biosystems, KK4824, UK); amplicons were diluted to 40 pM and spiked with 40 pM of genomic DNA to avoid base-calling issues due to low base diversity. A negative extraction control was included for each method.
The paired-end reads were trimmed and filtered using Sickle v1.200 by applying a sliding window approach and trimming regions where the average base quality dropped below 20. After this, we applied a 10 bp minimum length threshold to discard all shorter reads. We then used PANDAseq v2.4 with a minimum overlap of 50 bp to assemble the forward and reverse reads into a single sequence spanning the entire V4 region. After obtaining the consensus sequences from each sample, we used the UPARSE (v7.0.1001) pipeline (https://bitbucket.org/umerijaz/amplimock/src) for OTU construction. In brief, we pooled reads from different samples together and added barcodes to keep track of the samples from which the reads originated. We then dereplicated the reads, sorted them by decreasing abundance and discarded singletons. In the next step, the reads were clustered based on 97 % similarity, discarding reads that were shorter than 32 bp. Even though the cluster_otu command in usearch removes reads that have chimeric models built from more abundant reads, a few chimeras may be missed, especially if they have parents that are absent from the reads or are present at very low abundance. Therefore, in the next step, we applied reference-based chimera filtering using a gold database (http://drive5.com/uchime/uchime_download.html) that is derived from the ChimeraSlayer reference database in the Broad Microbiome Utilities (http://microbiomeutil.sourceforge.net/). The original barcoded reads were matched against clean OTUs with 97 % similarity (a proxy for species-level separation) to generate a total of 335 OTUs across all samples. The representative OTUs were then taxonomically classified against the RDP database using the standalone RDP Classifier v2.6 with the default --minWords option of 5. For species-level assignment, we used NCBI Taxonomy and TAXAassign (https://github.com/umerijaz/TAXAassign).
To find the phylogenetic distances between OTUs, we first aligned the OTU sequences against each other using MAFFT v7.040 and then used FastTree v2.1.7 on this alignment to generate an approximately-maximum-likelihood phylogenetic tree.
Statistical analyses were performed in R using the tables and data generated as above as well as the metadata associated with the study. For community analysis (including α- and β-diversity analyses) we used the vegan package in R. To obtain unweighted UniFrac distances (which account for phylogenetic relatedness and are calculated using the branch lengths from the phylogenetic tree of the OTUs observed in the samples, without considering their abundances), we used the phyloseq package. Non-metric multidimensional scaling (NMDS) was applied using vegan’s metaMDS() function to visualise natural clustering in the dataset. Additionally, we used the ape, phangorn and BAT packages together to calculate recently proposed β-diversity estimators that also consider phylogeny. The BAT package proposes three ways of estimating phylogenetic diversity (PD), building on estimators originally developed for Taxon Diversity (TD): correcting PD values based on the completeness of TD; fitting asymptotic functions to accumulation curves of PD; and adapting nonparametric estimators to PD data. The only requirement is that the phylogenetic tree be ultrametric; we therefore used chronos() from R’s ape package to convert our OTU tree to an ultrametric tree (after rooting the tree with the midpoint() rooting function from R’s phangorn package). In Fig. 4c, d, the resulting β-diversity estimates from the BAT package are plotted using R’s corrplot to represent the quantitative estimates and then ordered using R’s hclust(). The general scripts as well as tutorials for the above analyses are available at http://userweb.eng.gla.ac.uk/umer.ijaz#bioinformatics.
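Two of the steps described above, sliding-window quality trimming with a minimum length filter, and dereplication with singleton removal, can be illustrated with a short Python sketch. This is a deliberately simplified illustration and does not reproduce the internal algorithms of Sickle or usearch: the fixed window size of 5, the 3'-end-only trimming rule and the function names are assumptions made for the example, while the quality threshold of 20 and the 10 bp minimum length follow the text.

```python
from collections import Counter


def trim_3prime(seq, quals, window=5, min_avg_q=20, min_len=10):
    """Cut a read at the first window whose mean quality is below min_avg_q.

    Returns the trimmed sequence, or None if it is shorter than min_len.
    The window size is an assumption for this sketch.
    """
    if len(seq) != len(quals):
        raise ValueError("seq and quals must have the same length")
    n = len(seq)
    if n == 0:
        return None
    window = min(window, n)  # short reads are judged as a single window
    cut = n
    for start in range(n - window + 1):
        mean_q = sum(quals[start:start + window]) / window
        if mean_q < min_avg_q:
            cut = start
            break
    trimmed = seq[:cut]
    return trimmed if len(trimmed) >= min_len else None


def dereplicate(seqs, min_size=2):
    """Collapse identical sequences, drop those seen fewer than min_size times
    (min_size=2 removes singletons), and sort by decreasing abundance."""
    counts = Counter(seqs)
    kept = [(s, n) for s, n in counts.items() if n >= min_size]
    kept.sort(key=lambda item: (-item[1], item[0]))
    return kept


# Toy read: 12 high-quality bases followed by 8 low-quality bases.
read = "ACGT" * 5
quals = [30] * 12 + [10] * 8
trimmed_read = trim_3prime(read, quals)

unique_reads = dereplicate(["AAA", "CCC", "AAA", "GGG", "CCC", "AAA"])
print(trimmed_read, unique_reads)
```

In this toy case the read is cut where the window mean first falls below 20, and the sequence seen only once is dropped before clustering, which mirrors why a method with longer regions of poor quality loses more reads at these steps.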
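For readers unfamiliar with the diversity measures named above, the following Python sketch computes the Shannon index (an α-diversity measure) and the Bray–Curtis dissimilarity (a β-diversity measure) from OTU count vectors. It is an illustration only: the analyses in this study were run in R with vegan, and the count vectors below are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H = -sum(p_i * ln p_i) over OTUs with non-zero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / (sum(a) + sum(b))."""
    if len(a) != len(b):
        raise ValueError("count vectors must cover the same OTUs")
    denom = sum(a) + sum(b)
    if denom == 0:
        raise ValueError("both samples are empty")
    return sum(abs(x - y) for x, y in zip(a, b)) / denom


# Hypothetical OTU counts for the same sample extracted with two methods.
method_1 = [10, 0, 5]
method_2 = [5, 5, 5]

h1, h2 = shannon(method_1), shannon(method_2)
bc = bray_curtis(method_1, method_2)
print(f"Shannon: {h1:.3f} vs {h2:.3f}; Bray-Curtis: {bc:.3f}")
```

Bray–Curtis uses abundances, whereas unweighted UniFrac uses only presence or absence along the tree, so an extraction method that shifts relative abundances without adding or removing taxa can affect the former and not the latter.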
KG and UZI wrote the manuscript; KG, MB, SC and NL processed the samples; UZI and KG analysed the data; NL, CQ and EC critically discussed the findings and gave feedback on the manuscript. KG co-ordinated the project and provided funding. All authors read and approved the final manuscript.
C. Quince is funded by an MRC fellowship as part of the CLIMB consortium Grant Ref: MR/L015080/1. U.Z. Ijaz is funded by a NERC fellowship NE/L011956/1. We would like to thank the anonymous reviewer 3 for his suggestion to summarise findings of the study within a Table.
The authors declare that they have no competing interests.
Availability of supporting data
The raw sequence files supporting the results of this article are available in the EBI repository (deposited upon acceptance). The data are available on the European Nucleotide Archive under the study accession number PRJEB14875 (http://www.ebi.ac.uk/ena/data/view/PRJEB14875).
Ethics approval and consent to participate
This study received full ethical approval from the Yorkhill Research Ethics Committee (Reference Number: 05/S0707/66), and each participant gave written informed consent according to NHS Good Clinical Practice for Research.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Kennedy NA, Walker AW, Berry SH, et al. The impact of different DNA extraction kits and laboratories upon the assessment of human gut microbiota composition by 16S rRNA gene sequencing. PLoS ONE. 2014;9:e88982.
- Salonen A, Nikkila J, Jalanka-Tuovinen J, et al. Comparative analysis of fecal DNA extraction methods with phylogenetic microarray: effective recovery of bacterial and archaeal DNA using mechanical cell lysis. J Microbiol Methods. 2010;81:127–34.
- de Boer R, Peters R, Gierveld S, et al. Improved detection of microbial DNA after bead-beating before DNA isolation. J Microbiol Methods. 2010;80:209–11.
- McOrist AL, Jackson M, Bird AR. A comparison of five methods for extraction of bacterial DNA from human faecal samples. J Microbiol Methods. 2002;50:131–9.
- Nechvatal JM, Ram JL, Basson MD, et al. Fecal collection, ambient preservation, and DNA extraction for PCR amplification of bacterial and human markers from human feces. J Microbiol Methods. 2008;72:124–32.
- D’Amore R, Ijaz UZ, Schirmer M, et al. A comprehensive benchmarking study of protocols and sequencing platforms for 16S rRNA community profiling. BMC Genom. 2016;17:55.
- Quince C, Ijaz UZ, Loman N, et al. Extensive modulation of the fecal metagenome in children with Crohn’s disease during exclusive enteral nutrition. Am J Gastroenterol. 2015;110:1718–29.
- Reichardt N, Barclay AR, Weaver LT, et al. Use of stable isotopes to measure the metabolic activity of the human intestinal microbiota. Appl Environ Microbiol. 2011;77:8009–14.
- Godon JJ, Zumstein E, Dabert P, et al. Molecular microbial diversity of an anaerobic digestor as determined by small-subunit rDNA sequence analysis. Appl Environ Microbiol. 1997;63:2802–13.
- Gerasimidis K, Bertz M, Hanske L, et al. Decline in presumptively protective gut bacterial species and metabolites are paradoxically associated with disease improvement in pediatric Crohn’s disease during enteral nutrition. Inflamm Bowel Dis. 2014;20:861–71.
- Caporaso JG, Lauber CL, Walters WA, et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc Natl Acad Sci USA. 2011;108(Suppl 1):4516–22.
- Joshi NA, Fass JN. Sickle: A sliding-window, adaptive, quality-based trimming tool for fastq files. Version 1.21; 2011.
- Masella AP, Bartram AK, Truszkowski JM, et al. PANDAseq: paired-end assembler for illumina sequences. BMC Bioinformatics. 2012;13:31.
- Wang Q, Garrity GM, Tiedje JM, et al. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol. 2007;73:5261–7.
- Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
- Price MN, Dehal PS, Arkin AP. FastTree 2–approximately maximum-likelihood trees for large alignments. PLoS ONE. 2010;5:e9490.
- Oksanen J, Blanchet FG, Kindt R, Legendre P, Minchin PR, O’Hara RB, Simpson GL, Solymos P, Stevens MHH, Wagner H. vegan: Community Ecology Package. R package version 2.2-1; 2015.
- McMurdie PJ, Holmes S. phyloseq: an R package for reproducible interactive analysis and graphics of microbiome census data. PLoS ONE. 2013;8:e61217.
- Paradis E, Claude J, Strimmer K. APE: analyses of Phylogenetics and Evolution in R language. Bioinformatics (Oxford, England). 2004;20:289–90.
- Schliep KP. phangorn: phylogenetic analysis in R. Bioinformatics (Oxford, England). 2011;27:592–3.
- Cardoso P, Rigal F, Borges PAV, et al. A new frontier in biodiversity inventory: a proposal for estimators of phylogenetic and functional diversity. Methods Ecol Evol. 2014;5:452–61.