Observational study on variability between biobanks in the estimation of DNA concentration

Background There is little confidence in the consistency of DNA concentration estimates when samples move between laboratories, and evidence on this consistency is largely anecdotal. There is therefore a need first to measure consistency among laboratories and then to identify and implement remedies. A pilot experiment was conceived to test the logistics and to provide initial data on consistency. Methods DNA aliquots at nominal concentrations between 10 and 300 ng/μl were dispensed into the wells of 96-well plates by one participant, the coordinating centre. Participants estimated the concentration in each well and returned their estimates to the coordinating centre. Results Considerable overall variability was observed among estimates. There were statistically significant differences between participants' measurements and between fluorescence emission and absorption spectroscopy. Conclusion Anecdotal evidence of variability in DNA concentration estimation has been substantiated. Reducing variability between participants will require identifying the major sources of variation, specifying effective remedies and implementing them.


Introduction
Few genotyping labs will accept DNA at a stated concentration without estimating the concentration again, because there is little confidence in the consistency of estimates between labs. The receiving lab may then use more DNA for concentration estimation than the assay itself requires, and the process is likely to be repeated each time the same sample is assayed. When many thousands of samples are to be genotyped, they must be aggregated from multiple biobanks, and months can be spent standardising concentration and quality.
In the report on the Wellcome Trust Case Control Consortium [1], attention was given to DNA sample quality as a cause of data loss. Table Four of the report shows that one in 21 genotyped samples (809 of 17,000 DNA samples) was excluded from analysis owing to problems of DNA quality or sample labelling (i.e. data quality). The single most substantial cause of exclusion appeared to be failure of a DNA sample to attain a single nucleotide polymorphism (SNP) call rate of >97%. This failure may arise from impurities in the sample, from its lack of homogeneity, or from inconsistency and/or inaccuracy in DNA concentration estimation. Thus, even in well-curated series, time, effort and money may be wasted and an essentially nonrenewable resource is depleted.
Medical genomics research increasingly requires consistent, high-quality production and management of both the samples and the associated data that are the subject of experimental analysis. This is necessary because resources must be shared. Resource sharing involves aggregating samples and data from multiple biobanks and, along with improvements in phenotyping [2], is widely recognised as essential for the next generation of genetic epidemiology investigations [3].
The Public Population Project in Genomics (P3G) [4] is an organisation of researchers dedicated to fostering collaboration and, thus, resource sharing in the field of population genomics. Sharing resources includes aggregating samples from numerous sources. P3G reasoned that a focus on DNA was a good starting point for addressing concerns about sample quality in general [5]. The first issue is to provide data to substantiate anecdotal evidence that DNA concentration estimates from different laboratories and biobanks are inconsistent. P3G approved [6] a study proposal on this issue from the UK DNA Banking Network [7]. The proposal planned an observational study to be undertaken by biobanks that are members of P3G or of the Biobanking and BioMolecular Resources Infrastructure Preparatory Phase (BBMRI) [8], the pan-European biobanking infrastructure [9].
The aim of the pilot study described here is to test the logistics for a larger scale study and to provide some initial data on consistency between biobanks. This study seeks to discover to what extent different laboratories and biobanks obtain different estimates of the concentration of the same DNA solution. No constraint is placed on the technology or instrument used. The results demonstrate substantial variability between participants, instruments and technologies.
An observational study of DNA concentration estimation methods and results among forensic laboratories has been described [10]. Its aims were broader than those of the study described here: it also examined the effects of DNA concentration on downstream processes and on DNA stability. As far as DNA concentration estimation was concerned, its authors focussed solely on whether a method was quantitative.

DNA preparation and aliquotting
A DNA solution in TE (10 mM Tris, 1 mM EDTA, pH 7.5; Invitrogen #T11493) was prepared from three human cell lines at a nominal concentration of 400 ng/μl by the European Collection of Cell Cultures (Salisbury, UK), consistent with appropriate ethical use. The solution was stored at 4°C at the coordinating centre (CIGMR, Manchester). Agarose gel electrophoresis followed by ethidium bromide staining detected no degradation. The solution was mixed thoroughly in its tube using a Labinco L46 Vortexer for 2 min at speed setting 10. Volumes were removed manually from the tube into 50 ml tubes (Greiner #227261) and diluted with TE to give nominal concentrations of 10, 20, 40, 50, 75, 100, 150 and 300 ng/μl.
For each DNA dilution, a volume of either 20 μl or 40 μl was dispensed into the wells of four columns of 96-well polypropylene plates (ABgene #AB-1058) using a Tecan Freedom 200 liquid handler (Tecan, Switzerland). The plates were shipped on dry ice to each participant (identified here by a number). Participants were asked to measure the DNA concentration in all wells, to use their standard operating procedures for DNA concentration estimation, and to return data within 28 days.
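Each working solution is a direct dilution of the stock, so the volumes follow C1V1 = C2V2. The sketch below is illustrative only: the final volume of 10 ml per 50 ml tube is a hypothetical value not stated in the text, and the function name `dilution_volumes` is ours.

```python
# Minimal sketch of direct dilution from a stock using C1 * V1 = C2 * V2.
STOCK_NG_PER_UL = 400.0                                  # nominal stock concentration from the text
NOMINAL_NG_PER_UL = [10, 20, 40, 50, 75, 100, 150, 300]  # nominal working concentrations
FINAL_VOLUME_UL = 10_000.0                               # HYPOTHETICAL: 10 ml per tube, not stated in the text


def dilution_volumes(target_ng_per_ul, stock_ng_per_ul=STOCK_NG_PER_UL,
                     final_volume_ul=FINAL_VOLUME_UL):
    """Return (stock volume, TE volume) in ul that give target_ng_per_ul in final_volume_ul."""
    if not 0 < target_ng_per_ul <= stock_ng_per_ul:
        raise ValueError("target must be positive and no greater than the stock concentration")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul  # V1 = C2 * V2 / C1
    return stock_ul, final_volume_ul - stock_ul                     # remainder is TE diluent


if __name__ == "__main__":
    for c in NOMINAL_NG_PER_UL:
        stock_ul, te_ul = dilution_volumes(c)
        print(f"{c:>4} ng/ul: {stock_ul:7.1f} ul stock + {te_ul:7.1f} ul TE")
```

Because every volume scales with the stock value, an error in the 400 ng/μl estimate would shift all nominal concentrations by the same factor.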

Statistical methods
Statistical analysis was carried out using the SAS JMP package [11]. A mixed effect model was fitted to the data, with the measurement method, the nominal concentration and their interaction specified as discrete fixed effects, and the laboratory specified as a random effect nested within method. The properties and usefulness of mixed effect models have been comprehensively reviewed [12].
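Treating laboratory as a random effect partitions the variability into a between-laboratory and a within-laboratory component. The sketch below does not reproduce the JMP fit; it shows, for a single method and a hypothetical balanced layout of replicate wells, the classical ANOVA (method-of-moments) estimates of those two components. The function name and input format are assumptions for illustration.

```python
from statistics import mean


def variance_components(groups):
    """ANOVA (method-of-moments) variance components for a balanced one-way random-effects layout.

    groups: {laboratory id: [replicate measurements]}, all lists the same length.
    Returns (between-laboratory variance, within-laboratory variance).
    """
    labs = list(groups.values())
    k = len(labs)        # number of laboratories
    n = len(labs[0])     # replicates per laboratory
    if k < 2 or n < 2 or any(len(x) != n for x in labs):
        raise ValueError("need >= 2 labs, >= 2 replicates each, and a balanced design")
    grand = mean([v for x in labs for v in x])
    lab_means = [mean(x) for x in labs]
    ss_between = n * sum((m - grand) ** 2 for m in lab_means)
    ss_within = sum((v - m) ** 2 for x, m in zip(labs, lab_means) for v in x)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))
    # E[MS_between] = sigma2_within + n * sigma2_lab; truncate a negative estimate at zero.
    sigma2_lab = max(0.0, (ms_between - ms_within) / n)
    return sigma2_lab, ms_within


if __name__ == "__main__":
    # Hypothetical replicate readings (ng/ul) from three laboratories using one method.
    print(variance_components({"A": [1.0, 3.0], "B": [5.0, 7.0], "C": [9.0, 11.0]}))
```

This closed form only holds for balanced data; with missing wells (1118 of 1280 expected observations here), a likelihood-based fit such as REML in a statistics package is the usual choice.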

Results and discussion
It was considered advantageous for this study to include academic as well as commercial participants, since harmonisation is necessary if genotyping bottlenecks are to be minimised. Recruitment of participants posed little problem, probably because P3G endorsed the study and participants were de-identified.
DNA was extracted, diluted and despatched as described. Polypropylene plates containing DNA solutions at nominal concentrations known only to the coordinating centre were despatched to 15 participants. Plates were not despatched to two potential participants because of administrative difficulties with carriers. Improved communication among participants should eliminate this difficulty.
Participants were asked to use their standard operating procedures to estimate DNA concentration and to return data within 28 days. This deadline was met in many cases; improved communication within participants' labs should expedite data return.
Data were analysed to identify the impact of using different methods in different laboratories to estimate DNA concentrations. For all methods and all laboratories, the variance of the difference between measured and nominal concentration increased with the nominal concentration, whereas the variance of the ratio R of measured to nominal concentration was homogeneous. The results were therefore summarised in terms of R. When R = 1, a measurement confirms the nominal concentration exactly.
R is not expected to equal unity exactly, because the concentration of the original DNA stock was itself estimated and is not an absolute value.
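The choice of R over the raw difference can be illustrated with a multiplicative error. The sketch below assumes R = measured / nominal and uses hypothetical readings in which every well reads 0.9, 1.0, 1.1 or 1.2 times its nominal concentration; it is not the study's data.

```python
from statistics import stdev


def spread_by_concentration(measurements):
    """measurements: {nominal ng/ul: [measured ng/ul, ...]}.

    Returns {nominal: (SD of measured - nominal, SD of R = measured / nominal)}.
    """
    summary = {}
    for nominal, values in measurements.items():
        differences = [v - nominal for v in values]
        ratios = [v / nominal for v in values]
        summary[nominal] = (stdev(differences), stdev(ratios))
    return summary


# HYPOTHETICAL data with a purely multiplicative error.
FACTORS = [0.9, 1.0, 1.1, 1.2]
example = {c: [c * f for f in FACTORS] for c in (10, 50, 300)}

if __name__ == "__main__":
    for c, (sd_diff, sd_ratio) in spread_by_concentration(example).items():
        print(f"{c:>4} ng/ul: SD(difference) = {sd_diff:6.2f}  SD(R) = {sd_ratio:.3f}")
```

Under this assumption the SD of the difference grows in proportion to the nominal concentration while the SD of R stays constant, which matches the pattern reported above.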
All concentration measurements were analysed with the mixed effect model described under Statistical methods: measurement method, nominal concentration and their interaction as discrete fixed effects, and laboratory as a random effect nested within method. Based on 1118 observations (of 1280 expected), the summary of fit, the analysis of variance and the lack-of-fit analysis (Table 1) all indicate considerable overall variability. This variability makes it difficult to draw other conclusions firmly.
The data were analysed to examine the effects of participant, technology and instrument. Figures 1 to 4 show the variability in R, summarised by its mean and standard deviation for each technology, by participant (Figure 1), by instrument (Figure 2), by nominal concentration (Figure 3) and by participant and nominal concentration (Figure 4). There was strong evidence of a statistically significant difference between technologies: fluorescence emission produced less variable results than absorption spectroscopy. This does not necessarily mean that fluorescence emission is intrinsically less variable, or that absorption spectroscopy should therefore be abandoned. It shows only that absorption spectroscopy, as deployed here, can generate greater variability than fluorescence emission. A detailed analysis of methods is needed to establish whether it is more practical, efficient or quicker to reduce the variability associated with one technology or the other.
There was also strong evidence of a statistically significant difference between participants (Figures 1 and 4). However, as all participants except one used a single method, this difference may be due to the instrument or technology rather than to the participant. Participants 1, 2, 3 and 5 seem to have obtained results with lower variability than the others, although the high overall variability weakens this conclusion. Differences between participants may have causes that overlap with the causes of variability within an individual participant. For example, if different operators within a lab perform differently, the same kinds of operator differences may also arise between labs. Participants that strictly adhere to quality standards such as ISO 9001:2000 should show less within-lab variability. However, even if two participants strictly implement a standard such as ISO 9001:2000, there may still be substantial between-lab variability because their standard operating procedures are not identical or lack precision, or because of environmental variability.
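The per-group summaries described above (mean and standard deviation of R) can be computed with a simple grouping step. This is a sketch under stated assumptions: R = measured / nominal, records arrive as hypothetical `(participant, technology, nominal, measured)` tuples, and a missing well is recorded as `None`.

```python
from collections import defaultdict
from statistics import mean, stdev


def summarise_r(records):
    """records: iterable of (participant, technology, nominal, measured) tuples.

    Returns {(participant, technology): (n, mean R, SD of R)}; wells with measured=None are skipped.
    """
    groups = defaultdict(list)
    for participant, technology, nominal, measured in records:
        if measured is None:      # missing well: no estimate returned
            continue
        groups[(participant, technology)].append(measured / nominal)
    return {key: (len(r), mean(r), stdev(r) if len(r) > 1 else float("nan"))
            for key, r in groups.items()}


if __name__ == "__main__":
    # HYPOTHETICAL records, not study data.
    demo = [("P1", "fluorescence", 100, 98.0), ("P1", "fluorescence", 100, 102.0),
            ("P2", "absorbance", 50, 40.0), ("P2", "absorbance", 50, 60.0),
            ("P2", "absorbance", 50, None)]
    for key, (n, m, sd) in summarise_r(demo).items():
        print(key, n, round(m, 3), round(sd, 3))
```

The same grouping keyed on instrument or nominal concentration gives the other views.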

[Figure: Variability charts by technology and instrument]
For each nominal concentration of DNA, variability remained high regardless of technology and participant (Figures 3 and 4). This runs counter to the conventional wisdom that very low or very high DNA concentrations are more difficult to measure accurately. It does not, however, address whether the major sources of variability stay the same or change as DNA concentration changes. In this pilot study, no attempt was made to assess variability associated with different DNA molecular weights.

Conclusion
This is the first reported observational study of DNA concentration estimation among both academic and commercial participants. We have demonstrated the feasibility of an international DNA concentration estimation harmonisation project involving both kinds of participant. We provide evidence of significant variation in DNA concentration estimation within and between laboratories, which confirms anecdotal evidence of such variation.
This evidence justifies systematic investigation of the sources of error, and the identification, testing, verification and implementation of remedial action that will reduce variability in DNA concentration estimation. Such investigations will provide the evidence base for protocol modification. Improved consistency in DNA measurement is essential for efficient genotyping, for implementing ambitious experimental designs in genetic epidemiology, and for compliance with quality assurance recommendations (e.g. from the Organisation for Economic Co-operation and Development [13]) and requirements (e.g. for continued ISO 9001:2000 accreditation).