Methods
Subjects
Biospecimens were collected from six research subjects. Written informed consent was obtained, and the study was conducted with approval from the Coriell Institute for Medical Research Institutional Review Board, as previously described in Shirley et al. [9]. Six tubes of whole blood were collected from each individual, and peripheral blood mononuclear cells (PBMCs) were isolated from these samples.
Cell culture
For each research participant, whole blood from one tube was used for DNA extraction, and the remaining five tubes were used to establish five separate LCLs. As outlined in Fig. 1, PBMCs were isolated independently from each tube of whole blood (Fig. 1, Steps 1–2) and then transformed with Epstein-Barr virus (EBV; Fig. 1, Step 3), as previously described in Shirley et al. [9]. Once these initial LCLs were established, foundation stocks were cryopreserved for DNA extraction and for later passaging and expansion to generate large distribution banks of cells. Expansion began by growing each LCL until the culture approached saturation density (Fig. 1, Step 4). To allow further growth, each culture was split into four flasks of diluted culture; this split is considered the first passage (Fig. 1, Step 5). When growth in these flasks plateaued again (Fig. 1, Step 6), the cultures were pooled and diluted into a larger roller bottle, constituting the second passage (Fig. 1, Step 7). Once a culture reached harvest density, it was considered the first expansion (Fig. 1, Step 8). The cryopreserved expanded LCLs (Fig. 1, Step 9) were then taken through the same cycle, starting again with growth in media (Fig. 1, Step 4), to produce each additional expansion generation. Additional passaging took place between the second and third expansions.
In total, 29 LCLs were successfully established (Additional file 2: Table S1; [9]) and sequentially expanded three times. Each additional expansion followed the culturing procedure outlined above: LCLs from the first expansion were used to create expansion 2, and LCLs from expansion 2 were used to create expansion 3. Each LCL was passaged twice between the first and second expansions and 10 times between the second and third expansions. In addition, technical replicates of the original freeze were processed for two LCLs, and technical replicates of the first expansion were constructed for four LCLs (Additional file 1: Figure S1, Additional file 2: Table S1).
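The passage counts above can be tallied per expansion as a quick check against the < 50-passage threshold for early cultures cited in the Conclusions [10]. The following Python sketch assumes that the original freeze is passage 0 and that the first expansion involved only the two passages shown in Fig. 1 (Steps 5 and 7); the dictionary and function names are illustrative and not part of the study's workflow.

```python
# Hypothetical passage tally for a single LCL. Assumptions (not stated in the
# source): the original freeze is passage 0, and the first expansion involved
# only the two passages shown in Fig. 1 (Steps 5 and 7).
PASSAGES_BEFORE = {
    "expansion 1": 2,   # Fig. 1, Steps 5 and 7 (assumed)
    "expansion 2": 2,   # passaged twice between the first and second expansions
    "expansion 3": 10,  # passaged 10 times between the second and third expansions
}

EARLY_PASSAGE_LIMIT = 50  # "early" passages are < 50 in the study cited as [10]


def cumulative_passages(intervals: dict[str, int]) -> dict[str, int]:
    """Return the running total of passages at each stage, in insertion order."""
    totals: dict[str, int] = {}
    running = 0
    for stage, n_passages in intervals.items():
        running += n_passages
        totals[stage] = running
    return totals


if __name__ == "__main__":
    for stage, total in cumulative_passages(PASSAGES_BEFORE).items():
        status = "early" if total < EARLY_PASSAGE_LIMIT else "late"
        print(f"{stage}: {total} cumulative passages ({status})")
```

Under these assumptions, the third expansion sits at about 14 cumulative passages, well inside the early-passage range.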
LCLs used in the current study are listed individually in Additional file 2: Table S1, and these samples are available as part of the NIGMS Human Genetic Cell Repository at the Coriell Institute for Medical Research (https://catalog.coriell.org/1/NIGMS).
Genomic data collection and analysis
One hundred and twenty-nine DNA samples, including 7 technical replicates, were genotyped on the Affymetrix Human SNP Array 6.0 (Affymetrix 6.0; Affymetrix, Inc., Santa Clara, CA). CEL files were processed with Affymetrix Power Tools (APT 1.16; https://www.thermofisher.com/us/en/home/life-science/microarray-analysis/microarray-analysis-partners-programs/affymetrix-developers-network/affymetrix-power-tools.html) to generate CHP files for each sample, and B allele frequencies and log R ratios were calculated for each SNP in each sample. PennCNV was run with default settings on the B allele frequencies, log R ratios, and signal files of the autosomal chromosomes for each sample set to call CNVs [11]. We retained CNV calls containing at least 25 SNPs for further analysis, and where individual PennCNV calls overlapped, we consolidated them into a single CNV region. We also used the four immunoglobulin regions defined in the PennCNV documentation as an additional results filter [11], after converting their coordinates to build 37 (B37) with the UCSC Genome Browser LiftOver tool [12]. In addition, one set of duplicated CNVs on chromosome 17 in the third expansion of sample GM22644 was treated as a single region, as was another set of duplicated CNVs on chromosome 11 in the original GM22679 culture and its first two expansions. PLINK 1.90 (http://www.cog-genomics.org/plink/1.9/) [13] was used to calculate SNP concordance (the proportion of shared SNPs that are identical by state), and a custom R script (available upon request) was developed to calculate CNV concordance (the proportion of shared CNVs between samples).
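The CNV filtering and consolidation steps can be sketched as follows. This is a minimal Python illustration, not the authors' code: the `CnvCall` record, the toy calls, and the `IG_REGIONS` interval are hypothetical placeholders (the interval is not a real PennCNV immunoglobulin coordinate), and overlapping calls are assumed to be merged into the union of their spans across all samples.

```python
from typing import NamedTuple


class CnvCall(NamedTuple):
    """One PennCNV call (hypothetical record layout)."""
    sample: str
    chrom: str
    start: int   # base-pair start, build 37 coordinates assumed
    end: int     # base-pair end (inclusive)
    numsnp: int  # number of SNPs supporting the call


MIN_SNPS = 25  # retention threshold stated in the Methods


def filter_calls(calls: list[CnvCall], min_snps: int = MIN_SNPS) -> list[CnvCall]:
    """Keep only calls supported by at least `min_snps` SNPs."""
    return [c for c in calls if c.numsnp >= min_snps]


def merge_regions(calls: list[CnvCall]) -> list[tuple[str, int, int]]:
    """Pool calls across samples and merge overlapping ones into CNV regions."""
    regions: list[tuple[str, int, int]] = []
    for c in sorted(calls, key=lambda call: (call.chrom, call.start)):
        if regions and regions[-1][0] == c.chrom and c.start <= regions[-1][2]:
            chrom, start, end = regions[-1]
            regions[-1] = (chrom, start, max(end, c.end))  # extend current region
        else:
            regions.append((c.chrom, c.start, c.end))      # start a new region
    return regions


def overlaps_any(region: tuple[str, int, int], intervals: list[tuple[str, int, int]]) -> bool:
    """True if `region` overlaps any (chrom, start, end) interval."""
    chrom, start, end = region
    return any(chrom == c and start <= e and s <= end for c, s, e in intervals)


# Placeholder interval only; NOT the real PennCNV immunoglobulin coordinates.
IG_REGIONS = [("chr14", 55_000, 70_000)]

if __name__ == "__main__":
    calls = [
        CnvCall("A", "chr1", 1_000, 5_000, 30),
        CnvCall("B", "chr1", 4_000, 9_000, 40),     # overlaps A's call, so merged
        CnvCall("A", "chr2", 100, 200, 10),         # dropped: fewer than 25 SNPs
        CnvCall("C", "chr14", 50_000, 60_000, 26),  # overlaps placeholder IG interval
    ]
    regions = merge_regions(filter_calls(calls))
    non_ig = [r for r in regions if not overlaps_any(r, IG_REGIONS)]
    print(regions)  # [('chr1', 1000, 9000), ('chr14', 50000, 60000)]
    print(non_ig)   # [('chr1', 1000, 9000)]
```

Filtering before merging keeps weakly supported calls from widening a region; the IG check is applied as a flag on the merged regions so results can be reported both with and without them.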
Results and discussion
SNP stability
SNP genotypes were generally stable across expansions. As shown in Fig. 2, SNP concordance across expansions ranged from 99.90% to 99.98%, well within the range observed across technical replicates (99.61–100.00%). These concordances were also consistent with the reported technical reproducibility of the Affymetrix SNP 6.0 Array (99.9%) [9] and the average concordance rate of SNP 6.0 data analyzed with the Affymetrix Birdseed algorithm (99.8%) [14].
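A genotype-concordance calculation of the kind described above can be sketched in Python. This uses an assumed, simplified definition (identical genotypes among SNPs called in both samples) and is not a reimplementation of PLINK's identity-by-state statistics; genotypes are encoded hypothetically as B-allele counts, with `None` for no-calls. The last line shows the proportion-to-percentage conversion that expresses a replicate concordance of 0.9961 as 99.61%.

```python
from typing import Optional, Sequence

Genotype = Optional[int]  # hypothetical encoding: B-allele count 0/1/2, None = no call


def snp_concordance(a: Sequence[Genotype], b: Sequence[Genotype]) -> float:
    """Fraction of SNPs called in both samples that have identical genotypes."""
    if len(a) != len(b):
        raise ValueError("both samples must list the same SNPs in the same order")
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if not shared:
        raise ValueError("no SNPs were called in both samples")
    return sum(x == y for x, y in shared) / len(shared)


if __name__ == "__main__":
    original = [0, 1, 2, 2, None, 1]
    expansion = [0, 1, 2, 1, 0, 1]
    print(f"{snp_concordance(original, expansion):.2%}")  # 80.00% (4 of 5 shared SNPs)
    print(f"{0.9961:.2%}")                                # 99.61%
```

Excluding no-calls from the denominator keeps missing data from being counted as discordance.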
CNV stability
In total, we identified 22 unique CNV regions across all 129 samples (Additional file 3: Table S2); however, 3 of these were immunoglobulin (IG) regions that have previously been implicated in cell culture artifacts [11]. CNV regions ranged from approximately 45 kb to 50 Mb in size (mean region size = 174 kb; median region size = 199 kb). When IG regions were included, median CNV concordance across expansions (0.33) was lower than previously reported for first-passage LCLs (median 0.56 [9]). Excluding IG regions raised the median concordance noticeably, to 0.50. Figure 3 displays CNV region sharing across each expansion. Unlike SNP concordance, CNV concordance across expansions fell almost entirely outside the range of CNV concordance observed across technical replicates. Given the small number of subjects (n = 6) and the modest number of CNV regions (n = 22), these estimates cannot fully resolve the question of CNV stability across LCL expansion. However, these preliminary results suggest that CNV stability may decrease over expansion generations.
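The effect of excluding IG regions on median concordance can be illustrated with toy data. The sketch below assumes that CNV concordance is the Jaccard index of two samples' region sets (regions shared by both divided by regions called in either); the authors' R script is not public and may use a different denominator. Region labels and sample pairs are hypothetical, not study data.

```python
from statistics import median


def cnv_concordance(a: set[str], b: set[str]) -> float:
    """Regions called in both samples divided by regions called in either.

    Assumed (Jaccard) definition. If neither sample has any region, the pair
    is treated as fully concordant (1.0) by convention.
    """
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def median_concordance(pairs: list[tuple[set[str], set[str]]],
                       exclude: frozenset[str] = frozenset()) -> float:
    """Median concordance over sample pairs after dropping `exclude` region IDs."""
    return median(cnv_concordance(a - exclude, b - exclude) for a, b in pairs)


if __name__ == "__main__":
    # Toy region sets for (original, expansion) pairs; not study data.
    pairs = [
        ({"R1", "IG1"}, {"R1", "IG2"}),
        ({"R2", "R3", "IG1"}, {"R2"}),
        ({"R4"}, {"R4", "IG3"}),
    ]
    ig_regions = frozenset({"IG1", "IG2", "IG3"})
    print(median_concordance(pairs))              # about 0.333, IG regions included
    print(median_concordance(pairs, ig_regions))  # 1.0, IG regions excluded
```

In this toy example, dropping the IG labels raises the median from 1/3 to 1.0, the same direction of change reported above, although the magnitudes are arbitrary.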
Conclusions
The combined evidence indicates that SNPs are generally stable after EBV transformation [9], across early (< 50) passages [10], and across three culture expansions (current study). Some genomic regions are vulnerable to structural instability during transformation [9], and others are vulnerable to loss of heterozygosity at later passages (≥ 50) [10]; our analysis of structural variation further suggests a potential decrease in CNV concordance across expansions. In our experience with NIGMS repository samples that carry more pronounced and clearly defined disease-causing CNVs (identified by cytogenetic and array copy number analyses performed during routine sample quality control), these CNVs are generally stable across LCL expansions and represent a different category of mutation from the variation addressed in the current study.
Taken together, these results are most applicable to genome-wide association studies (GWAS). By design, GWAS gain statistical power to detect modest disease risk effect sizes as sample sizes increase. One way to increase GWAS sample size is to leverage existing clinical samples and biorepositories that offer DNA extracted from LCLs. Our results support this strategy and suggest that SNPs can be reliably measured from LCL DNA. GWAS typically include data from over one million directly measured SNPs and often include additional SNPs inferred indirectly via imputation. The SNP concordance we observed across cell line expansions (99.90–99.98%) is comparable to the reported success rate for SNPs assayed with the Affymetrix 6.0 genome-wide SNP array (99.9%) [9] and higher than reported success rates for imputed GWAS SNPs (< 99.8% [15, 16]). These findings support the use of LCL DNA as an important resource for GWAS. Greater caution is warranted in studies focused on accurately measuring germline structural variation in cultured cells, given the documented changes in CNV loci across LCL transformation, passaging, and expansion.
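The practical difference between these concordance levels is easier to see as expected discordant genotypes per million SNPs, that is, (1 - concordance) × 1,000,000. The Python sketch below uses exact decimal arithmetic; the figure of one million SNPs is a round number chosen for illustration, and the labels simply restate the percentages quoted above.

```python
from decimal import Decimal


def expected_discordant(concordance_pct: str, n_snps: int = 1_000_000) -> int:
    """Expected discordant genotypes among `n_snps` SNPs at a given concordance (%).

    Uses Decimal so that, for example, 100 - 99.98 is exactly 0.02.
    """
    discordance = (Decimal(100) - Decimal(concordance_pct)) / Decimal(100)
    return int(discordance * n_snps)


if __name__ == "__main__":
    for label, pct in [
        ("LCL expansions, highest concordance", "99.98"),
        ("LCL expansions, lowest concordance", "99.90"),
        ("Affymetrix 6.0 reported rate", "99.9"),
        ("imputed SNPs, reported bound", "99.8"),
    ]:
        print(f"{label} ({pct}%): {expected_discordant(pct):,} per million SNPs")
```

At 99.98% and 99.90% concordance this gives 200 and 1,000 discordant genotypes per million SNPs, compared with 2,000 or more for imputed SNPs below 99.8%.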