Design and implementation of a custom next generation sequencing panel for selected vitamin D associated genes

Background Biologically active vitamin D has an important regulatory role within the genome. It binds the vitamin D receptor (VDR) in order to control the expression of a wide range of genes as well as interacting with the epigenome to modify chromatin and methylation status. Vitamin D deficiency is associated with several human diseases including end-stage renal disease. Methods This article describes the design and testing of a custom, targeted next generation sequencing (NGS) panel for selected vitamin D associated genes. Sequencing runs were used to determine the effectiveness of the panel for variant calling, to compare efficiency and data across different sequencers, and to perform representative, proof of principle association analyses. These analyses were underpowered for significance testing. Amplicons were designed in two pools (163 and 166 fragments respectively) and used to sequence two cohorts of renal transplant recipients on the Ion Personal Genome Machine (PGM)™ and Ion S5™ XL desktop sequencers. Results Coverage was provided for 43.8 kilobases across seven vitamin D associated genes (CYP24A1, CUBN, VDR, GC, NADSYN1, CYP27B1, CYP2R1) as well as 38 prioritised SNPs. Sequencing runs provided sufficient sequencing quality, data output and validated the effective library preparation and panel design. Conclusions This novel, custom-designed, validated panel provides a fast, cost effective, and specific approach for the analysis of vitamin D associated genes in a wide range of patient cohorts. This article does not report results from a controlled health-care intervention. Electronic supplementary material The online version of this article (doi:10.1186/s13104-017-2664-z) contains supplementary material, which is available to authorized users.


Background
Biologically active vitamin D (1,25-dihydroxyvitamin D) is involved in the regulation of gastrointestinal calcium absorption and bone homeostasis [1]. A precursor (7-dehydrocholesterol) to active vitamin D is produced in the skin on exposure to ultraviolet B (UVB) radiation and is also found in certain foods such as oily fish and cheese [2]. Recent NICE guidelines recommend that pregnant and breastfeeding women, children aged 6 months to 5 years, adults over 65 years or anyone who is not regularly exposed to the sun should take a daily vitamin D3 (cholecalciferol) supplement up to 10,000 IU [3,4]. A wide range of enzymes are involved in the metabolism and activity of vitamin D. Cytochrome P450 enzymes including CYP27B1, CYP2R1 and CYP24A1 contribute to the hydroxylation steps.
In the kidney, 1-α-hydroxylase, encoded by CYP27B1, converts 25-hydroxyvitamin D3 to 1,25-dihydroxyvitamin D, directly affecting the circulating levels of active vitamin D. Polymorphisms within CYP27B1 have previously been associated with the development of type 1 diabetes [5]. The GC gene encodes the vitamin D binding protein, which has affinity for all vitamin D metabolites [6]. Polymorphisms within GC have been associated with reduced 25-hydroxyvitamin D3 levels [7].
Open Access, BMC Research Notes. Correspondence: kbenson04@qub.ac.uk. Nephrology Research Group, Centre for Public Health, Queen's University Belfast, Belfast BT9 7AB, UK.
The VDR is a steroid receptor expressed in more than 30 cell types [8]. VDR functions as a heterodimer with the retinoid X receptor (RXR) in the presence of vitamin D to activate transcription of vitamin D controlled genes [9,10]. This is one of the mechanisms by which 1,25-dihydroxyvitamin D regulates the genome, but it has also been shown to affect the epigenome through chromatin modifiers and methylation changes [11]. It has been suggested that as much as 3% of the human genome is under some form of regulatory control by vitamin D [8,12]. Two lymphoblastoid cell lines stimulated with 1,25-dihydroxyvitamin D were used to generate a ChIP-Seq genome-wide map, which identified 229 genomic regions differentially bound with VDR before and after treatment with active vitamin D [13]. Significant changes in gene expression after stimulation with 1,25-dihydroxyvitamin D were observed for gene loci such as IRF8 and PTPN2 which were not previously associated with vitamin D regulation [13]. Selected vitamin D related SNPs, within genes associated with vitamin D metabolism, are outlined in Table 1 [14][15][16][17][18][19][20][21][22].
Although the cost of both whole genome and exome sequencing is falling, these technologies are not always cost effective for routine use in laboratories or large-scale population studies. The bioinformatic analysis and raw data storage space required for large-scale sequencing projects are often prohibitive. As an alternative, researchers may opt to sequence targeted genetic regions using sequencing panels [23]. There are two central approaches to enrich for targeted DNA regions: solution hybridisation using oligonucleotide probes, such as the Agilent SureSelect™ custom panels, or PCR enrichment, such as Ion Torrent AmpliSeq™ panels. During this study an Ion AmpliSeq™ (http://www.ampliseq.com) customised panel was designed using the Ion AmpliSeq™ Designer online tool and primer pairs to enrich genomic regions known to be associated with vitamin D [14,15]. This PCR enrichment method was used as it provides deep and even coverage over the small genomic regions of interest targeted in our study and delivers higher rates of on-target sequencing [24,25]. In contrast, where larger regions such as the whole exome are being targeted, hybridisation methods are recognised to be more suitable due to greater uniformity of coverage [25]. A comprehensive list of all regions targeted by the panel is included in Table 1.
The Ion PGM ™ desktop sequencer, released in 2010, is ideally suited to sequencing selected batches of amplicons such as those used in AmpliSeq ™ sequencing panels [26]. Increases in sequencing throughput between the Ion 314 ™ , Ion 316 ™ or Ion 318 ™ silicon chips used by the Ion PGM ™ are due to differences in chip size and closer packing of wells. The Ion S5 ™ and Ion S5 ™ XL sequencers were released in 2015, offering increased speed of data analysis, new scalable sequencing chips and considerably reduced hands-on-time. The Ion S5 ™ XL uses Ion 520 ™ , Ion 530 ™ and Ion 540 ™ chips. It has a larger processor and is therefore capable of producing results faster and with a higher throughput than the Ion S5 ™ . Sequencing output using Ion Torrent sequencers will vary depending on the input material, read lengths, chemistry employed, and the type of chip used.
Vitamin D deficiency has been suggested to be a risk factor for chronic kidney disease (CKD) [27][28][29]. A meta-analysis conducted in 2011 showed that higher vitamin D levels are associated with increased survival in CKD patients [30]. Increased activity of the renin-angiotensin system is strongly associated with CKD and diabetes [31]; vitamin D is a negative regulator of the renin-angiotensin system [12].
Many kidney transplant centres in the United Kingdom (UK) routinely prescribe vitamin D supplements for kidney transplant recipients [32]. The kidney transplant population is prone to vitamin D deficiency due to several factors including avoidance of direct sunlight, dietary deficiency, renal impairment (limiting the activation of vitamin D because of reduced 1-alpha hydroxylation) and anti-rejection medication such as corticosteroids which are known to increase metabolism of vitamin D. Kidney transplant patients have an increased risk of skin cancers and so are encouraged to avoid direct sunlight, thereby reducing the vitamin D which these patients produce in response to UVB exposure [33]. A 2007 UK study which tracked vitamin D levels in 104 newly transplanted and 140 long-term renal transplant recipients found that vitamin D deficiency was present in the majority of investigated patients [34]. Existing literature shows that vitamin D deficiency also increases risk of the development of the common renal transplant complication, new onset diabetes after transplantation (NODAT) [35,36]. Vitamin D is known to increase insulin sensitivity and deficiency of vitamin D may also aggravate transplant-related insulin resistance [37]. A recent study by Keyzer and colleagues which included 435 stable renal transplant recipients demonstrated that low vitamin D levels were independently associated with all-cause mortality and a more rapid decline in eGFR over time [38].
The aim of our study was to design, validate and apply a custom Ion AmpliSeq™ NGS panel targeting selected vitamin D associated genetic regions. As described, vitamin D deficiency has been associated with chronic kidney disease and NODAT. Therefore, the panel was tested in two renal transplant populations to demonstrate its efficacy, and results were compared between two sequencers.

Methods
Blood-derived DNA, stored at −20 °C, was used for this study. DNA was extracted using the salting out method. The cohorts used for testing of the panel were: (1) a cohort of kidney transplant recipients (n = 77) from Belfast, Northern Ireland; (2) kidney transplant patients (n = 93) in a Birmingham-based cohort who had vitamin D levels and oral glucose tolerance test (OGTT) results measured prospectively at three separate time points (immediately before transplant, 3 months post-transplant and 12 months post-transplant) [39]. The patient characteristics for each cohort are summarised in Table 2. Serum total 25-hydroxyvitamin D levels were measured in batches by mass spectrometry at an accredited NHS hospital lab, using samples frozen at −40 °C.
Possible confounders for vitamin D levels such as the season of blood sampling, skin colour, extensive use of high protection sunscreens or covering with clothing were not accounted for in this study. The Belfast cohort was sequenced using both the Ion PGM™ and the Ion S5™ XL sequencers in order to compare variant calling efficiency and technical sample preparation. Variants called differently between sequencers were further investigated using Sanger sequencing to determine which next generation sequencer had identified the variant correctly. In addition, an applied association analysis was undertaken to determine if the targeted variants were associated with vitamin D levels or NODAT in each of these cohorts.
At the time of listing for transplantation, all potential kidney transplant recipients were asked for written informed consent (or consent from a parent or guardian in the case of children) for their data to be stored within the Kidney Transplant Database and used in projects in an anonymised form. Approval for use of this database was granted by Office for Research Ethics Committees Northern Ireland (ORECNI)-reference number ORECNI 12/NI/0178. A total of 77 transplant recipients were included in this study from the Belfast renal transplant cohort. NODAT was defined in this cohort as the new requirement for oral hypoglycaemic agents or insulin as a result of post-transplant hyperglycaemia. The average age of the included patients was 45 with an age range of 12-71 years. There were more male renal transplant recipients (n = 50) than female recipients (n = 27) which is consistent with the established statistics that men are more likely to develop end stage renal disease (ESRD) than women [40]. The Belfast cohort patients included in this study were of white ethnicity. The Birmingham renal transplant cohort is more ethnically diverse and the patients included in this study reflect this. The average age of patients in this group was 44 years with an age range of 17-71 years. NODAT was defined in the Birmingham cohort if (a) fasting glucose ≥7 mmol/L or 2 h OGTT was ≥11.1 mmol/L from day 7 onwards and persisted at the 3 month timepoint, (b) HbA1c ≥6.5% (48 mmol/mol) from 3 months onwards, or (c) requirement for institution of therapy for NODAT in which case OGTT was not undertaken (fasting clinic glucose was ≥7 mmol/L in all such patients). Seventy Caucasian patients with recorded data on NODAT status were included as a replication cohort for the Belfast group NODAT association analysis.
The custom vitamin D panel targeting genes (Table 1) and SNPs associated with vitamin D identified during previous research with collaborators in Birmingham was designed using the Ion AmpliSeq ™ Designer ™ online tool [14,15]. Four of these targeted genes were chosen after their reported association with vitamin D insufficiency in a genome-wide association study published in the Lancet in 2010 (GC, NADSYN1, CYP2R1, and CYP24A1) [7]. In addition, a list of six genes which code for proteins which are established components of vitamin D metabolism (CYP27A1, GC, CYP2R1, VDR, CYP27B1 and CYP24A1) were chosen in light of a publication by Cooper and colleagues in 2011 [41]. This paper discussed inherited variation in vitamin D genes and their association with predisposition to type 1 diabetes [41]. These prior publications informed the list of seven genes included in the panel. A previous study by Nejentsev and colleagues identified VDR polymorphisms which are necessary to study common variation in populations from the British Isles; these important SNPs were also included in the panel [42]. The four classically genotyped SNPs (BsmI, FokI, TaqI, ApaI) were also contained within the panel for completeness. Further SNPs were selected from previous experimental work from our group completed in collaboration with colleagues from Birmingham [14].
As part of this research, in which the linkage disequilibrium structure for vitamin D in a UK population and additional ancestries was elucidated, robust sequencing methods including Sequenom and Sanger sequencing were employed [14]. These regions include large and small genes and are representative of other regions of the genome that groups using the Ion AmpliSeq™ method may wish to sequence. The chosen targets (including coding and untranslated regions and 50 bp of exon-flanking sequence for genes) were entered into the online tool to generate BED files. The resulting amplicons were divided by the online designer into two primer pools to maximise target specificity.
These experiments employed early access reagents for the Ion S5™ XL System, and Ion AmpliSeq™ technology, from Thermo Fisher Scientific. Libraries were prepared using the Ion AmpliSeq™ library kit 2.0 according to the manufacturer's instructions. Genomic DNA (20 ng/sample) was used for the initial PCR reactions. Samples were diluted no more than 4 h prior to sequencing to prevent DNA degradation. The Ion Chef™ or Ion One Touch™ and ES™ were used for emulsion PCR and target enrichment. The Ion PGM™ was used with Ion 318 V2™ or Ion 316 V2™ chips to sequence the renal transplant cohorts. The Belfast kidney transplant recipients were re-sequenced using the Ion S5™ XL sequencer on Ion 530™ chips. Reported SNPs from the Ion Torrent sequencing runs on the Belfast kidney transplant cohort were compared between the two next generation sequencers. Variants found on one sequencer, but not the other, were then re-sequenced using Sanger sequencing. Where possible, three samples were Sanger sequenced for each such variant: a sample reported with the variant on both sequencers, a sample reported without the variant on both sequencers, and the sample in which the discrepancy was identified.
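The target definition step described above (exons plus 50 bp of flanking sequence, written out as BED intervals) can be sketched as follows. This is a minimal illustration only, not the Ion AmpliSeq™ Designer's own procedure: the function names, the merging rule and the coordinates are hypothetical, and BED's 0-based, half-open coordinate convention is assumed.

```python
from typing import List, Tuple

# (chromosome, start, end, name) using 0-based, half-open BED coordinates
Interval = Tuple[str, int, int, str]


def pad_and_merge(exons: List[Interval], flank: int = 50) -> List[Interval]:
    """Pad each interval by `flank` bp on both sides, then merge overlaps."""
    padded = sorted((c, max(0, s - flank), e + flank, n) for c, s, e, n in exons)
    merged: List[Interval] = []
    for chrom, start, end, name in padded:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps (or touches) the previous interval: extend it.
            pc, ps, pe, pn = merged[-1]
            new_name = pn if name in pn.split(",") else f"{pn},{name}"
            merged[-1] = (pc, ps, max(pe, end), new_name)
        else:
            merged.append((chrom, start, end, name))
    return merged


def to_bed(intervals: List[Interval]) -> str:
    """Render intervals as tab-separated BED lines."""
    return "\n".join(f"{c}\t{s}\t{e}\t{n}" for c, s, e, n in intervals)


# Hypothetical coordinates for illustration only (not real exon positions).
example_exons = [
    ("chr12", 1000, 1200, "exonA"),
    ("chr12", 1250, 1400, "exonB"),
    ("chr4", 500, 600, "exonC"),
]
print(to_bed(pad_and_merge(example_exons)))
```

Merging after padding avoids submitting overlapping targets when two exons lie within 100 bp of each other; whether the online designer does this internally is not stated in the text.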
An overview of the analysis workflow is illustrated in Fig. 1. Preliminary analysis was conducted automatically on the Ion Torrent Suite™ Version 4 where data was aligned to hg19.p5. Variant caller files generated from this analysis were used to compare the results between the Ion PGM™ and Ion S5™ XL sequencers. Sequencing data in the form of .bam files were further analysed using Partek Genomics Suite™. SNPs were identified using dbSNP Version 138 and annotated using RefSeq Version 2015-08-04. The resulting variant files from Partek Genomics Suite™ were used for association analysis.
To complete the association analysis, .ped files were generated for use with PLINK software Version 1.07. An unadjusted genotype association test for trend was used to find associations between NODAT status and the targeted variants in the Belfast renal transplant cohort using a P value threshold of 0.05. This P value is reported in this manuscript as P trend. Variants with a Hardy-Weinberg equilibrium (HWE) P value <1 × 10⁻⁵ were removed from the analysis. In previous studies, transplant recipient age and body mass index (BMI) were shown to be important risk factors for the development of NODAT [43][44][45][46]; these variables were included in the regression model (P LR). Changes in vitamin D levels at 3 and 12 months after transplant were evaluated for association with gene variants in the Birmingham kidney transplant cohort, treating vitamin D levels post-transplant as a quantitative trait (P trend). As with the association analyses in the Belfast cohort, variants with a minor allele frequency (MAF) of <0.05 or a HWE P value <1 × 10⁻⁵ were removed from the analysis. Recipient age and weight at transplant were included in this regression model (P LR). In addition, the association between NODAT and the identified gene variants was investigated in a subset of the Birmingham cohort (n = 70). All modelling assumed an additive inheritance pattern.
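The variant filters and the genotype test for trend described above can be illustrated with a short sketch. It is not the PLINK implementation: the HWE check here is a simple chi-square approximation, the trend test is a textbook Cochran-Armitage statistic with additive weights (0, 1, 2), and all function names and genotype counts are hypothetical.

```python
import math
from typing import Sequence, Tuple


def minor_allele_frequency(n_aa: int, n_ab: int, n_bb: int) -> float:
    """MAF from genotype counts (AA, AB, BB)."""
    freq_b = (n_ab + 2 * n_bb) / (2 * (n_aa + n_ab + n_bb))
    return min(freq_b, 1.0 - freq_b)


def chi2_1df_p(chi2: float) -> float:
    """Upper-tail P value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def hwe_p_value(n_aa: int, n_ab: int, n_bb: int) -> float:
    """Chi-square approximation to a Hardy-Weinberg equilibrium test."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1.0 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    return chi2_1df_p(chi2)


def trend_test(cases: Sequence[int], controls: Sequence[int],
               weights: Sequence[int] = (0, 1, 2)) -> Tuple[float, float]:
    """Cochran-Armitage test for trend; returns (chi-square, P value)."""
    totals = [r + s for r, s in zip(cases, controls)]
    n_cases, n_controls = sum(cases), sum(controls)
    n = n_cases + n_controls
    sum_wr = sum(w * r for w, r in zip(weights, cases))
    sum_wn = sum(w * t for w, t in zip(weights, totals))
    sum_w2n = sum(w * w * t for w, t in zip(weights, totals))
    denom = n_cases * n_controls * (n * sum_w2n - sum_wn ** 2)
    if denom == 0:  # monomorphic variant or an empty group
        return 0.0, 1.0
    chi2 = n * (n * sum_wr - n_cases * sum_wn) ** 2 / denom
    return chi2, chi2_1df_p(chi2)


def passes_qc(n_aa: int, n_ab: int, n_bb: int,
              min_maf: float = 0.05, min_hwe_p: float = 1e-5) -> bool:
    """Apply the MAF and HWE thresholds quoted in the text."""
    return (minor_allele_frequency(n_aa, n_ab, n_bb) >= min_maf
            and hwe_p_value(n_aa, n_ab, n_bb) >= min_hwe_p)


# Hypothetical genotype counts (0, 1, 2 copies of the tested allele).
chi2, p_trend = trend_test(cases=(5, 10, 25), controls=(25, 10, 5))
print(f"chi2 = {chi2:.3f}, P trend = {p_trend:.2e}")
```

The additive weights mirror the additive inheritance pattern assumed in the modelling; covariate adjustment (the regression P value, P LR) would need a logistic or linear regression and is outside this sketch.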
Sanger sequencing data was analysed using Vector NTI Advance ™ Version 11.5.1. The resulting data files were aligned using Contig Express ™ against the GRCh38 reference genome obtained from the Ensembl online resource (Version 8.4) [47]. The resulting chromatograms were visually inspected to determine whether the variant was identified correctly using NGS.

Results
Targeted SNPs, along with translated and untranslated regions for seven genes associated with vitamin D regulation were successfully included in the custom panel (90.58% total coverage; 43.76 kb; Table 1). The targeted SNPs were shown in previous publications to be important in Caucasian populations for vitamin D metabolism and regulation and were located within the seven chosen genes [42]. Amplicons were automatically split between two pools containing 163 and 166 amplicons. These individual amplicons were 124-274 bp in length with an average size of 207 bp. PCR primers used in Ion AmpliSeq ™ kits provided a high level of specificity and simplicity.
Analysis on the Ion Torrent Suite ™ included determination of the Ion Sphere Particle (ISP) loading, the number of usable reads and the length of sequencing reads for each run. Figures 2 and 3 show representative sequencing statistics for sequencing runs performed on the Ion PGM ™ and Ion S5 ™ XL sequencers respectively.
Additional analysis was performed using plug-ins including variant caller, coverage analysis, and file exporter. Coverage was calculated using the coverage analysis plug-in, and representative plots are shown in Fig. 4. The hands on time required for the Ion PGM ™ (approximately 1-1.5 h) was considerably higher than for the Ion S5 ™ XL (approximately 15 min) (Fig. 1).
A summary of the maximum output statistics for the sequencing chips from the NGS runs completed on the Ion PGM™ and Ion S5™ XL is shown in Table 3. Nine of the 11 sequencing runs were completed on the Ion PGM™ and two were completed on the Ion S5™ XL. Similar percentages of usable reads were obtained from all comparable runs. Polyclonal levels varied depending on the dilution factor used following library preparation and had a marked impact on the percentage of usable reads. These polyclonal levels increased with the level of DNA input. The sequencing output varied from 429 Mb to 3.22 Gb and the total number of reads ranged from 2,755,634 to 80,308,654.

Fig. 1 Overview of data analysis workflow. This image provides an overview of the described laboratory and analysis processes
The data obtained for the two different cohorts were comparable, although the outputs obtained using the Ion PGM™ were considerably lower than those obtained using the Ion S5™ XL.
Based on the maximum output achieved by each of the different chip types used in this study, we observed a 1.6 fold increase between data output from the Ion 316 ™ and Ion 318 ™ sequencing chips and a 4.8 fold increase between the Ion 318 ™ and Ion 530 ™ sequencing chips (Table 3).
Libraries for the Belfast renal transplant cohort were sequenced on both the Ion PGM™ and Ion S5™ XL. Due to the higher capacity of the Ion 530™ chip, both amplicon pools were combined on the same sequencing chips for the Ion S5™ XL sequencing runs. Twenty-three SNPs were identified by the variant caller plug-in on the Ion Torrent Suite™ on one sequencer but not the other. Sanger capillary sequencing, which is regarded as the gold standard for variant identification [48], was used to determine which SNPs were genuine using 14 sets of PCR primers. The SNP locations in samples with and without the proposed variants were visually inspected using Contig Express™. The Ion S5™ XL correctly identified nucleotide bases more often than the Ion PGM™ at these discordant SNP call locations. Representative examples of the visual inspection of these discordant calls are provided in the accompanying figures.
The data obtained from the sequencing runs outlined in Table 3 was analysed on Partek Genomics Suite™ and used to perform association analysis to demonstrate the efficacy of the custom panel. Following analysis on Partek Genomics Suite™, off-target and low coverage (less than 30 fold) SNPs were removed along with SNPs identified as incorrect following Sanger sequencing.
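The comparison of calls between sequencers and the coverage filter can be sketched as simple set operations. This is an illustrative outline under stated assumptions, not the Ion Torrent Suite™ or Partek workflow: variant calls are assumed to have already been parsed into per-sample sets, and the sample identifiers, variant keys and depths below are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref, alt)


def discordant_calls(pgm: Dict[str, Set[Variant]],
                     s5: Dict[str, Set[Variant]]) -> List[Tuple[str, Variant, str]]:
    """List variants called on one platform but not the other, per shared sample."""
    out: List[Tuple[str, Variant, str]] = []
    for sample in sorted(set(pgm) & set(s5)):
        out += [(sample, v, "PGM only") for v in sorted(pgm[sample] - s5[sample])]
        out += [(sample, v, "S5 XL only") for v in sorted(s5[sample] - pgm[sample])]
    return out


def filter_by_depth(depths: Dict[Variant, int], min_depth: int = 30) -> Set[Variant]:
    """Keep variants whose read depth is at least `min_depth` (30-fold in the text)."""
    return {v for v, depth in depths.items() if depth >= min_depth}


# Hypothetical calls for two samples on each platform.
v1 = ("chr10", 100, "C", "T")
v2 = ("chr12", 200, "G", "A")
v3 = ("chr20", 300, "A", "G")
pgm_calls = {"S1": {v1, v2}, "S2": {v1}}
s5_calls = {"S1": {v1}, "S2": {v1, v3}}

for sample, variant, platform in discordant_calls(pgm_calls, s5_calls):
    print(sample, variant, platform)  # candidates for Sanger confirmation
```

Each discordant entry would then be a candidate for Sanger re-sequencing, as described above.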
Initially, the association between development of NODAT post-renal transplant and the vitamin D variants in the panel was investigated in the Belfast kidney transplant population. In total, 457 SNPs were included for analysis from sequencing data obtained using the Ion S5 ™ XL. The top hits from this genotype association analysis using a nominal significance threshold of P trend < 0.05 along with the corresponding results of the regression analyses are shown in Additional file 1: Table S1.
The association between NODAT development and the target variants was also investigated in a subset of the Birmingham kidney transplant cohort (n = 70) which included 16 NODAT cases and 54 renal transplant controls. The top hits are shown in Additional file 1: Table S2 along with the results of the regression analysis. In total, nine variants were nominally associated with NODAT development in the Birmingham cohort at the P trend < 0.05 significance threshold in the genotypic test for trend. One variant was nominally associated with NODAT status in both the Belfast and Birmingham cohorts (rs1801239).
The association between the changes in vitamin D levels 3 or 12 months following renal transplant and the variants targeted on the custom panel was investigated in the Birmingham kidney transplant cohort (n = 93). The results of these analyses are shown in Additional file 1: Table S3 using a genotype test for trend with a threshold of P trend < 0.05. Variants with a MAF of less than 0.05 or a HWE P value of less than 1 × 10⁻⁵ were removed from this analysis. These nominal associations were not adjusted for multiple comparisons.

Discussion
This study demonstrated the successful application of a customised vitamin D targeted NGS panel in two renal transplant cohorts. The library preparation procedure successfully enriched for the targeted genomic regions designed using the Ion AmpliSeq™ designer tool. Sequencing runs provided sufficient sequencing quality and data output, and validated the library preparation and panel design. The panel achieved sufficient (>30 fold) coverage to reliably identify variants associated with vitamin D metabolism and regulation. The representative coverage maps shown in Fig. 4 demonstrated that similar coverage patterns were obtained using both sequencers, which supports the consistency of the targeted panel.
The Ion Torrent Suite™ indicated that the overall quality of the data obtained from sequencing runs on the Ion S5™ XL was better than the data quality from sequencing performed on the Ion PGM™. For example, the bead loading onto the silicon chips was higher in runs performed on the Ion S5™ XL, as shown in Figs. 2 and 3. Higher levels of data output were achieved when the panel was used on the Ion S5™ XL rather than the Ion PGM™, as shown by the sequencing statistics in Table 3. These differences are most likely due to the chip type employed during the sequencing runs. The Ion 530™ chip used in this study has an inherently higher capacity (~38 million wells per chip) than the Ion 316™ (~6 million wells per chip) or Ion 318™ chips (~11 million wells per chip). Higher chip capacity allows more samples to be included on the same chip with higher coverage for each sample. The number of samples and coverage level can be balanced when planning a sequencing run to achieve optimum results.
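The balance between sample number and coverage can be approximated with simple arithmetic. The sketch below assumes reads are spread evenly across samples and amplicons, which real runs will not achieve, so it gives an upper bound only. The amplicon count (163 + 166 = 329) comes from the panel design; the usable read count and target depth are hypothetical planning inputs, and the function names are illustrative.

```python
PANEL_AMPLICONS = 163 + 166  # both primer pools combined, as on the Ion 530 chip


def mean_depth_per_amplicon(usable_reads: int, n_samples: int,
                            n_amplicons: int = PANEL_AMPLICONS) -> float:
    """Average reads per amplicon per sample, assuming perfectly even coverage."""
    return usable_reads / (n_samples * n_amplicons)


def max_samples(usable_reads: int, target_depth: int,
                n_amplicons: int = PANEL_AMPLICONS) -> int:
    """Largest number of samples that still reaches `target_depth` on average."""
    return usable_reads // (n_amplicons * target_depth)


# Hypothetical run: 4 million usable reads shared by the 77 Belfast samples.
print(round(mean_depth_per_amplicon(4_000_000, 77), 1))
print(max_samples(4_000_000, target_depth=100))
```

In practice a target well above the 30-fold minimum would be chosen, since uneven amplicon performance pulls the weakest regions below the mean.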
In addition to larger sample capacity and higher coverage potential, the Ion S5™ XL identified variants correctly more often than the Ion PGM™. When variants called differently by the two sequencers were investigated by Sanger sequencing, the Ion S5™ XL was correct in 14 instances while the Ion PGM™ was correct in nine. Despite the differences between the sequencers, the majority of reported results were consistent between both machines, which supports the effectiveness and reliability of this custom panel.
Association analyses were conducted to demonstrate the efficacy and potential applications of the custom NGS panel. These analyses were conducted in two kidney transplant cohorts to investigate the association between the targeted variants and either changes in vitamin D levels post-transplant or the development of NODAT, a serious post-transplant complication. None of the variants identified in the presented analyses reached genome-wide significance thresholds due to the small number of samples included. This was expected as these analyses were designed to provide proof-of-concept and were not intended to represent a well powered genotype association study. Hence, no adjustments for multiple testing have been made in the presented results.
Figure caption (beginning not recoverable from the source): [...] identified in pool 2 of the Ion PGM™ sequencing results. This was not called by the Ion S5™ XL. Sanger sequencing on the right confirmed that there is no variant at this locus in either sample RT106 or sample LT744. A single peak is shown for C in both of these samples (the Sanger sequencing image to the right of this image is shown in the reverse complement). These results confirm that the Ion PGM™ called this variant incorrectly
The Belfast renal transplant cohort was used to investigate the association of NODAT with the vitamin D related regions on the custom panel. The results presented in Additional file 1: Table S1 show the nominal association of the NODAT phenotype at the P trend < 0.05 threshold in a genotype association test for trend.
Two missense mutations (rs1801240 and rs1801239) in the CUBN gene exon 57 were nominally associated with NODAT following this analysis. These mutations are in strong linkage disequilibrium as shown by the Haploview image in Fig. 10 (D' = 1.0, 95% CI 0.89-1.0). An article by Tzur and colleagues has previously described a haplotype characterised by CUBN exon 57 and 42 SNPs including these variants [49]. Tzur associated this haplotype with an increased incidence of albuminuria [49]. Proteinuria post-transplant is a known risk factor for NODAT and, if the patients were predisposed to albuminuria, this may have in part contributed to the development of the NODAT transplant complication. A subset of the Birmingham renal transplant cohort was used to replicate these analyses (Additional file 1: Table S2). Notably, rs1801239 was also identified in this cohort. The variant rs1801239 had a similar direction of effect following logistic regression in the previous GWAS (OR: 3.904; P LR: 0.001; 95% CI 1.7-8.8). It is important to note that none of the highest ranking variants identified in this association analysis were identified in a previous genome wide association study (GWAS) conducted in the same Northern Irish renal transplant population [46]. In addition, none of the variants in the exon 57 and 42 regions of the CUBN gene covered on the GWAS panel were associated with NODAT in this previous study at the genome-wide significance threshold.
In the Birmingham renal transplant cohort, the association between change in vitamin D levels after kidney transplantation and the variants on the custom panel was investigated. The results of these analyses are shown in Additional file 1: Table S3. Three SNPs were found in common between the analyses of vitamin D change at 3 and 12 months including two variants in the promoter rank region of CYP27B1.
The seasonality of testing for vitamin D levels was not considered during this study. This is a potential limitation, but this study involved renal transplant recipients who are known to have an inherently higher risk of skin malignancy associated with immunosuppression and are therefore advised to use high factor sunscreens and to stay out of direct sunlight [50,51]. These clinical factors would be likely to reduce the impact of seasonality on the vitamin D levels in these individuals.
Fig. 10 Haploview™ image of SNPs rs1801240 and rs1801239. This image shows that these two missense mutations identified by the Ion S5™ XL are tightly linked (strong linkage disequilibrium is indicated by the red diamond)
Most of the associated variants, using data from both sequencers and in both cohorts, were in the cubilin (CUBN) gene. This gene encodes the peripheral membrane protein cubilin which contributes to the uptake of vitamin B12 and metabolism of vitamin D [52]. A previous study demonstrated that patients with non-functioning cubilin lose significant amounts of 25-hydroxyvitamin D3 in their urine, reducing their plasma 1,25-dihydroxyvitamin D levels [22]. Cubilin is essential for the reabsorption of proteins in the proximal tubules of the kidney [53]. A study published in 2012 investigated the association between variants in the CUBN gene and graft failure in renal transplant recipients [54] following a GWAS which linked these variants to albuminuria [55,56]. The study found that the CUBN variant rs7918972 in the organ donor genotype was associated with graft failure in the transplant recipient population. It is important to note that the CUBN gene is large (0.31 megabases) and so it is possible that the large number of variants identified in this gene in the presented study may be a result of the size of the gene rather than of a true association with the investigated phenotype. Larger genes will naturally harbour more variants than smaller genes included on the panel. It is important to account for this by applying a P value threshold which accounts for multiple testing in further well powered studies [57].
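As an illustration of a threshold that accounts for multiple testing, the sketch below applies a Bonferroni correction, which is one common and conservative choice rather than a method prescribed by this study. The count of 457 SNPs is the number analysed in the Belfast cohort; the SNP labels and P values are hypothetical.

```python
from typing import Dict, List


def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold controlling the family-wise error rate."""
    if n_tests < 1:
        raise ValueError("n_tests must be at least 1")
    return alpha / n_tests


def significant_after_correction(p_values: Dict[str, float], n_tests: int,
                                 alpha: float = 0.05) -> List[str]:
    """Return variant labels whose P value is below the corrected threshold."""
    threshold = bonferroni_threshold(alpha, n_tests)
    return sorted(label for label, p in p_values.items() if p < threshold)


# 457 SNPs were tested in the Belfast analysis; the P values are made up.
hypothetical_p = {"snpA": 0.001, "snpB": 0.00005, "snpC": 0.03}
print(bonferroni_threshold(0.05, 457))
print(significant_after_correction(hypothetical_p, n_tests=457))
```

Because many of the panel variants are in linkage disequilibrium, the tests are not independent and Bonferroni is likely to be over-conservative; it is shown here only because it is the simplest correction to state.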
The association analysis performed on these cohorts using data from two different sequencers demonstrates the capacity of the custom panel to detect SNP associations. To identify statistically significant and meaningful associations, the panel should be applied to considerably larger cohorts. This NGS panel will allow collaborating laboratories to perform identical targeted sequencing for vitamin D associated genomic regions. Studies that use this targeted panel and demonstrate significant associations with the phenotype of interest will require further functional studies to explore the mechanisms underpinning those associations.
Sanger sequencing is costly when focusing on many variants or genes such as those targeted in this panel. Sanger sequencing is also more time-consuming than NGS techniques, which offer a higher-throughput approach to variant detection. Targeted NGS panels such as this custom AmpliSeq™ panel offer an attractive alternative to Sanger sequencing for projects encompassing larger numbers of target variants. Whole exome sequencing is a popular alternative to custom NGS panels, but this approach is more expensive than targeted panels. The larger volume of sequencing data produced by exome sequencing requires more storage space and bioinformatic analysis, which can be associated with significant costs. Incidental findings are also a more significant issue when using whole exome sequencing.
This NGS panel represents a cost-effective, fast molecular test with minimal hands-on time. This high-throughput panel can be used to sequence one pool of 96 amplicons on a single Ion 318™ chip. The Belfast renal transplant cohort in contrast has more limited genetic heterogeneity with a very tightly defined NODAT phenotype. It is vital to note that the association results presented in this study primarily demonstrate, as a proof of principle, that this NGS panel can be effectively applied to a range of varying populations; they are not intended to establish definitive evidence of associations. The patient numbers included in these cohorts do not provide enough power to reliably identify associations with small to moderate effect sizes, but do confirm the utility of the panel. Despite these limitations, this reliable and cost-effective vitamin D panel was successfully used to sequence the targeted vitamin D regions in two renal transplant populations with excellent coverage and successful variant calling.
Homopolymer regions are one of the limiting factors in any project relying on Ion Torrent™ sequencers, which tend to underestimate or overestimate the length of runs of identical bases. A previous study reported that 94-97% of total error on the Ion PGM™ was due to homopolymer regions [58]. During this study, these regions were indeed shown to have a high error rate. Most of the discrepant variants investigated by Sanger sequencing were close to these repetitive regions, such as the DNA fragment shown in Fig. 6. The design of the panel itself was also shaped by these homopolymer regions. Table 1 shows that some regions could not be covered by the panel due to repetitive regions in the target sequences.
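As an illustration of how discrepant calls near homopolymer stretches might be flagged for Sanger confirmation, the following Python sketch locates runs of identical bases in a sequence and checks whether a variant position falls within or adjacent to one. The minimum run length of four bases, the two-base flanking window, the function names and the example sequence are assumptions made for illustration; they are not taken from the panel design or from the Ion Torrent™ software.

```python
from itertools import groupby
from typing import List, Tuple

# (start, end, base) with 0-based, half-open coordinates
Run = Tuple[int, int, str]


def homopolymer_runs(sequence: str, min_length: int = 4) -> List[Run]:
    """Return every run of identical bases that is at least min_length long."""
    runs: List[Run] = []
    position = 0
    for base, group in groupby(sequence.upper()):
        length = sum(1 for _ in group)
        if length >= min_length:
            runs.append((position, position + length, base))
        position += length
    return runs


def near_homopolymer(variant_pos: int, runs: List[Run], window: int = 2) -> bool:
    """True if a 0-based variant position lies inside a run or within
    `window` bases of either end of a run."""
    return any(
        start - window <= variant_pos < end + window for start, end, _ in runs
    )


# Hypothetical amplicon fragment containing one run of six adenines.
fragment = "ACGTACGTAAAAAACGT"
runs = homopolymer_runs(fragment)
print(runs)
print(near_homopolymer(7, runs), near_homopolymer(0, runs))
```

Variants flagged in this way would be natural candidates for confirmation by an orthogonal method before being carried into an association analysis.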

Conclusions
During the course of this study, a custom NGS panel was designed, implemented and validated. Two separate kidney transplant cohorts were used to demonstrate the potential of the panel and to verify the effectiveness of this targeted NGS approach. The custom panel successfully targeted genomic regions putatively involved in vitamin D regulation.
A comparison of the targeted panel using the Ion PGM™ and Ion S5™ XL sequencers from the Ion Torrent™ range was performed during this study. This was achieved by comparing the variants identified by the Ion Torrent™ variant caller plug-in following runs of the same panel on the two sequencers. The results of this analysis showed that the Ion S5™ XL identified more variants, with higher accuracy, than the Ion PGM™ in comparable runs. Therefore, although the panel was effective on both sequencers, the inherent advantages of the Ion S5™ XL provided clear improvements in performance over the Ion PGM™.
Proof-of-concept association analyses were performed using the sequencing data obtained using the custom NGS panel. It is possible that the variants in vitamin D associated genes that were linked in these analyses to change in vitamin D levels after kidney transplantation, or to NODAT, are valuable. However, the aim of the analysis was to demonstrate the use of the custom NGS panel rather than to undertake comprehensive association testing of gene variants in these cohorts. In order to further elucidate the link between these variants and the phenotypes investigated here, larger scale association studies are necessary alongside functional studies, particularly as change in BMI was included in the covariate model. This work could lead to further understanding of the pathways involved in the development of NODAT, for example, leading to treatments and perhaps diet and lifestyle recommendations which are better tailored to individuals with certain genetic profiles.
Overall, this study has demonstrated the effectiveness of this novel, targeted NGS panel for genomic regions putatively involved in vitamin D regulation and metabolism. This panel has a wide range of potential applications and will be valuable to an assortment of future projects.