Validation of differentially methylated microRNAs identified from an epigenome-wide association study; Sanger and next generation sequencing approaches

Objectives Altered DNA methylation and microRNA profiles are associated with diabetic kidney disease. This study compared different sequencing approaches to define the genetic and epigenetic architecture of sequences surrounding microRNAs associated with diabetic kidney disease. Results We compared Sanger and next generation sequencing to validate microRNAs associated with diabetic kidney disease identified from an epigenome-wide association study (EWAS). These microRNAs demonstrated differential methylation levels in cases with diabetic kidney disease compared to controls with long duration of type 1 diabetes and no evidence of kidney disease (Padjusted < 10−5). Targeted next generation sequencing analysis of genomic DNA and matched cell-line transformed DNA samples identified four genomic variants within the microRNAs, two within miR-329-2 and two within miR-429. Sanger sequencing of genomic DNA replicated these findings and confirmed the altered methylation status of the CpG sites identified by the EWAS in bisulphite-treated DNA. This investigation successfully fine-mapped the genetic sequence around key microRNAs. Variants have been detected which may affect their methylation status and methylated CpG sites have been confirmed. Additionally, we explored both the fidelity of next generation sequencing analysis and the potential efficacy of cell-line transformed DNA samples in place of finite patient samples in discovery genetic and epigenetic research. Electronic supplementary material The online version of this article (10.1186/s13104-018-3872-x) contains supplementary material, which is available to authorized users.


Introduction
Diabetic kidney disease (DKD) is a major complication of diabetes mellitus (DM) and the most common cause of chronic kidney disease (CKD) worldwide [1][2][3]. DKD is also associated with an increased risk of cardiovascular mortality [4,5]. DKD has a complex aetiology, yet individual risk is greatly influenced by genetic predisposition [6].
Advances in next generation sequencing (NGS) technologies and analytical approaches have resulted in more cost-effective sequencing [7], accelerating the rate of genetic research [8]. However, NGS costs are still prohibitive for many laboratories, limiting its utility in largescale studies of the methylome using high-density arrays [9,10].
Epigenetic modifications influence both DNA and RNA regulation without altering the underlying sequence and Open Access BMC Research Notes *Correspondence: laura.smyth@qub.ac.uk; lsmyth42@qub.ac.uk Genetic Epidemiology Research Group, Centre for Public Health, Queen's University of Belfast, Belfast, UK may contribute to the inherited predisposition of DKD [17,18]. DNA methylation is significantly altered in DM with higher levels of methylation reported in individuals with DKD [19]. MicroRNAs (miRNAs) are small highly conserved non-coding RNA molecules that act as epigenetic modifiers in the regulation of many protein-coding genes [20,21] and gene expression [22]. MiRNAs play a vital role in many diseases [23].
Induction of miRNAs in renal cells is associated with accumulation of extracellular matrix proteins implicated in kidney fibrosis and glomerular dysfunction [21]. Several miRNAs have been reported previously in association with DKD including miR-135a [24], miR-200b [25] and miR-377 [26]. MiRNAs may represent biomarkers for this disease but further mechanistic studies are required to elucidate their effects.
This study compared sequencing approaches to investigate differentially methylated miRNAs associated with DKD identified from an epigenome-wide association study (EWAS). The aims were to determine genetic variants and epigenetic marks in the miRNAs associated with DKD and their surrounding sequences, and to perform a direct comparison of the results between blood-derived genomic DNA (gDNA) and DNA from Epstein-Barr virus transformed cell-lines derived from the same participants. This provided an opportunity to evaluate the more readily available transformed cell-line DNA samples as a proxy for the finite supply of gDNA.

Sample cohort
All participants were of Caucasian ancestry from the UK or ROI and provided written informed consent for research. DNA was extracted from whole blood using the salting out method, normalised following PicoGreen quantitation, and frozen in multiple aliquots. Cell-line DNA was obtained following Epstein-Barr virus transformation of participants' lymphocytes into cell lines performed by the European Collection of Authenticated Cell Cultures (ECACC) [27].
Participants were part of the All Ireland-Warren 3-Genetics of Kidneys in Diabetes (GoKinD) UK Collection. Cases (n = 150) were defined as individuals with ≥ 10 years duration of type 1 diabetes (T1D) who had also been diagnosed with DKD defined as hypertension (blood pressure ≥ 135/85 mmHg) and persistent macroalbuminuria (≥ 500 mg/24 h). Diabetic controls (DCs, n = 100) were individuals with ≥ 15 years duration of T1D and no evidence of renal disease on repeat testing. Control subjects all had an estimated glomerular filtration rate (eGFR) > 60 mL/min/m 2 whereas each case subject had CKD based on presence of persistent macroalbuminuria and eGFR < 60 mL/min/m 2 . Participant characteristics are included within Additional file 1: Table S1.
To assess the methylation status of the cytosinephosphate-guanine (CpG) sites, the Infinium Human Methylation 450K BeadChip array was used following the manufacturer's instructions. Cases and controls were randomly distributed across each array. This high throughput platform evaluated individual methylation levels (β values) for each CpG site, ranging from 0 for unmethylated to 1 for complete methylation. Raw methylation data was adjusted for dye bias and quantile normalised as previously reported [28]. Quality control (QC) included evaluation of the bisulphite treatment conversion efficiency, dye specificity, hybridisation, staining and the inclusion of 600 integral negative controls for the EWAS.
The significant methylation values between cases and controls for all probes which passed QC were adjusted for multiple testing using the Benjamini and Hochberg approach [29]. All miRNAs that demonstrated significantly altered levels of DNA methylation (p < ×10 −5 ) were selected from our previous EWAS [28] for this validation and fine-mapping study.

NGS: targeted DNA sequencing
Targeted NGS analysis was performed for the sequences surrounding the CpG site of interest for each miRNA. Blood-derived gDNA was analysed in 23 DKD cases and 23 DCs. Participant characteristics are included within Additional file 1: Table S2. The gDNA samples were matched to the GoKinD cell-line DNA samples from which they were originally transformed, therefore analysis was conducted for 92 samples for each genomic region.
Target sequences for the five miRNAs were amplified using custom designed primers via a polymerase chain reaction (PCR). DNA fragments were pooled by sizes of approximately 800 base pairs (bp), 400 bp and 200 bp. Optimal primers were designed using Primer3Plus, Vector NTI Advance ® (Invitrogen ™ , USA) and EpiDesigner software. Primers were selected depending on their ability to sufficiently cover the CpG site of interest and have compatible annealing temperature to enable multiplex reactions. Primer sequences are provided in Additional file 1: Table S3 with optimised PCR conditions.
The library preparation was conducted using two Thermo Fisher Scientific protocols. The Ion Xpress ™ Plus gDNA Fragment Library Preparation protocol (MAN0009847, revision B.0) was employed where the initial fragments were approximately 800 bp as they required fragmentation using the E-Gel ™ SizeSelect ™ 2% Agarose Gel to generate 200 bp libraries. For fragments originally of 200 bp and 400 bp, the Prepare Amplicon Libraries without Fragmentation Using the Ion Plus Fragment Library Kit protocol (MAN0006846, revision A.0) was followed.
Following library preparation, the DNA samples were diluted to 26 pM using DNA-free water. The 400 bp libraries were enriched using Thermo Fisher Scientific's Ion OneTouch Both the Ion 316 ™ Chip v2 and the Ion 318 ™ Chip v2 were used to sequence the DNA samples using the Ion PGM ™ System (Thermo Fisher Scientific), before the raw data was analysed using Torrent Suite ™ Software v4.0.4 and Partek ® Genomics Suite ® 6.6 software (Partek ® Inc., USA). The sequencing reads were aligned to the hg19 reference sequence. SNPs were aligned to dbSNP version 141 and annotated using RefSeq version 2014-07-30. The chromosome viewer was used to visualise the overall sequencing coverage for the region of interest surrounding the top-ranked CpG site for each miRNA.

Sanger sequencing: fine mapping and methylation analysis
Forty-six gDNA samples, 23 DKD cases and 23 DCs, were bi-directionally Sanger sequenced using the ABI 3730 Genetic Analyser (Thermo Fisher Scientific). This was completed to enable direct comparisons to be drawn against the NGS variant calls.
Bisulphite treatment of the same samples was performed using the EZ-96 DNA Methylation ™ Lightning Kit prior to Sanger sequencing. The resulting data provided the opportunity to assess the methylation status of each CpG site within the fragment.
ContigExpress, a component of Vector NTI Advance ® 11.5.2 was used to analyse the Sanger sequencing data and determine accurate SNP calls. DNA sequences were aligned to the GRCh37 reference genome obtained from online resource, Ensembl.
An overview of the analysis workflow is illustrated in Fig. 1.

Discovery 450K methylation analysis
Methylation status was quantitatively determined (DKD cases n = 150 and DCs n = 100). QC showed that > 99% concordance was observed between all included individuals; r 2 > 0.98 for each of the sample pairs assessed. In total, 74 CpG sites were determined from the EWAS, five of which were identified with significantly altered β levels from the original EWAS protocol [28]; miR-141, miR-329-2, miR-34A, miR-429 and miR-940 (Additional file 1: Table S4). This manuscript is focused on validation and fine-mapping of these top-ranked miRNAs in individuals with and without DKD.

NGS: targeted DNA sequencing
Targeted NGS was performed using the Ion PGM ™ for DNA extracted from both whole blood and cell-line DNA. Both the Ion 316 ™ Chip v2 and the Ion 318 ™ Chip v2 were used in this analysis which typically generated 1.6 million to 3 million reads (Additional file 2: Figure S1).
Analysis was completed using Torrent Suite ™ Software and Partek ® Genomics Suite ® . Four SNPs were identified which have not previously been associated with DKD, or identified as top-ranked results in GWAS (Fig. 2); two within miR-329-2 (rs141067872 and rs10132943) and two within miR-429 (rs7521584 and rs112695918). The frequency distributions for these SNPs are included in Additional file 1: Table S5.

Sanger sequencing: fine mapping and methylation analysis
To confirm variants identified by NGS, the same primer pairs were used to bi-directionally Sanger sequence matched gDNA samples (23 DKD case and 23 DCs). The variants identified by the NGS approach were confirmed by Sanger sequencing (Fig. 3). The genotype and minor allele frequencies (dbSNP, HapMap-CEU, low coverage panel) determined are detailed in Additional file 1: Table S5, though it is essential to note that not all fragments for all samples were Sanger sequenced successfully.
The gDNA DKD and DC samples were also BST in order to assess the methylation status of each CpG site present within the fragment as reported by the 450K methylation array using ContigExpress software. In all, 35 methylation sites were identified for these miRNAs following bi-directional sequencing (Additional file 1: Table S6).

Discussion
This study reports the novel association of five miR-NAs with DKD, performing validation following the published EWAS and fine-mapping on these miRNAs.
Additionally, different sequencing approaches were evaluated to define the genetic and epigenetic architecture of sequences surrounding miRNAs associated with DKD. Comparative analysis between Sanger sequencing and NGS technologies confirmed a 100% concordant call rate for all SNPs identified by both techniques, for duplicate samples, providing reassurance that the original gDNA sequence for miRNAs was unaltered by the cell-line transformation process. Several additional studies have also reported this positive comparison between Sanger and NGS approaches [30][31][32][33]. Notably, this is the first report to return results using semiconductor sequencing chemistry and Ion Torrent platforms for top-ranked miRNAs identified from an EWAS. Despite being the gold-standard method, Sanger sequencing is not faultless and has been shown to be inefficient in confirming NGS results for regions with high GC content, and repetitive sequences [31]. NGS methods are reported to be more sensitive and scalable than Sanger sequencing [30,33,34].    Regarding DNA methylation, the BST DNA Sanger sequencing analysis mirrored the methylation sites identified through the Infinium Human Methylation 450K analysis. It is advisable to use at least two methods to detect and confirm differential methylation status [35]. Of the five main methods of detecting differential methylation, three were not employed in this study; (1) immunoprecipitation of methylated DNA, (2) methylated DNA capture by affinity purification and (3) reduced representation bisulphite sequencing [36]. The bisulphite-based methods, of which two were employed here, performed optimally in comparison to the others [36].
In conclusion, differential methylation in the five topranked miRNAs is associated with DKD and we have provided new details on the genetic architecture surrounding these loci. Targeted NGS compared favourably with Sanger sequencing. Sanger sequencing is costly and time-consuming when assessing many variants, or samples. Targeted NGS provides a robust alternative method, offering more cost-effective and often more sensitive approach.

Limitations
A potential limitation is that the sequencing data generated with the Ion PGM ™ Template OT2 400 Kit was not of as high quality as the Ion PGM ™ IC 200 Kit. Fragments of 400 bp in length had to be prepared and enriched using both the OT2 and ES, not the Ion Chef ™ due to chemistry incompatibilities at the time this experiment was undertaken (2014-2015). Both miRNAs with 400 bp fragments, miR-34A and miR-940, could have primers re-designed to facilitate 200 bp fragments covering the region of interest to provide better coverage of these regions.

Additional files
Additional file 1: Table S1. Characteristics of the individuals present within the HumanMethylation 450K BeadChip array analysis. Table S2. Characteristics of the individuals included within the sequencing analysis. Table S3. PCR sequences and conditions for miRNA sequencing. Table S4. Details of the top-ranked miRNAs identified from 450K Illumina methylation analysis. Table S5. SNP results from the fine mapping (gDNA) Sanger sequencing analysis. Table S6. Genomic and bisulphite treated DNA sequences for selected miRNAs.
Additional file 2: Figure S1. A summary the NGS Ion PGM ™ sequencing statistics. a) the sequencing chip loading density, b) the sequencing read lengths presented as a histogram, c) the alignment percentage of sequencing reads to hg19, d) additional sequencing statistics including chip loading, enrichment percentage, comparison of clonal and polyclonal reads, and the percentage of the final library which met the quality threshold for sequencing.
Authors' contributions LJS, APM, GJM and AJM conceived the study and participated in its design and coordination. LJS drafted the manuscript. KAB and JK assisted LJS with the next generation sequencing. All authors read and approved the final manuscript.