DNA methylation and expression analyses reveal epialleles for the foliar disease resistance genes in peanut (Arachis hypogaea L.)

Objective Low DNA sequence polymorphism despite enormous phenotypic variations in peanut indicates the possible role of epigenetic variations. An attempt was made to analyze genome-wide DNA methylation pattern and its influence on gene expression across 11 diverse genotypes of peanut. Results The genotypes were subjected to bisulfite sequencing after 21 days of sowing (DAS). CHG regions showed the highest (30,537,376) DNA methylation followed by CpG (30,356,066) and CHH (15,993,361) across 11 genotypes. The B sub-genome exhibited higher DNA methylation sites (46,294,063) than the A sub-genome (30,415,166). Overall, the DNA methylation was more frequent in inter-genic regions than in the genic regions. The genes showing altered methylation and expression between the parent (TMV 2) and its EMS-derived mutant (TMV 2-NLM) were identified. Foliar disease resistant genotypes showed significant differential DNA methylation at 766 sites corresponding to 25 genes. Of them, two genes (Arahy.1XYC2X on chromosome 01 and Arahy.00Z2SH on chromosome 17) coding for senescence-associated protein showed differential expression with resistant genotypes recording higher fragments per kilobase of transcript per million mapped reads (FPKM) at their epialleles. Overall, the study indicated the variation in the DNA methylation pattern among the diverse genotypes of peanut and its influence of gene expression.


Introduction
Peanut (Arachis hypogaea L. 2n = 4x = 40) is an important legume food and oilseed crop world-wide. The breeding efforts mainly focus on productivity, disease and insect resistance, oil quality etc. Genomics-assisted breeding has been successfully employed [1,2] with the development of the genomic resources including the genome sequence of the cultivated allotetraploid peanut [3,4]. Apart from the genome, the epigenome also influences the gene function and the phenotype [5]. Despite the narrow genetic base, peanut shows enormous phenotypic variability, which probably hints at the possible role of epigenetic variations in altering the phenotype.
Though the putative genes coding for cytosine-5 DNA methyltransferase (C5-MTases) and demethylase mediating the DNA methylation pattern have been identified in the progenitors of peanut [21], the pattern of DNA methylation in the cultivated peanut is yet to be analyzed. In this study, an effort was made to analyze the DNA methylome and its influence on transcriptome of the peanut genotypes contrasting for the productivity traits and the response to foliar diseases with an objective of identifying the epialleles contributing for the desirable phenotype in peanut so that an efficient breeding programme can be devised for such traits.

Methods
A total of 11 genotypes (Table 1) were employed for methylome sequencing. Seeds of these genotypes were sown in the pots and the seedlings were grown in the greenhouse with optimal conditions. Leaf samples were collected for DNA and RNA isolation from a single 21-day old seedling. DNA was isolated using Qiagen DNeasy Plant Mini Kit (Cat # 69104) and the RNA was isolated using Qiagen RNeasy Plant Mini Kit (Cat # 74904). Bisulfite treatment was done using Zymo EZ DNA Methylation-Gold Kit. DNA methylome library was constructed using illumine TruSeq ® DNA Methylation Kit. RNA library was constructed using Illumina TruSeq ® Stranded mRNA Kit. The quality of the libraries was checked using TapeStation and Qubit. DNA sequencing was carried out using Illumina Hiseq 2500, and the RNA sequencing was carried out using Illumina Hiseq 2500 and Illumina Hiseq 4000 with two technical replicates and without any biological replicates.

Methyl-Seq analysis
Raw fastq files were pre-processed using Adapter-Removal v2 [22] tool. Using bwa-meth [23] program, the preprocessed reads were aligned with the Arachis hypogaea reference genome downloaded from Peanut-Base [24]. The genomic sites showing DNA methylation were identified using MethylDackel program. Differential methylation was analyzed using methylKit [25] R package. The DNA methylation pattern was compared across the genotypes at q-value cutoff 0.01 and methylation percentage change cutoff 25 using methylKit.

RNA-Seq analysis
Raw data pre-processing was done using AdapterRemoval v2 [22] tool. The pre-processed reads were aligned with silva database using bowtie2 [26] to remove any ribosomal RNA. The reference genome of the Arachis hypogaea was downloaded from PeanutBase [24] and employed for alignment using STAR aligner [27] to get the alignment bam files. Differential expression analysis was executed using cuffdiff program of cufflinks package [28]. log2 fold change cutoff ± 2 and P-value cutoff 0.05 were used for identifying differentially expressed genes. Custom scripts were developed to identify genes that were differentially methylated as well as differentially expressed.

qRT-PCR
Expression of the selected genes were confirmed using qRT-PCR. Total RNA was isolated from the young leaves on 21 DAS using Qiagen Rneasy mini kit (Qiagen, USA), and the quantity was checked using NanoDrop spectrophotometer (ND-2000, Thermo fisher scientific, USA). cDNA was synthesized using Affinity Script qPCR cDNA

Results and discussion
On an average, 127,852,977 bisulphite sequencing reads were generated for each sample (Additional file 1:  [29,30] that the DNA methylation in plants is found both in CpG and non-CpG (CHG and CHH, where H is A, C or T) contexts in contrast to mammals where DNA methylation occurs predominantly at CpG dinucleotides.
Among the 11 genotypes, JL 24 and TMV 2 showed the highest (82,137,767) and the lowest methylation sites (69,044,110), respectively (Table 3). Such a natural epigenetic variation was also observed among the different ecotypes of Arabidopsis [31]. Many of the sites were found to be conserved across for DNA methylation across the genotypes of peanut. A total of 5,379,101 sites showed DNA methylation across all the 11 genotypes.
On an average, inter-genic regions (70,464,637 sites) were more prone for DNA methylation than the genic regions including 2 kb upstream and 2 kb downstream regions (6,422,166 sites) ( Table 3). Within the genic regions, the introns (1,590,263) showed a greater number of DNA methylation sites than the exonic regions (971,274). The 2 kb upstream and 2 kb downstream regions had 3,860,629 DNA methylation sites, indicating higher proportion of DNA methylation at the upstream and downstream regions than the gene body region. The distribution of DNA methylation within the genome especially in the promoter and gene body regions is very important as it influences the gene expression [32].
Of  Table S4). Arahy.0DU9MH with the highest DNA methylations sites did not show any expression at 21 DAS in the leaves of the 11 genotypes. Fifty genes with wide range of FPKM across the genotypes were selected and checked for the DNA methylation. Arahy.FHUH7B on chromosome 10 showing the highest FPKM of 54,951 had a maximum of 102 DNA methylation sites (Additional file 5: Table S5). Many of these genes showed negative association between the number of DNA methylation sites and FPKM among the genotypes.
Fourteen C5-MTase coding genes and ten DNA demethylase coding genes identified in the diploid peanut earlier [21] were analysed for DNA methylation and expression. A considerable variation was observed for methylation across the genes, however, not much variation was observed for methylation across the genotypes (Additional file 6: Table S6). A DME-like A gene Arahy. R549UJ (Aradu.4D5YM) of 15,531 bp length on chromosome 8 showed the highest number of methylation (as high as 586). Forty-four sites were found in the promoter region, while 542 sites were in the gene body (48 in exon and 494 in intron). This gene did not show any expression at 21 DAS in the leaves of the 11 genotypes.
An attempt was made to enumerate the differentially methylated sites between a parent (TMV 2) and its EMS-derived mutant (TMV 2-NLM). The two genotypes significantly differed for 650 methylation sites, of which 240 and 401 were found in the A and B genome (remaining nine on the scaffolds), respectively. Again, the inter-genic region showed a greater number of DNA methylated sites (605) than the genic regions (45; 23 in exons and 22 in introns). Thirty-seven genes exhibited differential methylation, of which eight showed differential expression (Additional file 7: Table S7a), indicating the influence of EMS mutagenesis on DNA methylation.
In an attempt to identify the differentially DNA methylated sites, foliar disease resistant (GPBD 4, VG 9514, ICGV 86855, ICGV 99005 and ICGV 86699) and susceptible (TAG 24, TMV 2 and JL 24) groups of genotypes were constructed. The common sites within susceptible group were compared with the common sites within the resistant group. In total, 766 sites showed significantly differential DNA methylation. Of these, 331 sites were in the A genome and 433 sites were in the B genome. In total, 731 methylation sites were in the inter-genic regions and 35 were in the genic regions (19 in exons and 16 in introns). Interestingly, four differentially DNA methylated sites (1,001,785, 1,001,813, 1,021,671 and 1,305,680) mapped to the QTL region (for LLS) on A02 and one (134,350,159) mapped to the QTL region (for rust) on A03 [33]. Of these sites, only one (1,305,680) was in a genic (Arahy.42YDET) region. However, this gene has not been regarded as a candidate gene for foliar disease resistance [33]. Based on the genomic position of the DNA methylation sites, 25 genes were found to be differentially methylated (q ≤ 0.01) between resistant and susceptible genotypes. Of these genes, two genes (Arahy.1XYC2X on chromosome 01 and Arahy.00Z2SH on chromosome 17) coding for senescence-associated protein showed differential expression with resistant genotypes recording higher FPKM values (Additional file 7: Table S7b). It was interesting to note the methylation pattern within Arahy.1XYC2X differed between the resistant and susceptible groups, indicating the epialleles at this locus. The candidate genes identified for late leaf spot (four genes) and rust (six genes) resistance in the previous study [33] did not show any DNA methylation, indicating that breeding for foliar disease resistance can depend only on the genetic variation. FPKM values for the transcripts at these loci were on par between the resistant and the susceptible genotypes. This was also confirmed by the qRT-PCR where some of these genes showed non-significant fold changes between the two groups (Additional file 8: Table S8).

Limitations
A detailed genome-wide DNA methylation pattern was reported from the 11 diverse genotypes of peanut for the first time by analyzing the leaf samples collected at 21 DAS. Also, the influence of induced mutagenesis on the pattern of DNA methylation was assessed. Differentially methylated sites were identified between the foliar disease resistant and susceptible genotypes. The genes showing differential methylation and expression were identified. However, these data need to be validated by comparing the samples collected at different stages of growth and development and varying conditions (diseased and normal) to identify the role of DNA methylation, and its influence on gene expression in peanut.