A seed germination transcriptomic study contrasting two soybean genotypes that differ in terms of their tolerance to the deleterious impacts of elevated temperatures during seed fill

Objective: Soybean seed development is negatively impacted by elevated temperatures during seed fill, which can decrease seed quality and economic value. Prior germplasm screens identified an exotic landrace able to maintain ~ 95% seed germination under stress conditions that reduce germination dramatically (> 50%) for typical soybean seeds. Seed transcriptomic analysis was performed for two soybean lines (a heat-tolerant landrace and a typical high-yielding adapted line) at three stages: dry, mature seed; 6-h imbibed seed; and germinated seed. Seeds were produced in two environments: a typical Midwestern field and a heat-stressed field located in the Midsouth soybean production region.
Results: Transcriptomic analysis revealed 23–30K expressed genes in each seed tissue sample, and differentially expressed genes (DEGs) with ≥ twofold gene expression differences (at q-value < 0.05) comprised ~ 5–44% of expressed genes. Gene ontology (GO) enrichment analysis on DEGs revealed enrichment in heat-tolerant seeds for genes annotated for general and temperature-specific stress, as well as protein refolding. DEGs were also clustered into modules using weighted gene co-expression network analysis, and the modules were examined for enrichment of GO biological process terms. Collectively, our results provide new and valuable insights into this unique form of genetic abiotic stress tolerance and into soybean seed physiological responses to elevated temperatures.
Electronic supplementary material: The online version of this article (10.1186/s13104-019-4559-7) contains supplementary material, which is available to authorized users.


Introduction
Soybean (Glycine max L. Merr.) is a major commodity crop, comprising ~ 34% (~ 36.5 million ha) of crop land in the United States in 2017 (http://www.soystats.com, accessed 3-21-19). The value of the crop is principally derived from the high yield and quality of oil and protein in the seeds. The Midsouth soybean growing region of the United States experiences consistent late-season drought, which has resulted in historically reduced on-farm seed yield and economic return [1,2]. Although irrigation can at least partially remedy these issues, fuel for pumping water is expensive and long-term use of aquifers for agricultural irrigation may be unsustainable [3].
Traditionally, soybean maturity group (MG) 5-7 cultivars were planted in May and June with harvest in October and November. An alternative method, the Early Soybean Production System (ESPS), modifies soybean planting and harvest dates to avoid much of the late season endemic drought [2] by use of cultivars that flower and mature earlier (typically MG 3-4 and early 5 as compared with MG 5-7) and by adjusting planting dates to early-to-mid-April with harvest typically occurring in September. The practices of the ESPS in the Midsouth region have increased seed yield and on-farm return on investment [1,3] under both irrigated and non-irrigated conditions [1].
Open Access, BMC Research Notes. *Correspondence: Jason.Gillman@ars.usda.gov. 1 USDA-ARS, Plant Genetics Research Unit, 205 Curtis Hall, University of Missouri, Columbia, MO 65211, USA. Full list of author information is available at the end of the article.
Soybean has traditionally been considered to be heat-tolerant, with a vegetative optimum temperature of ~ 30 °C [4]. However, the processes of pollination and seed growth/maturation are sensitive to elevated temperatures; the reproductive optimum is a relatively low 22-24 °C [5]. Despite economic and seed yield gains under the ESPS, seed produced in this system are exposed to much higher temperatures [6] during seed fill (≥ 32 °C maximum daytime temperature) than seeds of MG 5-7 cultivars produced in the traditional system. In typical MG 4 cultivars, exposure to such high temperatures during seed development reduces seed quality/germination, increases pathogen infection, and often results in economic loss through seed dockage [3,7].
Soybean is a self-pollinating species, and modern high-yielding cultivars derive from an extremely limited genetic base; traditional breeding has exacerbated this problem [8,9]. Exotic landraces may contain novel disease and stress resistance genes; a successful screen identified lines that can tolerate the high temperature associated with the ESPS [6]. An unimproved landrace (PI 587982A) has consistent and robust resistance (> 90% germination, near absence of Phomopsis longicolla infection). The first United States heat tolerant germplasm release, with tolerance derived from PI 587982A, was recently made by our group [10].
Transcriptomics, enabled by advances in DNA sequencing and computation, is a powerful tool to identify gene expression differences and correlations with genetic/developmental cues or environmental conditions. Detailed studies have generated "transcriptomic atlases" for soybean gene expression [11–15]. However, these studies have largely overlooked soybean seed germination in favor of seed development or vegetative tissues (typically leaves or roots). In this study, we examined three soybean seed germination stages: (1) dry, mature seed; (2) imbibed seed; and (3) germinated seed. We contrasted two soybean genotypes that differ in their tolerance to the impact of elevated temperature on seed quality, using seed produced in two environments differing in abiotic stress: (A) a lower temperature, Midwest location; and (B) the high temperature conditions of the ESPS.

Field seed production, seed imbibition and germination measurement, and RNA isolation and RNA sequencing, mapping and statistical analysis
Full details are provided in Additional file 1.

Whole genome comparative network analysis and gene ontology enrichment of co-expressed gene modules
Modules of genes with highly correlated expression patterns were identified using weighted gene co-expression network analysis (WGCNA). We expect these modules to correspond to networks of genes that are co-expressed and that may therefore interact and share biological processes. We constructed unsigned weighted gene co-expression modules using the WGCNA [16] package in R. The blockwiseModules function was run with the Pearson correlation coefficient and a soft thresholding power of 18. The resulting gene modules were each arbitrarily assigned a color as a name. Additionally, we tested each module for enrichment of Gene Ontology (GO) functional annotations (Additional file 5) and used hierarchical clustering to group differentially expressed genes across samples (Additional file 2). g:GOSt (https://biit.cs.ut.ee/gprofiler/gost) was used to detect statistically significant enrichment of GO terms within the modules identified by WGCNA, using the Benjamini-Hochberg FDR method with α = 0.05 as the significance threshold.
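As an illustration of the network construction step, the unsigned adjacency that underlies WGCNA can be written as a_ij = |cor(x_i, x_j)|^β, here with β = 18. The Python sketch below is not the WGCNA R code used in this study; the gene names and expression values are hypothetical, and the later steps of module detection (topological overlap, hierarchical clustering, tree cutting) are omitted.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def unsigned_adjacency(expr, power=18):
    """Soft-thresholded unsigned adjacency: |cor| raised to `power`."""
    genes = list(expr)
    return {
        (g1, g2): abs(pearson(expr[g1], expr[g2])) ** power
        for g1 in genes
        for g2 in genes
        if g1 != g2
    }

# Hypothetical expression profiles (arbitrary units) across six samples.
expr = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    "geneB": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],  # tracks geneA closely
    "geneC": [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],   # mirror image of geneA
    "geneD": [3.0, 1.0, 4.0, 1.0, 5.0, 9.0],   # only moderately correlated
}
adj = unsigned_adjacency(expr)
```

Because the network is unsigned, the perfectly anti-correlated pair (geneA, geneC) receives the same adjacency as a perfectly correlated pair, whereas the high power pushes the moderately correlated pair (geneA, geneD) close to zero.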

Germination assays
We examined germination kinetics for two soybean genotypes: (1) a heat-tolerant soybean plant introduction line (PI 587982A) henceforth referred to as "PI"; and (2) S99-11986, a conventional high yielding improved line [20], comparable to cultivars commonly grown in the Midsouth and Midwest regions, henceforth referred to as "SG". Seed to be germinated were produced (Fig. 1a) in one of two environments: (1) a location with endemic high temperature stress associated with the Early Soybean Production System (henceforth referred to as ESPS-Stoneville, MS; Fig. 1a); or (2) a less stressful Conventional Soybean Production System (CSPS-Columbia, MO; Fig. 1a).
Seed of the PI line were found to germinate much more rapidly than those of the SG line in both environments (Fig. 1b), and PI seed from both unstressed and heat-stressed locations germinated with very high efficiency (> 80%, Fig. 1a). In contrast, only 75% of CSPS-produced seed from SG germinated by the end of 72 h. A dramatic reduction in germination was noted for SG seed produced under the heat stress of the ESPS (~ 30% germination at 72 h, Fig. 1b). Our germination results are concordant with our previous metabolomics study [21]. We then selected three stages (Fig. 1b, Table 1) to obtain transcriptomic data: (1) mature, dry seed; (2) 6-h imbibed seed; and (3) germinated seed with emerged radicle for each genotype grown in both environments (Fig. 1b, Table 1). It is important to note that the time from imbibition to germination varied between genotype/environments (Fig. 1b). Three biological replicates (each consisting of 5 seed) per genotype/condition/timepoint were used for analysis to quantify gene expression. The number of genes expressed (FPKM > 0.3) in each sample ranged from 23,560 to 30,349 (Fig. 1c, Table 1).
A core set of expressed genes was identified for each stage: (A) 21,082 genes expressed in all mature seed tissues; (B) 26,372 genes expressed in all 6-h imbibed seed tissues; and (C) 21,843 genes expressed in all germinated seed tissues (Fig. 2a, Additional file 3).

Differentially expressed gene analysis
We utilized a Tuxedo RNAseq analysis pipeline to make 20 distinct comparisons, which can be divided into four general categories: (1) environmental effects; (2) genotypic effects; (3) the transition from mature seeds to 6-h imbibed seeds; and (4) the transition from imbibed seeds to germinated seeds (Table 1, Additional file 4). An average of 7385 differentially expressed genes (DEGs) were detected between environments (threshold for all comparisons was q-value < 0.05). An average of 7789 DEGs were detected between genotypes. An average of 11,833 DEGs were detected between mature and 6-h imbibed seeds, across genotypes and environments (Fig. 1d). Lastly, an average of 13,344 DEGs were detected between imbibed and germinated seeds (Fig. 1d; full details are presented in Additional file 4).
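The DEG criteria used in this study (q-value < 0.05 and a ≥ twofold expression difference) amount to a simple filter on each pairwise comparison. The following Python sketch applies that filter to hypothetical records; the field names and values are illustrative and do not reproduce the output format of the pipeline.

```python
import math

def log2_fold_change(fpkm_a, fpkm_b):
    """log2 of condition B over condition A (both FPKM values must be > 0)."""
    return math.log2(fpkm_b / fpkm_a)

def call_degs(records, q_threshold=0.05, min_fold=2.0):
    """Keep genes that pass both the q-value and fold-change criteria."""
    min_log2 = math.log2(min_fold)
    return [
        r["gene"]
        for r in records
        if r["q_value"] < q_threshold
        and abs(log2_fold_change(r["fpkm_a"], r["fpkm_b"])) >= min_log2
    ]

# Hypothetical results for one pairwise comparison.
comparison = [
    {"gene": "g1", "fpkm_a": 10.0, "fpkm_b": 45.0, "q_value": 0.001},  # up, significant
    {"gene": "g2", "fpkm_a": 10.0, "fpkm_b": 15.0, "q_value": 0.001},  # only 1.5-fold
    {"gene": "g3", "fpkm_a": 40.0, "fpkm_b": 8.0, "q_value": 0.030},   # down, significant
    {"gene": "g4", "fpkm_a": 2.0, "fpkm_b": 10.0, "q_value": 0.200},   # not significant
]
degs = call_degs(comparison)
```

Taking the absolute value of the log2 fold change treats up- and down-regulation symmetrically, so a fivefold decrease (g3) is retained alongside a 4.5-fold increase (g1).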

Gene ontology enrichment
We utilized a gene ontology (GO) term enrichment tool (https://soybase.org/goslimgraphic_v2/dashboard.php) to examine lists of differentially expressed genes for the 20 comparisons (Figs. 1d, 2b, c, Additional file 5). A larger number of DEGs were found in comparisons of developmental transitions (from mature-to-imbibed or imbibed-to-germinated seeds) than within the same developmental stage (either between environments or between genotypes). Seed development/maturation under the high temperature conditions of the ESPS (Figs. 1d, 2b, Additional file 5) was associated with significant enrichment (at a threshold p-value < 0.05) for gene annotations for heat stress, response to oxidative stress and protein folding; "response to hydrogen peroxide", "response to high light intensity", and "response to heat" were both overrepresented and GO-term enriched in all 6 comparisons for environmental effects, with "response to wounding" overrepresented and GO-term enriched in 4/6 comparisons (excepting comparisons #2, #4; comparison numbers specified in Additional file 5). The enrichment of numerous GO terms associated with abiotic stress response gives a clear indication that the mRNA pools of both genotypes are responsive to the higher temperatures of the ESPS as compared to the less stressful CSPS.
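For readers unfamiliar with GO term enrichment, the underlying question is whether a term appears among the DEGs more often than expected by chance. The statistical test implemented by the enrichment tool is not described here; the Python sketch below assumes a one-sided hypergeometric test, a common choice for this purpose, and uses hypothetical counts.

```python
from math import comb

def hypergeom_enrichment_p(n_background, n_annotated, n_degs, n_overlap):
    """P(X >= n_overlap) when n_degs genes are drawn without replacement
    from n_background genes, n_annotated of which carry the GO term."""
    upper = min(n_annotated, n_degs)
    total = comb(n_background, n_degs)
    tail = sum(
        comb(n_annotated, k) * comb(n_background - n_annotated, n_degs - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical counts: 1000 expressed genes, 50 annotated with the term,
# 100 DEGs in the comparison, so 5 annotated DEGs are expected by chance.
p_enriched = hypergeom_enrichment_p(1000, 50, 100, 15)  # three times the expectation
p_expected = hypergeom_enrichment_p(1000, 50, 100, 5)   # at the expectation
```

Under these assumed counts, an overlap of 15 annotated DEGs would be called enriched at p-value < 0.05, whereas an overlap equal to the chance expectation would not.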
Despite this environmental response, seed mRNA pools of the heat-tolerant line were further enriched (Fig. 2b, Additional file 5) for genes with GO terms associated with abiotic stress response [e.g. "response to high light intensity", "response to hydrogen peroxide" and "response to heat" in 5/6 comparisons (excepting #9); "response to water deprivation" in 4/6 comparisons (excepting #7, #9); "response to cadmium ion" in 4/6 comparisons (excepting #9, #11); and "response to salt stress" in 4/6 comparisons (excepting #9, #10)]. In addition, we observed enrichment in the tolerant PI mRNA pools for protein refolding-associated GO terms: "nucleosome assembly" in 4/6 comparisons (excepting #7, #11) and "response to endoplasmic reticulum stress". The GO term "l-ascorbic acid biosynthesis" was also enriched in seed of the stress-tolerant PI under the ESPS; these results are concordant with our previous metabolomics study [21], which found higher levels of ascorbate precursors in seeds of a heat-tolerant soybean line. Collectively, these results suggest that fundamental differences exist in seed mRNA pools between the two genotypes; the more stress-tolerant PI genotype appears to be "genetically primed" to manage abiotic stress more effectively and to accumulate higher levels of seed antioxidant compounds. This mRNA priming trend persists through seed germination and ultimately translates into more efficient and effective seed germination (and, in field conditions, seedling emergence).

Weighted gene co-expression network analysis
Weighted gene co-expression network analysis (WGCNA) is a systems biology method for describing the correlation patterns among genes across samples [16]. We applied the WGCNA package in R to FPKM data from all samples to find modules (clusters) of highly correlated differentially expressed genes (≥ twofold); a total of 16 clusters were detected (Additional files 6 and 7; clusters are color coded).
Co-expressed gene clusters were then examined for overabundance of GO Biological Process (GO:BP) terms (Additional file 8; GO Molecular Function results are provided in Additional file 9). For brevity, only Biological Process results are discussed here. For 6/16 clusters (salmon, pink, purple, midnightblue, magenta, black), no significantly enriched GO:BP terms were found. Several gene clusters were enriched for gene expression/chromatin remodeling (Yellow, GreenYellow, Brown), translation/ribosome components (Yellow, Blue), mRNA splicing (Cyan, Brown), actin/cytoskeleton (Red, Grey) and cell wall/carbohydrate metabolism (Turquoise, Red). Of particular interest is the Green gene expression module, which displayed enrichment for numerous GO:BP terms annotated for abiotic stress responses (e.g. response to temperature stimulus, response to reactive oxygen species, response to salt stress, response to heat, response to stress, etc.).

Validation of RNAseq data using qRT-PCR
Four differentially expressed genes (Additional file 10) were selected for qRT-PCR validation of the RNAseq data. Two genes were highly expressed (KTI-1, average 86 FPKM; HSP20, average 856 FPKM) and two were expressed at lower levels (SAM-methyltransferase, average 31 FPKM; UDP-glycosyl transferase, average 4.0 FPKM). qRT-PCR results were calculated via the ΔΔCt method and expressed as log2 ratios (Additional file 11). Correlations between qRT-PCR and FPKM results were robust for mature (r² = 0.9729) and imbibed (r² = 0.9919) samples, but less robust for germinated samples (r² = 0.6844). The generally high concordance between RNAseq and qRT-PCR supports the quality of our RNAseq dataset.
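The validation calculation can be sketched as follows: under the ΔΔCt method, and assuming approximately 100% amplification efficiency, the log2 expression ratio is simply the negative of ΔΔCt, and agreement with RNAseq is summarized as the squared Pearson correlation. The Ct values, the reference gene normalization and the log2 ratios in this Python sketch are hypothetical.

```python
import math

def log2_ratio_ddct(ct_target_test, ct_ref_test, ct_target_ctrl, ct_ref_ctrl):
    """log2(test/control) expression from Ct values via the ΔΔCt method."""
    delta_ct_test = ct_target_test - ct_ref_test  # normalize to reference gene
    delta_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    delta_delta_ct = delta_ct_test - delta_ct_ctrl
    return -delta_delta_ct  # log2 of 2 ** (-ΔΔCt)

def r_squared(x, y):
    """Squared Pearson correlation between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)

# Hypothetical Ct values: the target amplifies 3 cycles earlier in the test sample.
example_ratio = log2_ratio_ddct(22.0, 18.0, 25.0, 18.0)

# Hypothetical log2 ratios for four genes from each platform.
qpcr_log2 = [3.0, -1.2, 0.4, 2.1]
rnaseq_log2 = [2.8, -1.0, 0.6, 2.4]
agreement = r_squared(qpcr_log2, rnaseq_log2)
```

Expressing both platforms as log2 ratios places them on a common scale, so r² reflects agreement in the direction and magnitude of expression changes rather than in absolute expression level.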

Conclusions
In this study we provide substantial new mRNA sequencing data that define the very early stages of soybean seed germination (mature seed > imbibed seed > germinated seed). We also contrasted two genotypes that differ in tolerance to high temperature stress during seed development, using seed produced in two field locations that differ in temperature stress. We demonstrate that the more temperature-tolerant PI genotype is primed at the mRNA level to handle higher levels of temperature stress. In addition, we demonstrate that the PI line has faster, more efficient and more effective seed germination regardless of seed production location/environmental stress. These results highlight some of the genetic gains possible by leveraging exotic soybean germplasm as sources of novel traits in soybean breeding programs.