- Research note
- Open Access
Fine-mapping of a putative glutathione S-transferase (GST) gene responsible for yellow seed colour in flax (Linum usitatissimum)
BMC Research Notes volume 15, Article number: 72 (2022)
The brown seed coat colour of flax (Linum usitatissimum) results from proanthocyanidin synthesis and accumulation. Glutathione S-transferases (GSTs), such as the TT19 protein in Arabidopsis, have been implicated in the transport of anthocyanidins during the synthesis of the brown proanthocyanidins. This study fine-mapped the g allele responsible for yellow seed colour in S95407 and identified it as a putative mutated GST.
We developed a Recombinant Inbred Line (RIL) population of 320 lines descended from a cross between CDC Bethune (brown seed coat) and S95407 (yellow seed coat) and used molecular markers to fine-map the G gene on Chromosome 6 (Chr 6). Next Generation Sequencing (NGS) was used to identify a putative GST in this region, and the gene was Sanger sequenced from CDC Bethune, S95407 and other yellow seeded genotypes. The putative GST from S95407 had 13 SNPs, including four encoding non-synonymous amino acid changes, compared to the CDC Bethune reference sequence and the other genotypes. The GST encoded by Lus10019895 is a lambda-GST, in contrast to the Arabidopsis TT19, which is a phi-GST.
Flax (Linum usitatissimum L.) typically has brown seeds, although some consumers prefer the yellow seeded varieties that exist. Polymeric proanthocyanidins (PA, or condensed tannins) are responsible for the brown seed coat colour in many species, including flax. Mutations in the genes of the PA biosynthetic pathway may result in yellow seed colour in flax, Arabidopsis and other species [2,3,4,5,6]. For example, in Arabidopsis a mutated glutathione S-transferase (GST), tt19-1, cannot transport the colourless anthocyanidin quercetin-3-O-rhamnoside across the tonoplast membrane and, consequently, accumulation of PA in the vacuole does not occur [2, 7]. In flax, five alleles (Y, b1, b1vg, d and g), each individually responsible for yellow (or mottled) seed colour, have been observed and their genetics partially elucidated; however, the functional and genetic identity of some of these genes has only recently been studied. The mutated D gene in cultivar Bolley Golden was determined to be a flavonoid 3′5′ hydroxylase located on Chr2 [5, 6], and the dominant Y gene was found to be due to insertion of a transposon upstream of chalcone synthase (unpublished data). The mutated G gene was selected for fine mapping as it is one of the remaining known yellow seed coat colour mutants and is thought to be a single gene. It is not known whether the b1 and b1vg mutants are different genes or allelic.
Flax has a haploid chromosome number of 15 and a genome size of ~ 380 Mbp. The reference sequence from CDC Bethune was published first as scaffolds and, more recently, as pseudomolecules. Genome-wide molecular markers are available [11, 12].
Our objective was to fine-map the G gene in flax using the yellow seed line S95407 developed at the University of Saskatchewan. Characterizing the g gene could assist in breeding yellow seeded flax cultivars.
Material and methods
Results and discussion
We mapped the location of the G gene first using Simple Sequence Repeat (SSR) markers and then performed fine mapping of the locus using Kompetitive Allele Specific PCR (KASP) markers. Initial analysis of the 193 SSR markers indicated that 123 were polymorphic between CDC Bethune and S95407. Testing these polymorphic markers on pooled DNA from a subset of 10 brown seeded or 10 yellow seeded individuals identified 52 markers with an unequal distribution of alleles. Thirty of these markers, selected based on their distribution over the 15 flax chromosomes, were used to screen a subset of 94 individuals and the two parents (Additional file 4: Data S1). We determined that marker Lu442, on Chr6, was located ~ 30 cM from the G gene. Six other polymorphic markers on Chr6 were then used to screen the population, revealing that Lu69 was located ~ 20 cM from the G gene (Fig. 1, Table 1 and Additional file 4: Data S1). Illumina HiSeq was used to resequence S95407 (archived at NCBI Sequence Read Archive, SRR11869873); the reads were trimmed using trimmomatic and aligned against the CDC Bethune reference sequence using bowtie2. Refinement of the alignment, variant calling and filtering of SNPs between S95407 and CDC Bethune were performed using samtools and bcftools. The script used to identify SNPs is available in Additional file 1. KASP markers (KASP1–18) were designed against SNPs in the region Chr6:11.65–17.86 Mbp, distal to Lu69, which is located at Chr6:10.96 Mbp. Markers KASP5 and KASP6 were 11.1 and 7.9 cM from the G gene, at Chr6:15.07 Mbp and Chr6:14.84 Mbp, respectively (Fig. 1, Table 1 and Additional file 4: Data S1).
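The marker-to-gene distances quoted above come from comparing marker genotypes with seed colour across the RIL population. As a minimal sketch of how such a distance can be estimated, the Python below counts recombinant lines and converts the proportion to centimorgans. It rests on assumptions that are not stated in this article: that the RILs were derived by repeated selfing (so the Haldane and Waddington relation R = 2r/(1 + 2r) applies) and that the Kosambi mapping function is used. The genotype and phenotype labels (`"bethune"`, `"s95407"`, `"brown"`, `"yellow"`) and all function names are hypothetical and do not reproduce the coding used in Additional file 4.

```python
import math


def recombinant_proportion(genotypes, phenotypes):
    """Proportion of informative RILs whose marker allele and seed colour
    come from different parents. Heterozygous or missing calls are skipped."""
    informative = 0
    recombinant = 0
    for genotype, phenotype in zip(genotypes, phenotypes):
        if genotype not in ("bethune", "s95407"):
            continue  # heterozygote or missing marker call
        if phenotype not in ("brown", "yellow"):
            continue  # missing phenotype
        informative += 1
        parental = (genotype == "bethune" and phenotype == "brown") or (
            genotype == "s95407" and phenotype == "yellow"
        )
        if not parental:
            recombinant += 1
    if informative == 0:
        raise ValueError("no informative lines")
    return recombinant / informative


def per_meiosis_recombination(ril_proportion):
    """Invert R = 2r / (1 + 2r), assumed here for RILs derived by selfing."""
    if not 0.0 <= ril_proportion <= 0.5:
        raise ValueError("RIL recombinant proportion must lie in [0, 0.5]")
    return ril_proportion / (2.0 * (1.0 - ril_proportion))


def kosambi_cm(r):
    """Kosambi map distance in cM for a recombination fraction r < 0.5."""
    if not 0.0 <= r < 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5)")
    return 25.0 * math.log((1.0 + 2.0 * r) / (1.0 - 2.0 * r))


# Toy data: five lines, one of which is recombinant.
toy_genotypes = ["bethune", "s95407", "s95407", "bethune", "h"]
toy_phenotypes = ["brown", "yellow", "brown", "brown", "yellow"]
R = recombinant_proportion(toy_genotypes, toy_phenotypes)
distance = kosambi_cm(per_meiosis_recombination(R))
```

With real data, a mapping package would normally estimate all marker distances jointly; the pairwise calculation here only illustrates the reasoning behind a single marker-to-gene estimate.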
Markers spanning the region between KASP6 and Lu69 were developed (KASP19–27) and mapped. KASP20 (on scaffold1491), KASP22 and KASP23 (both on scaffold618) were located approximately 4.5, 3.2 and 7.0 cM from the G gene, respectively (Table 1, scaffold information from phytozome-next.jgi.doe.gov/info/Lusitatissimum_v1_0). An additional marker approximately midway between KASP20 and KASP22 (KASP28) was developed to genotype an SNP located ~ 250 kb from the distal end of scaffold1491. An additional 94 lines from the RIL population were used to map the interval between KASP28 and the putative G gene (Additional file 4: Data S1). The S95407 allele for KASP28 was present in all 94 yellow seeded lines and in only one of the 94 brown seeded lines. Five High Resolution Melt (HRM) markers within 5 cM of the putative G gene (Table 1) were used to genotype the single brown seeded line with the yellow genotype. This individual had the yellow genotype for all five markers, indicating that it had been incorrectly phenotyped as a brown seeded line.
Putative genes in the last 250 kb of scaffold1491 were identified from the CDC Bethune reference genome. This region corresponds to Chr6:13.5–13.8 Mbp, based on the pseudomolecule sequence published by You et al. It contains 55 putative genes, of which 28 had one or more SNPs in the coding sequences between CDC Bethune and S95407. The region also contained the KASP28 marker and was adjacent to scaffold618, which contained the KASP22 marker. A portion of one gene in this region (Lus10019895), located 15 kb from KASP28, encoded a putative glutathione S-transferase (GST), as identified using TBLASTX. GSTs play a role in transporting anthocyanins or proanthocyanidins in many tissues, including the seed coat [2, 4, 20]. Lus10019895 was located at approximately Chr6:13.8 Mbp, based on the flax pseudomolecule sequences.
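Narrowing the 55 candidate genes to those carrying coding SNPs is an interval-overlap problem. The sketch below shows one way to count SNPs falling inside annotated CDS intervals; it is illustrative only and is not the script supplied in Additional file 1. The gene names, coordinates and the function `genes_with_cds_snps` are hypothetical, and coordinates are assumed to be 1-based and inclusive on a single chromosome.

```python
from bisect import bisect_left, bisect_right


def genes_with_cds_snps(cds_by_gene, snp_positions):
    """Return {gene: number of SNPs inside its CDS intervals}.

    cds_by_gene maps a gene name to a list of (start, end) CDS intervals,
    1-based and inclusive. Genes without any CDS SNP are omitted.
    """
    snps = sorted(snp_positions)
    hits = {}
    for gene, intervals in cds_by_gene.items():
        count = 0
        for start, end in intervals:
            # Number of sorted SNP positions p with start <= p <= end.
            count += bisect_right(snps, end) - bisect_left(snps, start)
        if count:
            hits[gene] = count
    return hits


# Toy annotation and SNP calls (hypothetical values).
toy_cds = {
    "geneA": [(100, 200), (300, 400)],
    "geneB": [(1000, 1100)],
    "geneC": [(2000, 2100)],
}
toy_snps = [150, 250, 350, 1100, 1500]
toy_hits = genes_with_cds_snps(toy_cds, toy_snps)
```

In this toy case the SNP at position 250 is intronic and the one at 1500 is intergenic, so neither is counted.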
The last six exons of the putative gene Lus10019895 encode a GST, whereas the first 14 exons encode a putative thylakoid integral membrane TerC protein (Additional file 2: Figure S1). The putative TerC protein shares 80% amino acid residue similarity with the Arabidopsis TerC. The GST region spanning the last six exons of Lus10019895 is 1185 bp long and contains a 738 bp CDS.
The sequence of the GST portion of Lus10019895 was determined by PCR amplifying this fragment from genomic DNA of brown seeded CDC Bethune and CDC Sanctuary and of yellow seeded S95407, M96006 (B1vg gene), Crystal (B1 gene), G1186 (D gene) and YSED18 (Y gene), followed by Sanger sequencing. The sequences of the PCR fragments were identical to the CDC Bethune reference sequence for all the genotypes except S95407 (see Additional file 5: Data S2). These data confirm the consensus sequence of Lus10019895 derived from the S95407 NGS data obtained in this project. In S95407, 13 SNPs were observed: two in the 5′ UTR of the gene, two in the 3′ UTR, three in proposed introns and six in the CDS, four of which were non-synonymous (Fig. 2A). The resulting amino acid changes were T34I, A46S, T121A and F126Y. The conformation of the active site in the S95407 Lus10019895 GST may be disrupted by the A46S change, as this alanine is highly conserved, and/or by the T34I substitution. The A46S change may be particularly important, as it may substantially alter the electrochemical conformation of the active site. An alternative explanation for the yellow seeded phenotype observed in S95407 is a reduction in Lus10019895 expression brought about by a 24 bp deletion in the 3′ UTR, 658 bp downstream from the stop codon (not shown).
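The distinction between synonymous and non-synonymous CDS SNPs, and the T34I-style notation used above, can be illustrated with a short sketch that translates the affected codon before and after the change using the standard genetic code. The toy CDS and the function name `classify_cds_snp` are hypothetical; the sequence is not that of Lus10019895, and positions are assumed to be 1-based within an in-frame CDS.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG codon ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_cds_snp(cds, position, alt_base):
    """Return 'synonymous' or a label such as 'T34I' for a single-base change.

    position is 1-based within the CDS; alt_base is the variant nucleotide.
    """
    cds = cds.upper()
    alt_base = alt_base.upper()
    usable_length = len(cds) - len(cds) % 3  # ignore any trailing partial codon
    if not 1 <= position <= usable_length:
        raise ValueError("position outside the translatable CDS")
    index = position - 1
    if cds[index] == alt_base:
        raise ValueError("variant base equals reference base")
    offset = index % 3
    codon_start = index - offset
    ref_codon = cds[codon_start:codon_start + 3]
    alt_codon = ref_codon[:offset] + alt_base + ref_codon[offset + 1:]
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa == alt_aa:
        return "synonymous"
    residue_number = codon_start // 3 + 1
    return f"{ref_aa}{residue_number}{alt_aa}"


# Toy CDS encoding Met-Thr-Ala-Phe-stop (hypothetical sequence).
toy_cds = "ATGACTGCTTTTTAA"
threonine_change = classify_cds_snp(toy_cds, 5, "T")   # ACT -> ATT
silent_change = classify_cds_snp(toy_cds, 6, "C")      # ACT -> ACC
```

Applied to a real CDS, the residue numbers depend on the chosen start codon, so labels are only comparable when the same gene model is used throughout.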
In the developing seed coat, GSTs are thought to transfer glutathione onto anthocyanins or PA prior to transport into the vacuole. A GST mutant, tt19, is associated with the development of yellow seeds in Arabidopsis. GSTs are involved in the transport of anthocyanins and PA in the seed coat of grape. Homologues of TT19 are involved in the transport of anthocyanins in the petals of cyclamen and petunia. The Lus10019895 GST shares 71.7%, 74.2% and 66.0% similarity with three homologs from flax, Lus10003994, Lus10015049 and Lus10040347, respectively. Collectively, these genes share 67–71% similarity at the amino acid level with the Arabidopsis lambda-type GST proteins (AtGSTL1, AtGSTL2 and AtGSTL3) (Fig. 2A), but only 19% identity and 33–37% similarity with AtGST26/TT19/AtGST phi12 (not shown). Three other flax GST proteins, Lus10023511, Lus10029815 and Lus10040393, had a much higher degree of similarity to AtGST26/TT19 (66%, 68% and 72%, respectively) (Fig. 2B).
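Percent identity and percent similarity, as contrasted above, measure different things: identity counts exact residue matches, while similarity also counts substitutions between residues with comparable properties. The sketch below computes both from a pre-aligned pair of protein sequences. The residue groups, the choice of denominator (columns where neither sequence has a gap) and the function name are assumptions made for illustration; alignment tools define similarity through their own substitution matrices, so the values reported in this article would not necessarily be reproduced.

```python
# Illustrative residue groups; real tools use substitution matrices instead.
SIMILAR_GROUPS = ("AVLIM", "FWY", "STNQ", "KRH", "DE", "C", "G", "P")
GROUP_OF = {
    residue: group_index
    for group_index, group in enumerate(SIMILAR_GROUPS)
    for residue in group
}


def identity_and_similarity(aligned_a, aligned_b):
    """Return (percent identity, percent similarity) for two aligned sequences.

    Both strings must have equal length and use '-' for gaps. Columns with a
    gap in either sequence are excluded from the denominator.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = similar = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a == b:
            identical += 1
            similar += 1
        elif a in GROUP_OF and GROUP_OF[a] == GROUP_OF.get(b):
            similar += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared, 100.0 * similar / compared


# Toy alignment (hypothetical sequences): two identical columns, three
# conservative substitutions and one gapped column that is ignored.
toy_identity, toy_similarity = identity_and_similarity("MTAF-K", "MSAYQR")
```

The toy pair shows how two proteins can have low identity yet high similarity, which is why both figures are reported for the comparison with TT19.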
Both lambda-GSTs and phi-GSTs are expressed in the seeds of Brassica napus, Vitis vinifera, Helianthus annuus and Capsicum annuum. Anthocyanin transport into the vacuole is facilitated by multiple classes of GSTs in maize. Three out of four grape GSTs examined complement the tt19 mutant in Arabidopsis, albeit in different ways, so it is plausible that the Lus10019895 GST performs this function in maturing flaxseed, despite having less homology to AtGST26 than other GST homologues in flax. Interestingly, the Lus10019895 protein lacked the highly conserved cysteine at residue 43, in the active site of both lambda- and phi-type GSTs, and had a serine instead (Fig. 2). The other flax GST proteins, except Lus10029815, retained the cysteine at this site. Lus10019895 is more similar to non-lambda GSTs from other species (Additional file 3: Figure S2), which often have a serine residue rather than a cysteine at this position in the active site, than to phi-GSTs in other species [20, 23,24,25, 27]. The Lus10019895 GST protein has 76–78% similarity to the Citrus sinensis (XP006480546), Eucalyptus grandis (XP010047051) and Jatropha curcas (NP001295698) GSTs and shares a high degree of similarity with homologs from other species (Additional file 3: Figure S2). The Lus10019895 protein shares only 37% similarity with AN9, the petunia phi-type GST responsible for anthocyanin transport in petals.
A BLAST search of flax ESTs in NCBI using the Lus10019895 CDS returned 10 hits, all from the mature embryo EST library (LIBEST_027001). The consensus sequences of both CDC Bethune and S95407 around Lus10019895 are provided in Additional file 5: Data S2.
We have identified, using molecular markers, bioinformatics and DNA sequencing, a putative GST involved in PA synthesis in the seed coat of flax. The putative GST is encoded by the last six exons of Lus10019895, which appears to be artefactually fused to a TerC gene. Thirteen SNPs, including four non-synonymous changes, were observed in the yellow seeded mutant, S95407, compared to the brown seeded reference sequence from CDC Bethune. The Lus10019895 GST has a higher level of similarity to lambda-type GSTs from Arabidopsis and other species than to phi-type GSTs such as the Arabidopsis TT19 and petunia AN9.
The observation that Lus10019895 consists of two genes could be proven definitively using RT-qPCR; however, we assume that the TerC and GST genes are separate based on the arrangement of the CDS and the high level of similarity to homologs within the flax genome. We did not determine whether the putative GST identified here is functionally responsible for brown seed coat colour in CDC Bethune, or whether the mutant gene is the cause of the yellow seed coat colour in S95407.
Availability of data and materials
All data generated or analysed during this study are included in this published article and its Additional files. Resequencing data from S95407 are available at NCBI SRA (SRR11869873).
KASP: Kompetitive allele specific PCR
NGS: Next generation sequencing
RIL: Recombinant inbred line
SSR: Simple sequence repeat
Dixon RA, Xie DY, Sharma SB. Proanthocyanidins—a final frontier in flavonoid research? New Phytol. 2005;165(1):9–28.
Kitamura S, Shikazono N, Tanaka A. TRANSPARENT TESTA 19 is involved in the accumulation of both anthocyanins and proanthocyanidins in Arabidopsis. Plant J. 2004;37(1):104–14.
Haughn G, Chaudhury A. Genetic analysis of seed coat development in Arabidopsis. Trends Plant Sci. 2005;10(10):472–7.
Appelhagen I, Thiedig K, Nordholt N, Schmidt N, Huep G, Sagasser M, et al. Update on transparent testa mutants from Arabidopsis thaliana: characterisation of new alleles from an isogenic collection. Planta. 2014;240(5):955–70.
Sudarshan GP, Kulkarni M, Akhov L, Ashe P, Shaterian H, Cloutier S, et al. QTL mapping and molecular characterization of the classical D locus controlling seed and flower color in Linum usitatissimum (flax). Sci Rep. 2017;7(1):15751.
Sudarshan GP, Kulkarni M, Akhov L, Ashe P, Shaterian H, Cloutier S, et al. Publisher correction: QTL mapping and molecular characterization of the classical D locus controlling seed and flower color in Linum usitatissimum (flax). Sci Rep. 2018;8(1):4567.
Akita Y, Kitamura S, Hase Y, Narumi I, Ishizaka H, Kondo E, et al. Isolation and characterization of the fragrant cyclamen O-methyltransferase involved in flower coloration. Planta. 2011;234(6):1127–36.
Mittapalli O, Rowland G. Inheritance of seed color in flax. Crop Sci. 2003;43(6):1945–51.
Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, et al. The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J. 2012;72(3):461–73.
You FM, Xiao J, Li P, Yao Z, Jia G, He L, et al. Chromosome-scale pseudomolecules refined by optical, physical and genetic maps in flax. Plant J. 2018;95(2):371–84.
Cloutier S, Ragupathy R, Miranda E, Radovanovic N, Reimer E, Walichnowski A, et al. Integrated consensus genetic and physical maps of flax (Linum usitatissimum L.). Theor Appl Genet. 2012;125(8):1783–95.
Kumar S, You FM, Cloutier S. Genome wide SNP discovery in flax through next generation sequencing of reduced representation libraries. BMC Genomics. 2012;13:684.
Cloutier S, Niu Z, Datla R, Duguid S. Development and analysis of EST-SSRs for flax (Linum usitatissimum L.). Theor Appl Genet. 2009;119(1):53–63.
Young L, Hammerlindl J, Babic V, McLeod J, Sharpe A, Matsalla C, et al. Genetics, structure, and prevalence of FP967 (CDC Triffid) T-DNA in flax. Springerplus. 2015;4:146.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30(15):2114–20.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9(4):357–9.
Danecek P, Auton A, Abecasis G, Albers CA, Banks E, DePristo MA, et al. The variant call format and VCFtools. Bioinformatics. 2011;27(15):2156–8.
Li H. A statistical framework for SNP calling, mutation discovery, association mapping and population genetical parameter estimation from sequencing data. Bioinformatics. 2011;27(21):2987–93.
Li H, Durbin R. Fast and accurate long-read alignment with Burrows-Wheeler transform. Bioinformatics. 2010;26(5):589–95.
Pérez-Díaz R, Madrid-Espinoza J, Salinas-Cornejo J, González-Villanueva E, Ruiz-Lara S. Differential roles for VviGST1, VviGST3, and VviGST4 in proanthocyanidin and anthocyanin transport in Vitis vinífera. Front Plant Sci. 2016. https://doi.org/10.3389/fpls.2016.01166.
Kitamura S, Akita Y, Ishizaka H, Narumi I, Tanaka A. Molecular characterization of an anthocyanin-related glutathione S-transferase gene in cyclamen. J Plant Physiol. 2012;169(6):636–42.
Tornielli G, Koes R, Quattrocchio F. The genetics of flower color. In: Gerats T, Strommer J, editors. Petunia: evolutionary, developmental and physiological genetics. New York: Springer New York; 2009. p. 269–99.
Wei L, Zhu Y, Liu R, Zhang A, Zhu M, Xu W, et al. Genome wide identification and comparative analysis of glutathione transferases (GST) family genes in Brassica napus. Sci Rep. 2019;9(1):9196.
Ma L, Zhang Y, Meng Q, Shi F, Liu J, Li Y. Molecular cloning, identification of GSTs family in sunflower and their regulatory roles in biotic and abiotic stress. World J Microbiol Biotechnol. 2018;34(8):109.
Islam S, Sajib SD, Jui ZS, Arabia S, Islam T, Ghosh A. Genome-wide identification of glutathione S-transferase gene family in pepper, its classification, and expression profiling under different anatomical and environmental conditions. Sci Rep. 2019;9(1):9101.
Alfenito MR, Souer E, Goodman CD, Buell R, Mol J, Koes R, et al. Functional complementation of anthocyanin sequestration in the vacuole by widely divergent glutathione S-transferases. Plant Cell. 1998;10(7):1135–49.
Dixon DP, Steel PG, Edwards R. Roles for glutathione transferases in antioxidant recycling. Plant Signal Behav. 2011;6(8):1223–7.
Gopalan Selvaraj provided interpretation and editing on an earlier version of the manuscript. Gordan Rowland developed the RIL population and gave it to Helen Booker. Shannon Froese and Kayla Lindenback provided technical assistance for the research. This research was enabled in part by support provided by WestGrid (www.westgrid.ca) and Compute Canada (www.computecanada.ca).
Funding for this work was provided by Genome Prairie’s Total Utilization of Flax Genomics (TUFGEN) project and Saskatchewan Ministry of Agriculture—Agriculture Development Fund Project #20100159 Genetic mapping of DNA markers of the different flax seed colour genes in RIL populations derived from crosses with CDC Bethune.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Detailed materials and methods.
Putative CDS structure of Lus10019895 and alignment with Arabidopsis TerC and GST proteins. A Lus10019895 is 4467 bp long and contains 20 CDS (yellow arrows). The first 14 CDS of the gene code for a thylakoid membrane protein, TerC, while the last six CDS code for a glutathione S-transferase. Coloured boxes indicate identical amino acid residues. B Alignment of Arabidopsis TerC (XP020876262) and the first 14 putative CDS in Lus10019895, with 80% amino acid similarity. C Alignment of Arabidopsis GST protein At5g02780 and the last six CDS of Lus10019895, showing 75% amino acid similarity.
Alignment of Lus10019895 protein with GST proteins from other species. Darker shading of residue background indicates a greater number of similar residues at that position. Rectangular boxes indicate non-synonymous changes in amino acid residues between S95407 and CDC Bethune proteins. Dendrogram indicates relatedness of the GST proteins. Lus10019895 from L. usitatissimum has greater similarity to the Arabidopsis lambda GSTs than to AtGST26 (TT19) from Arabidopsis.
Markers and genotypes of the S95407 × CDC Bethune RIL population segregating for yellow seed coat colour. The first 94 lines in the population were genotyped using the SSR markers (Lu69 and Lu442) and KASP markers (KASP5–26). These lines plus an additional 94 lines were genotyped using KASP28. Phenotype a = yellow seed coat colour, b = brown seed coat colour. For genotype data, h = heterozygote and – = missing data.
Sequences of Lus10019895 for CDC Bethune and S95407.
About this article
Cite this article
Young, L., Akhov, L., Kulkarni, M. et al. Fine-mapping of a putative glutathione S-transferase (GST) gene responsible for yellow seed colour in flax (Linum usitatissimum). BMC Res Notes 15, 72 (2022). https://doi.org/10.1186/s13104-022-05964-x
- Yellow seed
- Glutathione S-transferase