DNA barcode trnH-psbA is a promising candidate for efficient identification of forage legumes and grasses
BMC Research Notes volume 13, Article number: 35 (2020)
Grasslands are widespread ecosystems that fulfil many functions. Plant species richness (PSR) is known to have beneficial effects on such functions and monitoring PSR is crucial for tracking the effects of land use and agricultural management on these ecosystems. Unfortunately, traditional morphology-based methods are labor-intensive and cannot be adapted for high-throughput assessments. DNA barcoding could help increase the throughput of PSR assessments in grasslands. In this proof-of-concept work, we aimed to determine which of three plant DNA barcodes (rbcLa, matK and trnH-psbA) best discriminates 16 key grass and legume species common in temperate sub-alpine grasslands.
Barcode trnH-psbA had a 100% correct assignment rate (CAR) in the five analyzed legumes, followed by rbcLa (93.3%) and matK (55.6%). Barcode trnH-psbA had a 100% CAR in the grasses Cynosurus cristatus, Dactylis glomerata and Trisetum flavescens. However, the closely related Festuca, Lolium and Poa species were not always correctly identified, which led to an overall CAR in grasses of 66.7%, 50.0% and 46.4% for trnH-psbA, matK and rbcLa, respectively. Barcode trnH-psbA is thus the most promising candidate for PSR assessments in permanent grasslands and could greatly support plant biodiversity monitoring on a larger scale.
Grasslands are some of the most widespread ecosystems on Earth, covering two-fifths of its land surface. They provide roughage for ruminant livestock production and many other environmental services related to carbon sequestration, water flow regulation and soil stabilization [2, 3]. Plant species richness (PSR) is a component of biodiversity with major effects on the ecosystem functioning of grasslands. In experimental grassland plant communities, high levels of PSR stabilize yields and confer tolerance against environmental stressors. Similar effects have been observed in semi-natural grasslands, which are composed of a limited number of species and are an important component of sustainable livestock production. Assessing PSR is thus crucial for tracking its changes and effects on ecosystem services. However, such assessments have traditionally relied on morphology-based surveys that are labor-intensive and require trained taxonomists, limiting their use for surveying PSR over large scales and long time periods. Furthermore, grasses and legumes (the two plant families of major economic relevance in temperate grasslands) can be taxonomically assessed with the highest precision only when certain distinctive morphological characters are on display (e.g., flowering bodies and leaves). Even then, some grass and legume species are difficult to distinguish from closely related species. A standardized, precise, high-throughput solution for PSR surveys in grasslands is therefore desirable for large-scale assessments of changes in PSR.
DNA barcoding is a methodology that has been successfully applied for standardizing and increasing the throughput of PSR surveys in ecological studies [6, 7]. DNA barcodes are organellar or nuclear loci that show a high degree of species-level conservation [8, 9]. By comparing newly sequenced DNA barcodes to reference databases, it is possible to assign an unknown biological sample to its correct taxonomy. An international effort is currently in place to maintain a well-curated, public reference database of DNA barcodes (The Barcode Of Life Datasystems database, BOLD).
In animals, the DNA barcode of choice is the mitochondrial COI gene, which can reproducibly differentiate most of the major animal phyla. In plants, in contrast, there is no single DNA barcode with comparable success. Most plant DNA barcodes are located in the chloroplast genome, either within coding sequences (such as rbcLa and matK) or in intergenic regions (such as trnH-psbA) [11, 12], although some nuclear loci have also been used as DNA barcodes, e.g., the internal transcribed spacer of the ribosomal DNA (ITS). More than one barcode per plant individual is typically sequenced and used for taxonomic assignments [11, 12]. However, sequencing more than one DNA barcode per plant may not be technically feasible in higher throughput settings, particularly when analyzing mixed-species samples.
The aim of the present study was to determine the best DNA barcode sequences for forage species by screening the BOLD database for promising candidates and sequencing three DNA barcodes (rbcLa, matK and trnH-psbA) from multiple cultivars of 16 forage plant species that are common in sub-alpine grasslands.
Plant material and DNA extraction
Seeds of 2–3 cultivars of 16 forage species (Alopecurus pratensis L., Arrhenatherum elatius L., Cynosurus cristatus L., Dactylis glomerata L., Festuca pratensis Huds., F. rubra L., Lolium perenne L., L. multiflorum Lam., Lotus corniculatus L., Medicago sativa L., Phleum pratense L., Poa pratensis L., Trifolium pratense L., T. repens L. and Trisetum flavescens L.), kindly provided by Agroscope, Zurich, Switzerland, were used for the study (Table 1). Seeds were germinated and transferred into pot trays (77 wells, 50 cm × 32 cm, with compost as substrate). The species selected are predominant components of sub-alpine grasslands and hold great potential for multifunctional, species-rich agriculture [14, 15]. Plants were grown for 3 weeks, after which DNA was extracted from three plants per species. For grasses, three leaf fragments of ~ 1 cm and for legumes three young leaflets were harvested. The plant material was freeze-dried for 48 h and pulverized in a QIAGEN TissueLyser II (QIAGEN, Hilden, Germany). DNA was extracted using the NucleoSpin® II kit (Macherey–Nagel, Düren, Germany) and its integrity was visually inspected by agarose gel electrophoresis (1% w/v). DNA purity and concentration were determined with a NanoDrop™ spectrophotometer (ThermoFisher Scientific, Waltham, MA, USA).
DNA barcode amplification and sequencing
The BOLD database was screened for DNA barcode sequences of the selected species and close relatives; barcodes rbcLa, matK and trnH-psbA were selected as candidates because they had the most sequences available. Those DNA barcodes are mainly located in the chloroplast genome and are not known to have paralogs that can interfere with taxonomic assignments, as is the case for some nuclear loci such as ITS. Primer sequences for the three barcodes were obtained from BOLD and were optimized for amplification in the target plant families (Additional file 1: Table S1). Each PCR reaction consisted of 15 ng of template DNA, 1× Flexi buffer (Promega, Madison, WI, USA), 2 mM MgCl2, 200 µM dNTPs, each primer at 0.4 µM, 0.75 units of GoTaq® G2 Flexi DNA Polymerase (Promega, Madison, WI, USA) and water to a final volume of 30 µL.
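The reaction composition above can be turned into pipetting volumes with the dilution relation C1·V1 = C2·V2. The following is a minimal sketch only: the final concentrations and the 30 µL volume come from the text, whereas every stock concentration (5× buffer, 25 mM MgCl2, 10 mM dNTPs, 10 µM primers, 5 U/µL polymerase, 15 ng/µL template) is an assumption made for illustration and is not reported in the study.

```python
def stock_volume_ul(final_conc, stock_conc, final_volume_ul):
    """Volume of stock (µL) needed to reach final_conc in final_volume_ul (C1*V1 = C2*V2)."""
    return final_conc * final_volume_ul / stock_conc


FINAL_VOLUME_UL = 30.0  # final reaction volume, from the text

# name -> (final concentration from the text, ASSUMED stock concentration);
# both values of a pair are in the same unit.
components = {
    "Flexi buffer (x)": (1.0, 5.0),
    "MgCl2 (mM)": (2.0, 25.0),
    "dNTPs (µM)": (200.0, 10000.0),
    "forward primer (µM)": (0.4, 10.0),
    "reverse primer (µM)": (0.4, 10.0),
}

volumes = {
    name: stock_volume_ul(final, stock, FINAL_VOLUME_UL)
    for name, (final, stock) in components.items()
}

# Amount-based components: amount per reaction / ASSUMED stock concentration.
volumes["GoTaq G2 polymerase"] = 0.75 / 5.0   # 0.75 U at an assumed 5 U/µL
volumes["template DNA"] = 15.0 / 15.0         # 15 ng at an assumed 15 ng/µL

# Water fills the reaction up to the final volume.
volumes["water"] = FINAL_VOLUME_UL - sum(volumes.values())

for name, vol in volumes.items():
    print(f"{name:22s} {vol:6.2f} µL")
```

With other stock concentrations only the second value of each pair changes; the water volume adjusts automatically.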
For rbcLa, PCR conditions were 5 min at 94 °C followed by 33 cycles of 40 s at 94 °C, 1 min at 55 °C and 40 s at 72 °C, followed by a final extension step of 10 min at 72 °C. For matK and trnH-psbA, PCR conditions were 5 min at 94 °C followed by 50 cycles of 40 s at 94 °C, 1 min at 54 °C and 40 s at 72 °C, followed by a final extension step of 10 min at 72 °C. The integrity of the amplicons was visually inspected by agarose gel electrophoresis (1% w/v).
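The two cycling programs differ only in cycle number and annealing temperature, which is easy to see when they are written down as data. The sketch below encodes the programs from the text and sums their programmed hold times; the class and field names are illustrative, and ramp times between temperatures are ignored, so real run times will be longer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyclingProgram:
    """PCR program as described in the text; field names are illustrative."""
    initial_denaturation_s: int
    cycles: int
    denaturation_s: int
    annealing_s: int
    annealing_temp_c: int
    extension_s: int
    final_extension_s: int

    def hold_time_s(self) -> int:
        """Sum of programmed hold times in seconds (ramping not included)."""
        per_cycle = self.denaturation_s + self.annealing_s + self.extension_s
        return (self.initial_denaturation_s
                + self.cycles * per_cycle
                + self.final_extension_s)


PROGRAMS = {
    "rbcLa": CyclingProgram(300, 33, 40, 60, 55, 40, 600),
    "matK": CyclingProgram(300, 50, 40, 60, 54, 40, 600),
    "trnH-psbA": CyclingProgram(300, 50, 40, 60, 54, 40, 600),
}

for barcode, program in PROGRAMS.items():
    print(f"{barcode}: {program.hold_time_s() / 60:.1f} min of programmed holds")
```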
Amplicons were purified in a MultiScreen PCR96 filter plate (Merck, Darmstadt, Germany). Sequencing reactions were prepared with 1× BigDye™ Terminator 3.1 Reaction Mix (ThermoFisher Scientific, Waltham, MA, USA), 1× BigDye™ 3.1 Sequencing Buffer, forward or reverse primer at 0.16 µM and 800 ng of purified amplicon to a final volume of 5 µL. The same primers used for PCR were used for sequencing. Capillary electrophoresis was performed on a 3130 ABI (ThermoFisher Scientific, Waltham, MA, USA). The resulting traces were quality filtered and merged using GAP4 with the default settings. All traces and sequences were uploaded to BOLD v4 (project code: SWFRG; http://www.boldsystems.org/index.php/Public_SearchTerms).
Sequences of matK, rbcLa and trnH-psbA were downloaded from BOLD v4 on May 23, 2019. Only sequences from the Poaceae and Fabaceae families with no contaminants and longer than 200 bp were included. In total, 6232 rbcLa, 11,971 matK and 1236 trnH-psbA sequences were present in the downloaded fasta files, which also include the plants from the BOLD project SWFRG (Additional file 1: Table S2). The taxonomical identifiers of the BOLD fasta files were reformatted to remove spaces and rearrange their informative fields in a consistent manner (fasta_name_reformat.py script from https://github.com/mloera/forage-barcoding).
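The length filter and header clean-up described above could look roughly like the sketch below. It is not the fasta_name_reformat.py script from the repository. It assumes pipe-delimited headers of the form `PROCESSID|Genus species|marker|accession`, emits a hypothetical `Genus_species|PROCESSID|marker` layout, ignores gap characters when measuring length, and assumes that the selection of Poaceae and Fabaceae records already happened at download time. The example records are invented.

```python
from typing import Iterable, Iterator, Tuple

MIN_LENGTH_BP = 200  # the text keeps sequences longer than 200 bp


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def reformat_header(header: str) -> str:
    """Turn 'PROCESSID|Genus species|marker|...' into 'Genus_species|PROCESSID|marker'.

    Both the input and the output layouts are assumptions for this sketch.
    """
    fields = [field.strip() for field in header.split("|")]
    if len(fields) < 3:
        raise ValueError(f"unexpected header layout: {header!r}")
    process_id, taxon, marker = fields[0], fields[1], fields[2]
    return "|".join([taxon.replace(" ", "_"), process_id, marker])


def filter_and_reformat(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Keep records longer than MIN_LENGTH_BP (gaps ignored) and clean their headers."""
    for header, seq in read_fasta(lines):
        if len(seq.replace("-", "")) > MIN_LENGTH_BP:
            yield reformat_header(header), seq


# Invented example records: one long enough, one too short.
example = [
    ">SWFRG001-19|Poa pratensis|rbcLa|",
    "ACGT" * 60,   # 240 bp, kept
    ">XXXX002-10|Trifolium repens|rbcLa|AB000000",
    "ACGT" * 40,   # 160 bp, dropped
]
kept = list(filter_and_reformat(example))
print([header for header, _ in kept])
```

For a real file, the same generator can be fed an open file handle, since it only iterates over lines.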
Each barcode-specific fasta file was then used to make a blast database and the SWFRG sequences were queried against their corresponding database with blastn using the option -outfmt 6 (i.e., tabular format). The resulting blast output tables were parsed with the blastn_matcher.R script from the above-mentioned GitHub repository. The script removes self-hits and corrects some misspellings in the taxonomy of queries and hits. The script then compares the taxonomy of the queries and hits at the species and genus levels. A “match” was called when the taxonomy of a query sequence was equal to the taxonomy of the highest scoring hit or hits (Additional file 1: Table S3). A “taxonomical assignment rate” for each barcode was then calculated as the ratio between the number of its correct taxonomical assignments and the total number of query sequences.
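The matching logic can be illustrated with a short Python sketch. It is not a port of blastn_matcher.R, and it makes several assumptions: the table uses the default 12 columns of blastn tabular output (query ID in column 1, subject ID in column 2, bit score in column 12), sequence IDs follow a hypothetical `Genus_species|PROCESSID|marker` layout, tied top hits count as a match only when all of them agree with the query, and queries left without any hit count as incorrect. The toy table is invented for illustration.

```python
import csv
import io
from typing import Callable, Dict, Iterable, List, Sequence, Tuple


def species_of(seq_id: str) -> str:
    """Species label from an ID shaped 'Genus_species|PROCESSID|marker' (assumed layout)."""
    return seq_id.split("|")[0]


def genus_of(seq_id: str) -> str:
    """Genus label, i.e. the part of the species label before the first underscore."""
    return species_of(seq_id).split("_")[0]


def top_hits(rows: Iterable[Sequence[str]]) -> Dict[str, List[str]]:
    """Map each query to the subjects sharing its highest bit score, self-hits removed."""
    best: Dict[str, Tuple[float, List[str]]] = {}
    for row in rows:
        qseqid, sseqid, bitscore = row[0], row[1], float(row[11])
        if qseqid == sseqid:
            continue  # self-hit
        score, subjects = best.get(qseqid, (float("-inf"), []))
        if bitscore > score:
            best[qseqid] = (bitscore, [sseqid])
        elif bitscore == score:
            subjects.append(sseqid)
    return {query: subjects for query, (_, subjects) in best.items()}


def assignment_rate(query_ids: Sequence[str],
                    hits: Dict[str, List[str]],
                    level: Callable[[str], str] = species_of) -> float:
    """Correct assignments divided by the total number of query sequences."""
    correct = 0
    for query in query_ids:
        subjects = hits.get(query, [])
        if subjects and all(level(s) == level(query) for s in subjects):
            correct += 1
    return correct / len(query_ids)


def toy_row(query: str, subject: str, bitscore: int) -> str:
    """One invented 12-column line in tabular blastn layout."""
    return f"{query}\t{subject}\t99.0\t500\t5\t0\t1\t500\t1\t500\t0.0\t{bitscore}"


q1, q2 = "Poa_pratensis|Q1|rbcLa", "Trifolium_repens|Q2|rbcLa"
toy_table = "\n".join([
    toy_row(q1, q1, 900),                               # self-hit, ignored
    toy_row(q1, "Poa_pratensis|R1|rbcLa", 880),         # tied top hit
    toy_row(q1, "Poa_trivialis|R2|rbcLa", 880),         # tied top hit, other species
    toy_row(q2, "Trifolium_repens|R3|rbcLa", 870),      # single top hit
    toy_row(q2, "Trifolium_pratense|R4|rbcLa", 800),    # lower score, ignored
])

hits = top_hits(csv.reader(io.StringIO(toy_table), delimiter="\t"))
species_rate = assignment_rate([q1, q2], hits, species_of)
genus_rate = assignment_rate([q1, q2], hits, genus_of)
print(species_rate, genus_rate)
```

In the toy table the Poa query ties between two species, so it fails at the species level but still matches at the genus level. Whether ties should be scored this strictly is a design choice; a more lenient rule would accept a query when any of the tied hits agrees with it.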
Results and discussion
PCR and sequencing results
The primer sequences of trnH-psbA and matK were adapted to allow for amplification within the target species, while the primer sequences of rbcLa did not need any modification (Additional file 1: Table S1). From the 48 processed specimens, 130 sequences were obtained (46 for matK, 43 for rbcLa and 41 for trnH-psbA) after repeating and optimizing failed amplifications. The size of the sequences ranged from 470 to 588 bp for rbcLa, 185 to 888 bp for matK and 268 to 614 bp for trnH-psbA (Table 1).
Barcode trnH-psbA had a 100% correct assignment rate (CAR) in legumes, followed by rbcLa (93.3%) and matK (57.1%; Table 2). The highest CAR for grasses was 65.4% with trnH-psbA, followed by matK (48.4%) and rbcLa (46.4%). Overall, genus-level CARs were 69.8%, 73.3% and 90.2% for rbcLa, matK and trnH-psbA, respectively. Legumes also had the highest assignment rate at the genus level (100% correct assignments for all barcodes; Table 2), while correct assignments for grass genera were 53.6%, 61.3% and 84.6% for barcodes rbcLa, matK and trnH-psbA, respectively.
The low CARs for grass DNA barcodes could be due to various factors. Some grass species, such as Poa spp., are notoriously hard to discriminate morphologically and their phylogeny is subject to controversy [17, 18]. This could have resulted in misidentified reference sequences. Another factor is the high genetic similarity between some grass taxa. For example, the genetic similarity of some species of the Festuca-Lolium complex is reported to be > 90%, as calculated from transcriptomic data of orthologous genes. This may result in a higher proportion of incorrect taxonomic assignments for such grass species.
Barcode trnH-psbA is a good candidate for large-scale DNA barcoding of forage legumes and some grasses, such as C. cristatus, D. glomerata and T. flavescens (Table 3). However, further work is needed to produce reference sequences in more forage species and cultivars. Overall, our results provide the basic tools to implement DNA barcoding in forage species (i.e., family-specific primer pairs and a standard bioinformatic workflow for taxonomic assignments) and can help in choosing an appropriate DNA barcode for high-throughput applications. Such high-throughput applications could greatly enhance the biodiversity-monitoring protocols that are used to study the ecology of grasslands, their dynamics and their interplay with agriculture.
This is exploratory work focused on the most common forage plant species from sub-alpine temperate grasslands; further work is needed to address other forage species from different kinds of grasslands.
As a proof of concept, three specimens per species were analyzed.
Availability of data and materials
The datasets generated and/or analysed during the current study are available in the following GitHub repository: https://github.com/mloera/forage-barcoding. Sequencing trace files are found in BOLD (http://www.boldsystems.org/index.php/Public_SearchTerms) using the search term “SWFRG” and are also available on https://doi.org/10.5281/zenodo.3597069.
BOLD: Barcode of Life Datasystems
CAR: correct assignment rate
PSR: plant species richness
Reynolds SG. Chapter 1 Introduction. In: Suttie JM, Reynolds SG, Batello C, editors. Grasslands of the World. Food and Agriculture Organization of the United Nations; 2005. http://www.fao.org/3/y8344e00.htm.
Lindborg R, Bengtsson J, Berg Å, Cousins SAO, Eriksson O, Gustafsson T, et al. A landscape perspective on conservation of semi-natural grasslands. Agric Ecosyst Environ. 2008;125:213–22. https://doi.org/10.1016/j.agee.2008.01.006.
Bengtsson J, Bullock JM, Egoh B, Everson C, Everson T, O’Connor T, et al. Grasslands-more important for ecosystem services than you might think. Ecosphere. 2019;10:e02582. https://doi.org/10.1002/ecs2.2582.
Wang Y, Cadotte MW, Chen Y, Fraser LH, Zhang Y, Huang F, et al. Global evidence of positive biodiversity effects on spatial ecosystem stability in natural grasslands. Nat Commun. 2019;10:3207. https://doi.org/10.1038/s41467-019-11191-z.
Feßel C, Meier IC, Leuschner C. Relationship between species diversity, biomass and light transmittance in temperate semi-natural grasslands: is productivity enhanced by complementary light capture? J Veg Sci. 2016;27:144–55.
Kress WJ. Plant DNA barcodes: applications today and in the future. J Syst Evol. 2017;55:291–307.
Kress WJ, García-Robledo C, Uriarte M, Erickson DL. DNA barcodes for ecology, evolution, and conservation. Trends Ecol Evol. 2015;30:25–35. https://doi.org/10.1016/j.tree.2014.10.008.
Hebert PDN, Ratnasingham S, de Waard JR. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc R Soc London Ser B Biol Sci. 2003. https://doi.org/10.1098/rsbl.2003.0025.
Hebert PDN, Cywinska A, Ball SL, DeWaard JR. Biological identifications through DNA barcodes. Proc R Soc London Ser B Biol Sci. 2003;270:313–21. https://doi.org/10.1098/rspb.2002.2218.
Ratnasingham S, Hebert PDN. BARCODING: bold: the barcode of life data system (http://www.barcodinglife.org). Mol Ecol Notes. 2007;7:355–64. https://doi.org/10.1111/j.1471-8286.2007.01678.x.
CBOL Plant Working Group. A DNA barcode for land plants. Proc Natl Acad Sci USA. 2009;106:12794–7. https://doi.org/10.1073/pnas.0905845106.
Kress WJ, Erickson DL. A two-locus global DNA barcode for land plants: the coding rbcL gene complements the non-coding trnH-psbA spacer region. PLoS ONE. 2007;2:e508. https://doi.org/10.1371/journal.pone.0000508.
Bolson M, Smidt EC, Brotto ML, Silva-Pereira V. ITS and trnH-psbA as efficient DNA barcodes to identify threatened commercial woody angiosperms from southern Brazilian Atlantic rainforests. PLoS ONE. 2015;10:e0143049. https://doi.org/10.1371/journal.pone.0143049.
French KE. Species composition determines forage quality and medicinal value of high diversity grasslands in lowland England. Agric Ecosyst Environ. 2017;241:193–204. https://doi.org/10.1016/j.agee.2017.03.012.
Lüscher A, Mueller-Harvey I, Soussana JF, Rees RM, Peyraud JL. Potential of legume-based grassland-livestock systems in Europe: a review. Grass Forage Sci. 2014;69:206–28.
Staden R, Judge DP, Bonfield JK. Managing sequencing projects in the GAP4 environment. In: Krawetz SA, Womble DD, editors. Introduction to bioinformatics: a theoretical and practical approach. Totowa: Humana Press; 2003. p. 327–44. https://doi.org/10.1007/978-1-59259-335-4_20.
Nosov NN, Punina EO, Machs EM, Rodionov AV. Interspecies hybridization in the origin of plant species: cases in the genus Poa sensu lato. Biol Bull Rev. 2015;5:366–82. https://doi.org/10.1134/S2079086415040088.
Patterson JT, Larson SR, Johnson PG. Genome relationships in polyploid Poa pratensis and other Poa species inferred from phylogenetic analysis of nuclear and chloroplast DNA sequences. Genome. 2005;48:76–87. https://doi.org/10.1139/g04-102.
Czaban A, Sharma S, Byrne SL, Spannagl M, Mayer KFX, Asp T. Comparative transcriptome analysis within the Lolium/Festuca species complex reveals high sequence conservation. BMC Genomics. 2015;16:249. https://doi.org/10.1186/s12864-015-1447-y.
Meyer CP, Paulay G. DNA barcoding: error rates based on comprehensive sampling. PLoS Biol. 2005;3:e422. https://doi.org/10.1371/journal.pbio.0030422.
Data produced and analyzed in this paper were generated in collaboration with F. Widmer (Molecular Ecology, Agroscope, Zurich, Switzerland) and the Genetic Diversity Centre (ETH Zurich, Switzerland). Seeds were kindly provided by H. Hirschi (Agroscope, Zurich, Switzerland). We would also like to thank M. Hardegger and C. Kaegi (Swiss Federal Office of Agriculture (FOAG)) for valuable advice regarding the design of the study.
This work was funded by the Swiss Federal Office of Agriculture (FOAG). The funding body assisted in conceiving the study and in designing the experiments.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Loera-Sánchez, M., Studer, B. & Kölliker, R. DNA barcode trnH-psbA is a promising candidate for efficient identification of forage legumes and grasses. BMC Res Notes 13, 35 (2020). https://doi.org/10.1186/s13104-020-4897-5