Gene function annotations for the maize NAM founder lines

Abstract

Objectives

We annotated the latest published sequences of the 26 Zea mays Nested Association Mapping (NAM) founder lines using GOMAP, the Gene Ontology Meta Annotator for Plants. The maize NAM panel enables researchers to understand and identify the genetic basis of complex traits. Annotations of predicted functions for genes can help researchers investigate gene-phenotype associations, prioritize candidate genes for phenotypes of interest, and formulate testable hypotheses about gene function/phenotype associations. The creation and release of high-confidence, high-coverage gene function annotation sets for the NAM founder lines is critical to accelerate the generation of knowledge in maize genetics research. GOMAP is a high-throughput computational pipeline that annotates gene functions genome-wide in plant genomes using Gene Ontology functional class terms. Here we report and share GOMAP-generated functional annotations for the NAM founder lines.

Data description

Datasets include the protein sequences used as input, GOMAP-generated annotation files, scripts used to update obsolete terms, and GAF-formatted tab-delimited text files of gene function annotations along with README files that describe formatting, content, and how files relate to each other.

Objective

GOMAP is an annotation tool that generates high-coverage, high-quality (based on F-measure), whole-genome functional annotations for plants. It assigns Gene Ontology (GO) terms to genes through sequence similarity, domain presence, and mixed-method pipelines [1]. The GO framework includes a standardized vocabulary designed to describe gene functions under three categories: biological process, molecular function, and cellular component [2]. Using GOMAP, we annotated the 26 Zea mays ssp. mays Nested Association Mapping (NAM) founder lines [3]. The maize NAM population was established to capture the genetic diversity of maize and to determine the genetic structure of complex traits by combining the benefits of quantitative trait locus and association mapping studies while reducing their limitations [4].

The GOMAP-generated annotation datasets for the Zea mays ssp. mays NAM founder lines can be of great use to scientists in the plant community, especially those whose research focuses on maize. GO-based function predictions allow researchers to identify novel candidate genes for hypothesis generation and for testing gene functions. The annotation datasets can also be used for gene-phenotype association analyses, identification of novel genes in a pathway of interest, and investigation of different functions within subpopulations of maize, to name a few. We expect the cleaned datasets described here to lead to new gene function findings and experimental validations.

Data description

A standardized functional annotation dataset is available for each of the NAM founder lines (Table 1). Datasets include:

  • Protein sequences of the maize lines that were used as input for GOMAP. We have included the original protein sequences, the Python script we used to reformat the original sequences into the GOMAP input file, and the GOMAP input file itself. A README file describes the data further, including where the original file was downloaded from and how to run the Python script. Reformatting was required for proper text wrapping, removal of any asterisks in the sequences, and selection of the longest transcript of each gene.

  • The raw output gene annotation file produced by GOMAP. This file is the aggregated functional annotation generated by the pipeline and follows the GO Annotation File 2 (GAF 2) format.

  • Python scripts and supplementary resources to modify and clean the GOMAP output file. The gene and transcript names are modified by adding a transcript identifier column. Cleanup consists of removing any obsolete GO terms and removing duplicates. Descriptions of these files and details on how to run the scripts are provided in an accompanying README file. For consistency, all maize datasets reported here use the go.obo release of 2022-07-01, the same release incorporated in GOMAP v1.3.9.

  • The final cleaned functional annotation dataset. This is the GAF 2 file that is generated using the 2.3_cleanup.py script. These GO-based gene function predictions can be readily used by the public.
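As an illustration of how a cleaned GAF 2 file might be consumed, the sketch below counts annotations per GO aspect (P for biological process, F for molecular function, C for cellular component). It is a minimal example and not part of the released scripts. It assumes the standard GAF 2 column order, in which the GO ID is the 5th tab-delimited column and the aspect is the 9th; because the cleaned files add a transcript identifier column, the column positions should be checked against the README before use. The function name and the sample rows are hypothetical.

```python
from collections import Counter


def count_by_aspect(lines):
    """Count annotation rows per GO aspect in an iterable of GAF 2 lines.

    Assumes the standard GAF 2 layout: GO ID in column 5, aspect in column 9.
    Comment lines (starting with '!') and rows without a GO ID are skipped.
    """
    counts = Counter()
    for line in lines:
        if line.startswith("!") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        # Skip header rows or malformed rows that carry no GO ID.
        if len(fields) < 9 or not fields[4].startswith("GO:"):
            continue
        counts[fields[8]] += 1
    return counts


# Made-up rows for demonstration only; real files have more columns.
example = [
    "!gaf-version: 2.0\n",
    "\t".join(["GOMAP", "gene1", "gene1", "", "GO:0008150",
               "GOMAP:0000", "IEA", "", "P"]) + "\n",
    "\t".join(["GOMAP", "gene1", "gene1", "", "GO:0003674",
               "GOMAP:0000", "IEA", "", "F"]) + "\n",
]
print(dict(count_by_aspect(example)))
```

To run this on a downloaded file, an open file handle can be passed in place of the list, for example `count_by_aspect(open(path))` with `path` pointing at the cleaned GAF file.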

Table 1 Overview of data files/data sets

Each directory has its own README that provides more information about the files. There is also a top-level README that describes the dataset more generally. In addition, each dataset has its own standardized metadata. The datasets are publicly available on CyVerse [5] and can be accessed using the links provided in Table 1.

The structure of our dataset is intended to ensure data reproducibility and adherence to the data principles of findability, accessibility, interoperability, and reusability (FAIR) [6]. The overall organization of our annotation datasets is not new; we developed and applied it to previously released GOMAP-generated annotation files [7]. A full list of our annotated plant genomes is available [8] and includes annotation sets for 24 species, including sorghum, rice, wheat, barley, cotton, and hemp. For users interested in generating their own GO-based annotations, the GOMAP pipeline itself is available for general use [1], and a description of how to use the pipeline is also available [9].

We have generated and publicly released new functional annotations using the most up-to-date version of GOMAP (v1.3.9) on our older datasets, including the previously annotated maize lines Mo17 [36,37,38], W22 [39, 40], and PH207 [41, 42]. Similarly, the annotations for Zea mays B73v5 reported here are an update of a previously released dataset [43]. We anticipate that the availability and maintenance of our datasets will benefit researchers by providing plant gene function predictions, paving the way for the generation of testable hypotheses on novel candidate genes of interest.

Limitations

The quality of the annotations depends on the quality of the input sequences. Genomes with high-quality sequencing and coverage are expected to yield better annotations, whereas genomes with lower-quality sequencing will limit downstream analyses.

When a gene ID has multiple transcripts, GOMAP requires that the longest transcript be selected for that gene ID because the pipeline contains a reciprocal best hit step. As a result, not every transcript ID of a gene ID is annotated in the resulting file. We have included a transcript ID column in the final cleaned file that allows the user to identify which transcript was included in the reformatted input file and annotated through GOMAP.
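The longest-transcript selection described above can be sketched as follows. This is a minimal illustration and not the reformatting script distributed with the datasets. It assumes that each FASTA header begins with a transcript or protein identifier of the form gene ID, underscore, suffix (for example a hypothetical `GeneA_P001`), so that the gene ID is everything before the final underscore; the function names and the 60-character line width are also assumptions.

```python
def parse_fasta(text):
    """Return a list of (identifier, sequence) pairs from FASTA text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            # Keep only the first whitespace-delimited token as the ID.
            header, chunks = line[1:].split()[0], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def longest_per_gene(records):
    """Keep one sequence per gene: the longest after removing asterisks.

    Assumption: the gene ID is the identifier up to the last underscore.
    On ties, the first transcript encountered is kept.
    """
    best = {}
    for transcript_id, seq in records:
        seq = seq.replace("*", "")
        gene_id = transcript_id.rsplit("_", 1)[0]
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (transcript_id, seq)
    return best


def to_fasta(best, width=60):
    """Write the selected sequences as FASTA text wrapped at `width`."""
    lines = []
    for transcript_id, seq in best.values():
        lines.append(">" + transcript_id)
        lines.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(lines) + "\n"


# Made-up sequences for demonstration only.
demo = ">GeneA_P001\nMKT*\n>GeneA_P002\nMKTAYL*\n>GeneB_P001\nMSS\n"
print(to_fasta(longest_per_gene(parse_fasta(demo))))
```

Recording the selected transcript identifier alongside the gene ID, as the dictionary values do here, is what makes it possible to report which transcript was annotated.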

A cleanup step is performed on our GOMAP output file to remove any obsolete GO terms. This step relies on a go.obo file. For the datasets reported here, we used the go.obo file released on 2022-07-01. Users may substitute the most recent go.obo release when cleaning their own output.
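A cleanup of this kind might look like the sketch below. It is not the cleanup script released with the datasets: it only illustrates the idea of reading obsolete term IDs from a go.obo file and dropping the matching GAF rows along with duplicates. It assumes that obsolete terms are marked with an `is_obsolete: true` line inside a `[Term]` stanza, that the GO ID is the 5th tab-delimited GAF column, and that a duplicate is a row identical to an earlier row. The function names and sample data are hypothetical.

```python
def obsolete_ids(obo_text):
    """Collect IDs of [Term] stanzas flagged 'is_obsolete: true'."""
    obsolete, current_id, in_term = set(), None, False
    for line in obo_text.splitlines():
        line = line.strip()
        if line.startswith("["):
            # A new stanza begins; only [Term] stanzas are of interest.
            in_term = line == "[Term]"
            current_id = None
        elif in_term and line.startswith("id: "):
            current_id = line[len("id: "):]
        elif in_term and line == "is_obsolete: true" and current_id:
            obsolete.add(current_id)
    return obsolete


def clean_gaf_lines(lines, obsolete):
    """Drop rows annotated with obsolete terms and exact duplicate rows.

    Comment lines (starting with '!') are passed through unchanged.
    """
    seen, kept = set(), []
    for line in lines:
        if line.startswith("!"):
            kept.append(line)
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 4 and fields[4] in obsolete:
            continue
        if line in seen:
            continue
        seen.add(line)
        kept.append(line)
    return kept


# Made-up ontology fragment and rows for demonstration only.
obo = """format-version: 1.2

[Term]
id: GO:0000001
name: example current term

[Term]
id: GO:0000002
name: obsolete example term
is_obsolete: true
"""
rows = [
    "!gaf-version: 2.0\n",
    "GOMAP\tgene1\tgene1\t\tGO:0000001\n",
    "GOMAP\tgene1\tgene1\t\tGO:0000001\n",
    "GOMAP\tgene1\tgene1\t\tGO:0000002\n",
]
print("".join(clean_gaf_lines(rows, obsolete_ids(obo))))
```

This sketch simply discards obsolete annotations. It does not follow `replaced_by` or `alt_id` entries in the ontology, which a fuller cleanup might use to map obsolete terms to current ones.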

While using the functional annotation datasets, it is worth noting that the GO Directed Acyclic Graph (DAG) lacks a good portrayal of plant functions underrepresented in the model species, Arabidopsis thaliana. This could lead to instances where the assignment of unconventional functions is due to the absence of related plant functions [7].

Availability of data and materials

The data described in this Data note can be freely and openly accessed on CyVerse under the DOIs listed in Table 1. Please see Table 1 and the reference list for details and links to the data.

Abbreviations

GOMAP: Gene Ontology Meta Annotator for Plants

NAM: Nested Association Mapping

GO: Gene Ontology

DAG: Directed Acyclic Graph

FAIR: Findability, accessibility, interoperability, and reusability

References

  1. Wimalanathan K, Lawrence-Dill CJ. Gene ontology meta annotator for plants (GOMAP). Plant Methods. 2021;17(1):1–4.

  2. Thomas PD. The gene ontology and the meaning of biological function. The gene ontology handbook. Springer; 2017. p. 15–24.

  3. Hufford MB, Seetharam AS, Woodhouse MR, Chougule KM, Ou S, Liu J, Ricci WA, Guo T, Olson A, Qiu Y, Della CR. De novo assembly, annotation, and comparative analysis of 26 diverse maize genomes. Science. 2021;373(6555):655–62.

  4. Yu J, Holland JB, McMullen MD, Buckler ES. Genetic design and statistical power of nested association mapping in maize. Genetics. 2008;178(1):539–51.

  5. Goff SA, Vaughn M, McKay S, Lyons E, Stapleton AE, Gessler D, Matasci N, Wang L, Hanlon M, Lenards A, Muir A. The iPlant collaborative: cyberinfrastructure for plant biology. Front Plant Sci. 2011;2:34.

  6. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, Blomberg N, Boiten JW, da Silva Santos LB, Bourne PE, Bouwman J. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data. 2016;3(1):1–9.

  7. Fattel L, Psaroudakis D, Yanarella CF, Chiteri KO, Dostalik HA, Joshi P, Starr DC, Vu H, Wimalanathan K, Lawrence-Dill CJ. Standardized genome-wide function prediction enables comparative functional genomics: a new application area for Gene Ontologies in plants. GigaScience. 2022;11:giac023.

  8. Publicly available GOMAP Datasets; 2023. https://faculty.sites.iastate.edu/triffid/gomap

  9. Wimalanathan K, Lawrence-Dill CJ. Dill-PICL/GOMAP-singularity. GitHub; 2023. https://github.com/Dill-PICL/GOMAP-singularity

  10. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_B73_NAM_5.0_October_2022_v2.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/cfvb-jn16

  11. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_B97_NAM_1.0_October_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/abf6-pa81

  12. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML52_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/qgb3-8743

  13. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML69_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/xvga-0f52

  14. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML103_NAM_1.0_October_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/1n89-rd43

  15. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML228_NAM_1.0_October_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/e6hc-0406

  16. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML247_NAM_1.0_October_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/jnwv-g571

  17. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML277_NAM_1.0_October_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/ggj0-by23

  18. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML322_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/36bb-f096

  19. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_CML333_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/tnhe-yr36

  20. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_HP301_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/2jhr-hy41

  21. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Il14H_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/t500-af32

  22. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Ki3_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/y2t8-zp24

  23. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Ki11_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/thx1-dm44

  24. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Ky21_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/ay3t-b914

  25. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_M37W_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/cgmt-s267

  26. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_M162W_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/pewv-k336

  27. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Mo18W_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/w0zf-jc74

  28. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Ms71_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/9gb5-aq74

  29. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_NC350_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/q46m-qy91

  30. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_NC358_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/0w9q-ta36

  31. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Oh7B_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/910q-f303

  32. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Oh43_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/8a63-3n35

  33. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_P39_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/dgda-md18

  34. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Tx303_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/gz5q-rw97

  35. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Tzi8_NAM_1.0_November_2022.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/9g8d-ny61

  36. Lawrence-Dill C. GOMAP Maize Zm-Mo17-REFERENCE-CAU-1.0 Zm00014a.1. CyVerse Data Commons; 2019. https://doi.org/10.25739/m634-cn58

  37. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_Mo17_CAU_1.0_May_2023_v2.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/zjmm-vf13

  38. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_CyVerse_Mo17_CAU_2.0_July_2023.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/tr4x-ta89

  39. Lawrence-Dill C. GOMAP Maize Zm-W22-REFERENCE-NRGENE-2.0 Zm00004b.1. CyVerse Data Commons; 2019. https://doi.org/10.25739/e4va-9f09

  40. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_W22_NRGENE_2.0_May_2023_v2.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/x1fn-w309

  41. Lawrence-Dill C. GOMAP Maize Zm-PH207-REFERENCE_NS-UIUC_UMN-1.0 Zm00008a.1. CyVerse Data Commons; 2019. https://doi.org/10.25739/dm9s-aa15

  42. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_PH207_NS-UIUC_UMN_1.0_May_2023_v2.r1. CyVerse Data Commons; 2023. https://doi.org/10.25739/e047-e733

  43. Lawrence-Dill C. Carolyn_Lawrence_Dill_GOMAP_Maize_MaizeGDB_B73_NAM_5.0_December_2021.r1. CyVerse Data Commons; 2022. https://doi.org/10.25739/g1rt-b278

Acknowledgements

We thank CyVerse for providing a collaborative cyberinfrastructure to share data with the research community. We also thank Iowa State University High Performance Computing facility for the equipment and resources that accelerate our research. Finally, we thank the researchers who sequenced and assembled the plant datasets that were used as input in our research.

Funding

We gratefully acknowledge support from: NSF and USDA for AIIRA 2021-67021-35329; IOW0417 Hatch Funding to Iowa State University; Iowa State Predictive Plant Phenomics NSF Research Traineeship (DGE-1545453; CJLD is a co-principal investigator, and CFY is a trainee).

Author information

Contributions

LF generated and organized the maize NAM founder line datasets. BN generated the updated datasets of maize lines Mo17, PH207, and W22. OTJ generated the original maize B73v5 dataset. LF created the metadata for each dataset and requested DOIs. CFY established the dataset structure to be applied to all our GOMAP-generated datasets. DAC supervised the release of datasets and creation of DOIs through CyVerse. KW created the GOMAP system. LF and CJLD wrote the manuscript. All authors read, suggested improvements, and approved the final copy of the manuscript.

Corresponding author

Correspondence to Carolyn J. Lawrence-Dill.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Fattel, L., Yanarella, C.F., Ngara, B. et al. Gene function annotations for the maize NAM founder lines. BMC Res Notes 17, 9 (2024). https://doi.org/10.1186/s13104-023-06668-6

  • DOI: https://doi.org/10.1186/s13104-023-06668-6
