Maize genomes to fields (G2F): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets

Abstract

Objectives

Advanced tools and resources are needed to efficiently and sustainably produce food for an increasing world population in the context of variable environmental conditions. The maize genomes to fields (G2F) initiative is a multi-institutional effort that seeks to approach this challenge by developing a flexible and distributed infrastructure addressing emerging problems. G2F has generated large-scale phenotypic, genotypic, and environmental datasets using publicly available inbred lines and hybrids evaluated through a network of collaborators that are part of G2F’s genotype-by-environment (G × E) project. This report covers the public release of datasets for 2014–2017.

Data description

Datasets include inbred genotypic information; phenotypic, climatic, and soil measurements; and metadata for each testing location across years. For a subset of inbreds in 2014 and 2015, yield component phenotypes were quantified by image analysis. Data released are accompanied by README descriptions. For genotypic and phenotypic data, both raw data and a version without outliers are reported. For climatic data, a version calibrated to the nearest airport weather station and a version without outliers are reported. The 2014 and 2015 datasets are updated versions of the previously released files [1], while the 2016 and 2017 datasets are newly available to the public.

Objective

Genomes to fields (G2F) is a multi-institutional, public collaborative to develop information and tools that support the translation of maize (Zea mays L.) genomic information into relevant phenotypes for the benefit of growers, consumers, and society. Building on existing maize genome sequence resources, the project focuses on developing approaches to improve phenomic predictability and facilitate the development and deployment of tools and resources that help address fundamental problems of sustainable agricultural productivity. Specific projects within G2F involve collaboration from research fields such as genetics, genomics, plant physiology, agronomy, climatology and crop modeling, computational sciences, statistics, and engineering.

As part of this effort, the G2F G × E project has collected, utilized, and shared multi-year, large-scale genotypic, phenotypic, environmental, and metadata datasets. The datasets described here were generated using standard formats between 2014 and 2017. For each of the testing locations, metadata and soil characterization are also included. During these four growing seasons, over 55,000 plots across 68 unique locations were used to evaluate inbred and hybrid plants. The resulting datasets are unique as they represent, to our knowledge, the most extensive publicly available datasets of their kind in maize, reporting a consistent set of traits across common sets of fully genotyped germplasm across many locations, along with relevant information reported down to the level of specific plots. Making these datasets publicly available is expected to enable researchers to conduct novel data analyses and develop tools using the curated and organized data described here. The 2014 and 2015 datasets are recently updated versions from previously released files (AlKhalifah et al. in BMC Res Notes 11:452, 2018) while 2016 and 2017 datasets are newly available to the public.

Data description

Online forms were developed for logging field site coordinates, field management metadata, and other site-specific information. Datasets include:

  • Genotypic information for inbreds (with and without imputation): This includes single nucleotide polymorphism (SNP) information generated using a genotyping-by-sequencing (GBS) method [2] for the inbreds used to produce the hybrids tested across all locations. Data are formatted to be readily analyzed using the TASSEL software [3].

  • Phenotypic measurements for inbreds and hybrids: A handbook of instructions for making traditional phenotypic measurements (reviewed in [4]) is available via the G2F website [5]. Standard traits include stand count, stalk lodging, root lodging, days to anthesis, days to silking, ear height, plant height, plot weight, grain moisture, test weight, and estimated grain yield. Datatypes reported as both raw files and files with outliers removed are described in README files. Additionally, a set of ear, cob, and kernel measurements was made using flatbed scanners and a machine vision platform to quantify components of yield [6]. These data are reported in millimeters with shape descriptors reported as principal components of contour data points. Cob color was reported as RGB (red/green/blue) pixel values. Kernel row number, counted manually, is reported as an integer.

  • Environmental data: Data were collected using WatchDog 2700 weather stations (Spectrum Technologies) measuring at 30-min intervals from planting through harvest at each location. Collected information includes wind speed, direction, and gust; air temperature, dewpoint, and relative humidity; rainfall; and photoperiod. Data are reported based on calibration derived from nearby National Weather Service (NWS) Automated Surface Observing Systems (ASOS) airport weather stations and cleaned by removing obvious artifacts from the calibrated dataset.

  • Soil characterizations: Information was first collected in 2015. Measurements include plow depth, pH, buffered pH, organic matter, texture, and nitrogen, phosphorus, potassium, sulfur, and sodium levels (in parts per million).

  • The previously released 2014 and 2015 datasets have been updated through additional quality control of the phenotypic and environmental datasets, the addition of missing site-specific field information and an update of the genotypic data to version 4 of the B73 reference genome.
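
The 30-min weather records described in the environmental data item above are often easier to work with as daily summaries. The following is a minimal sketch of that aggregation using only the Python standard library. The record layout, timestamp format, and values are hypothetical and do not reflect the actual G2F file schema; the README files accompanying each dataset describe the real columns.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical 30-min records: (timestamp, air temperature in deg C, rainfall in mm).
# Values and layout are illustrative only, not taken from the G2F files.
RECORDS = [
    ("2016-06-01 00:00", 18.0, 0.0),
    ("2016-06-01 00:30", 17.5, 0.2),
    ("2016-06-01 01:00", None, 0.0),  # missing temperature reading
    ("2016-06-02 00:00", 20.0, 1.0),
    ("2016-06-02 00:30", 22.0, 0.5),
]


def daily_summary(records):
    """Group records by calendar day.

    Returns {iso_date: (mean_temperature, total_rainfall, n_observations)}.
    Missing readings (None) are skipped when averaging or summing.
    """
    by_day = defaultdict(list)
    for stamp, temp, rain in records:
        day = datetime.strptime(stamp, "%Y-%m-%d %H:%M").date()
        by_day[day].append((temp, rain))

    summary = {}
    for day, values in sorted(by_day.items()):
        temps = [t for t, _ in values if t is not None]
        rains = [r for _, r in values if r is not None]
        mean_temp = sum(temps) / len(temps) if temps else None
        summary[day.isoformat()] = (mean_temp, sum(rains), len(values))
    return summary


summary = daily_summary(RECORDS)
```

At a 30-min interval a complete day holds 48 observations, so the observation count returned for each day gives a simple way to flag days with gaps before using their summaries.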

The 2014–2017 datasets are publicly available via CyVerse/iPlant [7] with files and access links as shown in Table 1.

Table 1 Overview of data file/data set

As the number of collaborators, plots evaluated, and research questions across this project grows, it is anticipated that the variety and depth of data collected will also increase. Several projects have utilized aspects of these datasets [13,14,15,16], and more are in preparation. The potential scope of application for these data is broad and is anticipated to impact the field simply by being the first public dataset of its scale that has been collected and reported in the crop sciences using standardized protocols and formats, thus defining standards for data collection, formatting, and access for maize and other species.

Limitations

These datasets contain missing data. In the phenotypic and genotypic datasets, missing values are left blank rather than coded as ‘null’ or zero, so that they do not interfere with software compatibility and interpretation. The only exception is traits extracted from the 2014 and 2015 ear imaging data, where missing values are marked ‘NA’.
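
Because two conventions for missing data are in use (blank cells and ‘NA’), a reader loading these files may want to normalize both to a single missing marker while keeping genuine zeros as values. The sketch below shows one way to do this with the Python standard library. The column names and the sample rows are hypothetical and are not taken from the released files.

```python
import csv
import io

# Tokens treated as missing: blank cells (phenotypic and genotypic files)
# and 'NA' (traits from the 2014 and 2015 ear imaging data).
MISSING_TOKENS = {"", "NA"}


def parse_measurement(cell):
    """Return the cell as a float, or None when it marks missing data."""
    text = (cell or "").strip()
    if text in MISSING_TOKENS:
        return None
    return float(text)


# Hypothetical excerpt; column names and values are illustrative only.
SAMPLE = """plot_id,plant_height_cm,ear_length_mm
101,250,
102,,NA
103,0,152.4
"""


def load_rows(handle):
    """Read a CSV and convert every non-ID column with parse_measurement."""
    rows = []
    for record in csv.DictReader(handle):
        parsed = {"plot_id": record["plot_id"]}
        for name, cell in record.items():
            if name != "plot_id":
                parsed[name] = parse_measurement(cell)
        rows.append(parsed)
    return rows


rows = load_rows(io.StringIO(SAMPLE))
```

Keeping missing values as `None` rather than zero means that a recorded value of 0 (as in plot 103 above) stays distinguishable from an absent measurement.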

For weather datasets, raw files reported by sensors are not provided because machine data were calibrated based on information from nearby weather stations to ensure accuracy (e.g., if the wind vane was set improperly, a calibration correction was required). Instead, only the cleaned version of the file is reported to reduce misinterpretation.

The geographic positions of field sites are not identical across years because of crop rotation management practices, so GPS coordinates are reported along with the field location code. While the germplasm used in the experiments is publicly accessible, it was not generated directly by national public genebanks. Seed access and availability are handled by the G2F collaborators directly.
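
Since a given field location code can map to slightly different GPS coordinates in different years, it can be useful to check how far a site moved before pooling its environmental or soil records. The sketch below uses the standard haversine great-circle formula for that check. It is an illustration only: the coordinates are hypothetical, and the function name is not part of any G2F tooling.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, a common approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    )
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


# Hypothetical coordinates for the same location code in two seasons.
site_year_a = (43.05, -89.53)
site_year_b = (43.06, -89.53)

shift_km = haversine_km(*site_year_a, *site_year_b)
```

A shift of a fraction of a kilometer is consistent with rotation among neighboring fields, whereas a much larger value would suggest checking the site metadata before treating the two seasons as one environment.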

Availability of data materials

The data described in this Data Note can be freely and openly accessed at CyVerse via the following Digital Object Identifiers (DOIs): https://www.doi.org/10.25739/frmv-wj25, https://www.doi.org/10.25739/9wjm-eq41, https://www.doi.org/10.25739/kjsn-dz84, https://www.doi.org/10.25739/yjnh-kt21, https://www.doi.org/10.25739/w560-2114 and https://doi.org/10.7946/P2C34P. See Table 1 and reference list for details and links to the data.

Abbreviations

G2F:

Genomes to fields

G × E:

Genotype-by-environment

GBS:

Genotyping-by-sequencing

RGB:

Red/green/blue

DOI:

Digital Object Identifier

References

  1. AlKhalifah N, et al. Maize genomes to fields: 2014 and 2015 field season genotype, phenotype, environment, and inbred ear image datasets. BMC Res Notes. 2018;11:452.

  2. Elshire RJ, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6(5):e19379.

  3. Bradbury PJ, et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5.

  4. Pauli D, et al. The quest for understanding phenotypic variation via integrated approaches in the field environment. Plant Physiol. 2016;172:622–34.

  5. Genomes to fields. Phenotyping handbook. https://www.genomes2fields.org/about/project-overview/#standards-and-methods. Accessed 30 Aug 2019.

  6. Miller ND, et al. A robust, high-throughput method for computing maize ear, cob, and kernel attributes automatically from images. Plant J. 2017;89:169–78.

  7. Merchant N, et al. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol. 2016;14:e1002342.

  8. G2F Consortium. G2F planting season 2014. CyVerse Data Commons. 2019. https://doi.org/10.25739/9wjm-eq41.

  9. G2F Consortium. G2F planting season 2015. CyVerse Data Commons. 2019. https://www.doi.org/10.25739/kjsn-dz84.

  10. G2F Consortium. G2F planting season 2016. CyVerse Data Commons. 2019. https://www.doi.org/10.25739/yjnh-kt21.

  11. G2F Consortium. G2F planting season 2017. CyVerse Data Commons. 2019. https://www.doi.org/10.25739/w560-2114.

  12. Spalding E. Genomes to fields inbred ear imaging 2017. CyVerse Data Commons. 2017. https://doi.org/10.7946/p2c34p.

  13. Gage JL, et al. The effect of artificial selection on phenotypic plasticity in maize. Nat Commun. 2017;8:1348.

  14. Lawrence-Dill C, et al. Idea factory: the maize genomes to fields initiative. Crop Sci. 2019;59(4):1406–10.

  15. Anderson SL, et al. Prediction of maize grain yield before maturity using improved temporal height estimates of unmanned aerial systems. Plant Phenome J. 2019;2:190004.

  16. Falcon CM, Kaeppler SM, Spalding EP, et al. Relative utility of agronomic, phenological, and morphological traits for assessing genotype-by-environment interaction in maize inbreds. Crop Sci. 2020. https://doi.org/10.1002/csc2.20035.

Acknowledgements

We gratefully acknowledge the data management training and transition contributions from Darwin A. Campbell, Jack M. Gardiner, Carolyn Lawrence-Dill and Renee Walton. We also acknowledge contributions from many field managers and data collectors including: Lisa Coffey (P. Schnable lab); Dustin Eilert, Marina Borsecnik, Rachel Perry, Emily Rothfusz, and Jane Petzoldt (de Leon/Kaeppler labs); Nick Lepak, Josh Budka, Nicholas Kaczmar, and Judy Kolkman (Cornell University); Miriam Lopez, Grace Kuehne, and Sarah Weirich (Lauter lab); Teclemariam Weldekidan (Wisser lab); Christine Smith (J. Schnable lab); Jacob Garfin, Amanda Gilbert and Thomas Hoverstad (Hirsch lab); Pete Hermanson (Springer lab); Nicole Yana (Bohn lab); Jacob Pekar (Texas A&M University); Susan Melia-Hancock (USDA-ARS, Columbia, MO); and Bill Widdicombe (Michigan State University). We also benefitted from data management discussions with Nicole Hopkins and Jeremy DeBarry (formerly with CyVerse); Kate Dreher, Clarissa Pimental, Julian Pietragalla, Jean-Marcel Ribaut, and Sarah Hearne (CIMMYT); Jan Erik Backlund and Kelly Robbins (Cornell University); and Matthew Berrigan (LeafNode).

Funding

We gratefully acknowledge support from: USDA Hatch program funds to multiple PIs in this project; the USDA Agricultural Research Service; the Arkansas Corn and Grain Sorghum Board; Clemson University; the Colorado Corn Administrative Committee; the Georgia Agricultural Commodity Commission for Corn; the Corn Marketing Program of Michigan; the Illinois Corn Marketing Board; the Iowa Corn Promotion Board; the Iowa State University Plant Sciences Institute; the Kansas Corn Commission; the Minnesota Corn Research and Promotion Council; the National Corn Growers Association; the Nebraska Corn Board; the Ohio Corn Marketing Program; the Ontario Ministry of Agriculture, Food, and Rural Affairs; the Texas Corn Producers Board; and the Wisconsin Corn Promotion Board. We also acknowledge funding from the National Science Foundation under Grant Numbers #DBI-0735191 and #DBI-1265383 to support CyVerse (http://www.cyverse.org), #IOS-1339362 to support phenotyping by JC and SPM, and USDA-NIFA 2011-67003-30342 to RJW, SFG, JH, NL, SM, WX, and NDL. The funders had no role in the design and conduct of the study, data collection, and writing of the manuscript.

Author information

Contributions

BAM, NAK, JE, CMF, JLG, DJ, DCL, NDM, CP, MCR, KS, RW, CTY: data management team; MB, JB, ESB, IC, JE, SFG, MAG, CG, CH, JBH, EH, DH, SMK, JK, GK, NL, ECL, AL, JPL, JM, SPM, SCM, RN, TR, OR, JCS, BS, RS, MS, MS, EPS, NS, KT, PT, MT, JW, DW, RJW, WX, NdL: data contributors; DE, PSS, NdL: communication. The data management team aggregated, curated, and made available data resources. Contributors advised on data collection methods, collected the data, and reviewed data collection and curation methods as well as datasets. Communicating authors wrote the manuscript and guided data collection, curation, and distribution. All authors reviewed the manuscript. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Natalia de Leon.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

McFarland, B.A., AlKhalifah, N., Bohn, M. et al. Maize genomes to fields (G2F): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets. BMC Res Notes 13, 71 (2020). https://doi.org/10.1186/s13104-020-4922-8

  • DOI: https://doi.org/10.1186/s13104-020-4922-8

Keywords