Maize genomes to fields (G2F): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets
BMC Research Notes volume 13, Article number: 71 (2020)
Advanced tools and resources are needed to efficiently and sustainably produce food for an increasing world population in the context of variable environmental conditions. The maize genomes to fields (G2F) initiative is a multi-institutional effort that seeks to approach this challenge by developing a flexible and distributed infrastructure addressing emerging problems. G2F has generated large-scale phenotypic, genotypic, and environmental datasets using publicly available inbred lines and hybrids evaluated through a network of collaborators that are part of the G2F’s genotype-by-environment (G × E) project. This report covers the public release of datasets for 2014–2017.
Datasets include inbred genotypic information; phenotypic, climatic, and soil measurements; and metadata for each testing location across years. For a subset of inbreds in 2014 and 2015, yield component phenotypes were quantified by image analysis. Data released are accompanied by README descriptions. For genotypic and phenotypic data, both raw data and a version without outliers are reported. For climatic data, a version calibrated to the nearest airport weather station and a version without outliers are reported. The 2014 and 2015 datasets are updated versions of the previously released files, while the 2016 and 2017 datasets are newly available to the public.
Genomes to fields (G2F) is a multi-institutional, public collaborative to develop information and tools that support the translation of maize (Zea mays L.) genomic information into relevant phenotypes for the benefit of growers, consumers, and society. Building on existing maize genome sequence resources, the project focuses on developing approaches to improve phenomic predictability and facilitate the development and deployment of tools and resources that help address fundamental problems of sustainable agricultural productivity. Specific projects within G2F involve collaboration from research fields such as genetics, genomics, plant physiology, agronomy, climatology and crop modeling, computational sciences, statistics, and engineering.
As part of this effort, the G2F G × E project has collected, utilized, and shared multi-year, large-scale genotypic, phenotypic, environmental, and metadata datasets. The datasets described here were generated using standard formats between 2014 and 2017. For each of the testing locations, metadata and soil characterization are also included. During these four growing seasons, over 55,000 plots across 68 unique locations were used to evaluate inbred and hybrid plants. The resulting datasets are unique as they represent, to our knowledge, the most extensive publicly available datasets of their kind in maize, reporting a consistent set of traits across common sets of fully genotyped germplasm across many locations, along with relevant information reported down to the level of specific plots. Making these datasets publicly available is expected to enable researchers to conduct novel data analyses and develop tools using the curated and organized data described here. The 2014 and 2015 datasets are recently updated versions from previously released files (AlKhalifah et al. in BMC Res Notes 11:452, 2018) while 2016 and 2017 datasets are newly available to the public.
Online forms were developed for logging field site coordinates, field management metadata, and other site-specific information. Datasets include:
Genotypic information for inbreds (with and without imputation): This includes single nucleotide polymorphism (SNP) information generated using a genotyping-by-sequencing (GBS) method for the inbreds used to produce the hybrids tested across all locations. Data are formatted to be readily analyzed using the TASSEL software.
Phenotypic measurements for inbreds and hybrids: A handbook of instructions for making traditional phenotypic measurements (reviewed in ) is available via the G2F website. Standard traits include stand count, stalk lodging, root lodging, days to anthesis, days to silking, ear height, plant height, plot weight, grain moisture, test weight, and estimated grain yield. Datatypes reported as both raw files and files with outliers removed are described in README files. Additionally, a set of ear, cob, and kernel measurements was made using flatbed scanners and a machine vision platform to quantify components of yield. These data are reported in millimeters, with shape descriptors reported as principal components of contour data points. Cob color was reported as RGB (red/green/blue) pixel values. Kernel row number, counted manually, is reported as an integer.
Environmental data: Data were collected using WatchDog 2700 weather stations (Spectrum Technologies) measuring at 30-min intervals from planting through harvest at each location. Collected information includes wind speed, direction, and gust; air temperature, dewpoint, and relative humidity; rainfall; and photoperiod. Data are reported after calibration against nearby National Weather Service (NWS) Automated Surface Observing Systems (ASOS) airport weather stations and after cleaning to remove obvious artifacts from the calibrated dataset.
Soil characterizations: Information was first collected in 2015. Measurements include plow depth, pH, buffered pH, organic matter, texture, and nitrogen, phosphorus, potassium, sulfur, and sodium levels (in parts per million).
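Because the weather stations report at 30-min intervals, many analyses will first reduce the records to daily values. The following is a minimal sketch of that step, not part of the G2F release; the tuple layout (timestamp, air temperature, rainfall) and the summary field names are assumptions chosen for illustration, and the actual column names should be taken from the README files.

```python
from collections import defaultdict

def daily_summary(records):
    """Aggregate 30-minute records into per-day summaries.

    `records` is an iterable of (timestamp, air_temp, rainfall) tuples,
    where `timestamp` is a datetime and None marks a missing reading.
    The field layout is hypothetical, not the released file format.
    """
    temps = defaultdict(list)
    rain = defaultdict(list)
    for ts, temp, rainfall in records:
        day = ts.date()
        if temp is not None:
            temps[day].append(temp)
        if rainfall is not None:
            rain[day].append(rainfall)

    summary = {}
    for day in sorted(set(temps) | set(rain)):
        t = temps.get(day, [])
        r = rain.get(day, [])
        summary[day] = {
            "t_min": min(t) if t else None,
            "t_max": max(t) if t else None,
            "t_mean": sum(t) / len(t) if t else None,
            # None (not 0) when no rainfall reading exists for the day
            "rain_total": sum(r) if r else None,
            "n_temp_obs": len(t),
        }
    return summary
```

Keeping the observation count alongside each daily value makes it easy to flag days on which the station reported only part of the expected 48 readings.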
The previously released 2014 and 2015 datasets have been updated through additional quality control of the phenotypic and environmental datasets, the addition of missing site-specific field information and an update of the genotypic data to version 4 of the B73 reference genome.
As the number of collaborators, plots evaluated, and research questions across this project grow, it is anticipated that the variety and depth of data collected will also increase. Several projects have utilized aspects of these datasets [13,14,15,16], and more are in preparation. The potential scope of application for these data is broad. The release is anticipated to impact the field simply by being the first public dataset of its scale to be collected and reported in the crop sciences using standardized protocols and formats, thus defining standards for data collection, formatting, and access for maize and other species.
These datasets contain missing data. In the phenotypic and genotypic datasets, missing values are left blank rather than marked with ‘null’ or zero, so as not to interfere with software compatibility and interpretation. The only exception is for traits extracted from the 2014 and 2015 ear imaging data, where missing values are marked with ‘NA’.
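The two missing-value conventions described above can be handled by a single reader. This is a minimal sketch under stated assumptions: the files are taken to be comma-separated with a header row, and the column names in the usage note are hypothetical placeholders rather than names from the released files.

```python
import csv
import io

# Blank cells (phenotypic/genotypic files) and 'NA' (ear imaging files)
# are both treated as missing.
MISSING_TOKENS = {"", "NA"}

def read_rows(handle, numeric_columns):
    """Yield one dict per row, with the named columns converted to float
    and missing cells converted to None."""
    for row in csv.DictReader(handle):
        parsed = dict(row)
        for col in numeric_columns:
            raw = (row.get(col) or "").strip()
            parsed[col] = None if raw in MISSING_TOKENS else float(raw)
        yield parsed
```

For example, `list(read_rows(io.StringIO("plot,height\n1,250\n2,\n3,0\n"), ["height"]))` yields heights of `250.0`, `None`, and `0.0`, so a recorded zero stays distinct from a missing measurement.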
For weather datasets, raw files reported by sensors are not provided because machine data were calibrated based on information from nearby weather stations to ensure accuracy (e.g., if the wind vane was set improperly, a calibration correction was required). Instead, only the cleaned version of the file is reported to reduce misinterpretation.
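To illustrate the kind of correction mentioned for a misaligned wind vane, the sketch below estimates a constant angular offset between a field station and a reference station and removes it. The source does not describe the calibration procedure that was actually used, so the circular-mean approach here is an assumption, and the function names are hypothetical.

```python
import math

def circular_offset_deg(station_deg, reference_deg):
    """Mean circular difference (station minus reference) in degrees,
    in the range -180 to 180. Inputs are paired direction readings."""
    sin_sum = 0.0
    cos_sum = 0.0
    for s, r in zip(station_deg, reference_deg):
        diff = math.radians(s - r)
        sin_sum += math.sin(diff)
        cos_sum += math.cos(diff)
    return math.degrees(math.atan2(sin_sum, cos_sum))

def correct_direction(station_deg, offset_deg):
    """Subtract the estimated offset and wrap each result into [0, 360)."""
    return [(s - offset_deg) % 360.0 for s in station_deg]
```

Averaging sines and cosines rather than raw angles keeps readings on either side of north (for example 350° and 10°) from cancelling into a misleading mean.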
The geographic positions of field sites are not identical across years due to crop rotation management practices. The GPS coordinates are therefore reported along with each field location code. While the germplasm used in the experiments is publicly accessible, it was not generated directly by national public genebanks. Seed access and availability are handled by the G2F collaborators directly.
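Since a location code can refer to slightly different fields in different years, the reported GPS coordinates can be used to check how far a site moved between seasons. The following is a minimal sketch using the standard haversine formula; it assumes coordinates in decimal degrees, and any coordinates passed to it in practice would come from the field metadata files.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given
    in decimal degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))
```

A distance threshold chosen by the analyst can then decide whether two years of data for the same location code should share soil or weather covariates.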
Availability of data materials
The data described in this Data Note can be freely and openly accessed at CyVerse via the following Digital Object Identifiers (DOIs): https://www.doi.org/10.25739/frmv-wj25, https://www.doi.org/10.25739/9wjm-eq41, https://www.doi.org/10.25739/kjsn-dz84, https://www.doi.org/10.25739/yjnh-kt21, https://www.doi.org/10.25739/w560-2114 and https://doi.org/10.7946/P2C34P. See Table 1 and reference list for details and links to the data.
Abbreviations
G2F: Genomes to fields
G × E: Genotype-by-environment
DOI: Digital Object Identifier
AlKhalifah N, et al. Maize genomes to fields: 2014 and 2015 field season genotype, phenotype, environment, and inbred ear image datasets. BMC Res Notes. 2018;11:452.
Elshire RJ, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6(5):e19379.
Bradbury PJ, et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5.
Pauli D, et al. The quest for understanding phenotypic variation via integrated approaches in the field environment. Plant Physiol. 2016;172:622–34.
Genomes to Fields. Phenotyping handbook. https://www.genomes2fields.org/about/project-overview/#standards-and-methods. Accessed 30 Aug 2019.
Miller ND, et al. A robust, high-throughput method for computing maize ear, cob, and kernel attributes automatically from images. Plant J. 2017;89:169–78.
Merchant N, et al. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol. 2016;14:e1002342.
G2F Consortium. G2F planting season 2014. CyVerse Data Commons. 2019. https://doi.org/10.25739/9wjm-eq41.
G2F Consortium. G2F planting season 2015. CyVerse Data Commons. 2019. https://www.doi.org/10.25739/kjsn-dz84.
G2F Consortium. G2F planting season 2016. CyVerse Data Commons. 2019. https://www.doi.org/10.25739/yjnh-kt21.
G2F Consortium. G2F planting season 2017. CyVerse Data Commons. 2019. https://www.doi.org/10.25739/w560-2114.
Spalding E. Genomes to fields inbred ear imaging 2017. CyVerse Data Commons. 2017. https://doi.org/10.7946/p2c34p.
Gage JL, et al. The effect of artificial selection on phenotypic plasticity in maize. Nat Commun. 2017;8:1348.
Lawrence-Dill C, et al. Idea factory: the maize genomes to fields initiative. Crop Sci. 2019;59(4):1406–10.
Anderson SL, et al. Prediction of maize grain yield before maturity using improved temporal height estimates of unmanned aerial systems. Plant Phenome J. 2019;2:190004.
Falcon CM, Kaeppler SM, Spalding EP, et al. Relative utility of agronomic, phenological, and morphological traits for assessing genotype-by-environment interaction in maize inbreds. Crop Sci. 2020. https://doi.org/10.1002/csc2.20035.
We gratefully acknowledge the data management training and transition contributions from Darwin A. Campbell, Jack M. Gardiner, Carolyn Lawrence-Dill and Renee Walton. We also acknowledge contributions from many field managers and data collectors including: Lisa Coffey (P. Schnable lab); Dustin Eilert, Marina Borsecnik, Rachel Perry, Emily Rothfusz, and Jane Petzoldt (de Leon/Kaeppler labs); Nick Lepak, Josh Budka, Nicholas Kaczmar, and Judy Kolkman (Cornell University); Miriam Lopez, Grace Kuehne, and Sarah Weirich (Lauter lab); Teclemariam Weldekidan (Wisser lab); Christine Smith (J. Schnable lab); Jacob Garfin, Amanda Gilbert and Thomas Hoverstad (Hirsch lab); Pete Hermanson (Springer lab); Nicole Yana (Bohn lab); Jacob Pekar (Texas A&M University); Susan Melia-Hancock (USDA-ARS, Columbia, MO); and Bill Widdicombe (Michigan State University). We also benefitted from data management discussions with Nicole Hopkins and Jeremy DeBarry (formerly with CyVerse); Kate Dreher, Clarissa Pimental, Julian Pietragalla, Jean-Marcel Ribaut, and Sarah Hearne (CIMMYT); Jan Erik Backlund and Kelly Robbins (Cornell University); and Matthew Berrigan (LeafNode).
We gratefully acknowledge support from: USDA Hatch program funds to multiple PIs in this project; the USDA Agricultural Research Service; the Arkansas Corn and Grain Sorghum Board; Clemson University; the Colorado Corn Administrative Committee; the Georgia Agricultural Commodity Commission for Corn; the Corn Marketing Program of Michigan; the Illinois Corn Marketing Board; the Iowa Corn Promotion Board; the Iowa State University Plant Sciences Institute; the Kansas Corn Commission; the Minnesota Corn Research and Promotion Council; the National Corn Growers Association; the Nebraska Corn Board; the Ohio Corn Marketing Program; the Ontario Ministry of Agriculture, Food, and Rural Affairs; the Texas Corn Producers Board; and the Wisconsin Corn Promotion Board. We also acknowledge funding from the National Science Foundation under Grant Numbers #DBI-0735191 and #DBI-1265383 to support CyVerse (http://www.cyverse.org), #IOS-1339362 to support phenotyping by JC and SPM, and USDA-NIFA 2011-67003-30342 to RJW, SFG, JH, NL, SM, WX, and NDL. The funders had no role in the design and conduct of the study, data collection, and writing of the manuscript.
The authors declare that they have no competing interests.
McFarland, B.A., AlKhalifah, N., Bohn, M. et al. Maize genomes to fields (G2F): 2014–2017 field seasons: genotype, phenotype, climatic, soil, and inbred ear image datasets. BMC Res Notes 13, 71 (2020). https://doi.org/10.1186/s13104-020-4922-8