Maize Genomes to Fields: 2014 and 2015 field season genotype, phenotype, environment, and inbred ear image datasets

AlKhalifah, Naser; Campbell, Darwin A.; Falcon, Celeste M.; Gardiner, Jack M.; Miller, Nathan D.; Romay, Maria Cinta; Walls, Ramona; Walton, Renee; Yeh, Cheng-Ting; Bohn, Martin; Bubert, Jessica; Buckler, Edward S.; Ciampitti, Ignacio; Flint-Garcia, Sherry; Gore, Michael A.; Graham, Christopher; Hirsch, Candice; Holland, James B.; Hooker, David; Kaeppler, Shawn; Knoll, Joseph; Lauter, Nick; Lee, Elizabeth C.; Lorenz, Aaron; Lynch, Jonathan P.; Moose, Stephen P.; Murray, Seth C.; Nelson, Rebecca; Rocheford, Torbert; Rodriguez, Oscar; Schnable, James C.; Scully, Brian; Smith, Margaret; Springer, Nathan; Thomison, Peter; Tuinstra, Mitchell; Wisser, Randall J.; Xu, Wenwei; Ertl, David; Schnable, Patrick S.; De Leon, Natalia; Spalding, Edgar P.; Edwards, Jode; Lawrence-Dill, Carolyn J.

doi:10.1186/s13104-018-3508-1

Data note
Open access
Published: 09 July 2018


BMC Research Notes volume 11, Article number: 452 (2018)


Abstract

Objectives

Crop improvement relies on analysis of phenotypic, genotypic, and environmental data. Given large, well-integrated, multi-year datasets, diverse queries can be made: Which lines perform best in hot, dry environments? Which alleles of specific genes are required for optimal performance in each environment? Such datasets can also be leveraged to predict cultivar performance, even in uncharacterized environments. The maize Genomes to Fields (G2F) Initiative is a multi-institutional organization of scientists working to generate and analyze such datasets from existing, publicly available inbred lines and hybrids. G2F’s genotype by environment project has released its 2014 and 2015 datasets to the public; the 2016 and 2017 data have been collected and will soon be made available.

Data description

Datasets include DNA sequences; traditional phenotype descriptions, as well as detailed ear, cob, and kernel phenotypes quantified by image analysis; weather station measurements; and soil characterizations by site. Data are released as comma-separated value (CSV) spreadsheets accompanied by extensive README text descriptions. For genotypic and phenotypic data, both raw data and a version with outliers removed are reported. For weather data, two versions are reported: a full dataset calibrated against nearby National Weather Service sites, and a second calibrated set with outliers and apparent artifacts removed.

Objective

G2F is a multi-institutional, collaborative initiative to develop tools that efficiently predict performance of diverse maize (Zea mays ssp. mays) varieties across multiple growing conditions. G2F projects aim to collect, share, and analyze multi-year, large-scale genomic, phenotypic, and environmental datasets. The project builds on existing maize genome sequence resources by developing approaches to understand the functions of genes and specific alleles based on their expression in typical field conditions. There are many dimensions to the goal of understanding genotype-by-environment (G × E) interactions, including which genes impact which traits and trait components, how genes interact among themselves, the relevance of specific genes under different growing conditions, and how genes influence plant growth during various stages of development.

G2F projects foster integration of diverse research disciplines, including (but not limited to) genetics, genomics, plant physiology, agronomy, climatology, and crop modeling, as well as analytical perspectives and tools derived from computational sciences, statistics, and engineering. Under the umbrella of G2F are enterprises such as the G × E project, which began in 2014. The G × E project aims to document and measure genotypes, phenotypes, and environmental data in standard formats across more than twenty distributed field locations in North America annually. To our knowledge, the resulting dataset is the most extensive publicly available dataset of its kind: it reports a consistent set of traits for common sets of fully genotyped germplasm across many locations, with relevant information reported down to the level of individual plots. Making these datasets publicly available enables researchers from many different disciplines to tackle the demanding analyses necessary to make useful predictions of crop performance. Novel data analysis approaches and tools are expected to result from the curated and organized data described here.

Data description

Online forms were developed for logging field site coordinates, field management metadata, and other site-specific information. Datasets include:

DNA sequences of inbreds (with and without imputation), including the inbreds used to produce the featured hybrids. The process for creating the files and metadata from genotyping-by-sequencing (GBS) [1] data is described. Data are most readily analyzed using TASSEL software [2]. Raw sequence reads are accessible via the Sequence Read Archive [3].
Phenotype measurements for inbreds and hybrids. A handbook of instructions for making traditional phenotype measurements (reviewed in [4]) is available via the G2F website [5]. Traditional traits include stand count, stalk lodging, root lodging, days to anthesis, days to silking, ear height, plant height, plot weight, grain moisture, and test weight. Data types reported both as raw files and as files with outliers removed are described in the README files. Additionally, a large set of ear, cob, and kernel measurements was made with a non-traditional machine vision platform to quantify the components of yield [6]. These data are reported in millimeters, with shape descriptors reported as principal components of contour data points. Cob color is reported as RGB (red/green/blue) pixel values. Kernel row number, counted manually, is reported as an integer.
Environmental data collected by WatchDog 2700 weather stations (Spectrum Technologies) at 30-min intervals from planting through harvest. Collected information includes wind speed, direction, and gust; air temperature, dewpoint, and relative humidity; rainfall; and solar radiation. Data are reported as a calibrated set (based on calibration derived from nearby National Weather Service stations) and as a “clean” set (based on removing obvious artifacts from the calibrated dataset).
Soil characterizations by site (first taken in 2015) including plow depth, pH, buffered pH, organic matter, phosphorus levels (in parts per million), and potassium levels (in parts per million).
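Because weather records arrive at 30-min intervals while many analyses work at daily resolution, a common first step is to collapse them into per-day summaries. The Python sketch below shows one way to do this with the standard library only. It assumes hypothetical column names (`timestamp` in `YYYY-MM-DD HH:MM` form, `temp_c`, `rain_mm`); the actual headers and units should be taken from the README files that accompany the weather data.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime


def daily_summary(csv_text):
    """Collapse 30-min weather records into per-day summaries.

    Assumes hypothetical columns 'timestamp' (YYYY-MM-DD HH:MM),
    'temp_c', and 'rain_mm'. Blank cells are treated as missing
    and skipped rather than counted as zero.
    """
    temps = defaultdict(list)   # day -> list of temperature readings
    rain = defaultdict(float)   # day -> summed rainfall of recorded readings
    for row in csv.DictReader(io.StringIO(csv_text)):
        day = datetime.strptime(row["timestamp"], "%Y-%m-%d %H:%M").date()
        if row["temp_c"].strip():
            temps[day].append(float(row["temp_c"]))
        if row["rain_mm"].strip():
            rain[day] += float(row["rain_mm"])

    summary = {}
    for day in sorted(set(temps) | set(rain)):
        t = temps.get(day, [])
        summary[day] = {
            "tmin": min(t) if t else None,   # None when no temperature was logged
            "tmax": max(t) if t else None,
            "rain_mm": rain.get(day, 0.0),
        }
    return summary
```

Keeping blank temperature cells out of the min/max calculation avoids silently treating a sensor gap as 0 °C; whether to use the calibrated or the “clean” weather file as input depends on the analysis.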

Data collected in year n are released to project members in spring of the following year (n + 1), and released to the public the subsequent year (n + 2). The 2014 and 2015 datasets are publicly available via the NCBI SRA [7] and CyVerse/iPlant [8] with files and access links shown in Table 1.
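The release rule above (collection in year n, member release in n + 1, public release in n + 2) can be written as a small helper, for example to check which field seasons should be public in a given year. The function names and the default first season of 2014 (the year the G × E project began) are illustrative choices, not part of any G2F tool.

```python
def release_schedule(season):
    """Return (member_release_year, public_release_year) for a field season.

    Encodes the stated rule: data from year n reach project members
    in spring of n + 1 and the public in n + 2.
    """
    return season + 1, season + 2


def public_seasons(as_of_year, first_season=2014):
    """List seasons whose data should be public by `as_of_year` under the n + 2 rule."""
    return [
        s for s in range(first_season, as_of_year + 1)
        if release_schedule(s)[1] <= as_of_year
    ]
```

Under this rule, `public_seasons(2017)` yields the 2014 and 2015 seasons, matching the datasets described here.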

Table 1 Overview of data files and data sets


As technologies develop and the number of researchers involved in the project grows, increasingly diverse data types are expected to be documented. An example of the use of these data has been reported [12]; in that study, phenotypic plasticity was found to be disproportionately controlled by regulatory regions. Because these datasets support lines of inquiry limited only by the questions researchers pose, the potential scope of application is broad. The dataset is also expected to influence the field simply by being the first public dataset of its scale collected using standardized protocols and reported in standardized formats, thus defining standards for data collection, formatting, and access.

Limitations

Missing data occur in most datasets. In the genotypic and phenotypic datasets, missing values are left blank rather than coded as zero or ‘null’, because zero is a legitimate measured value for some traits and some software accepts only numeric values (not strings). The exception is traits extracted from inbred ear, cob, and kernel image data, for which missing values are marked ‘NA’.
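Because two missing-value conventions coexist (blank cells in most files, ‘NA’ in the image-derived traits) and zero is a real value, a reader should map both markers to “missing” while keeping literal zeros. The Python sketch below does this with the standard `csv` module; the column names in the usage example are hypothetical.

```python
import csv
import io


def parse_value(cell):
    """Convert one CSV cell to a float, or None when the value is missing.

    Blank cells (genotypic/phenotypic files) and the literal 'NA'
    (inbred ear, cob, and kernel image traits) both mean "missing";
    a literal 0 is a real measurement and is kept as 0.0.
    """
    cell = cell.strip()
    if cell == "" or cell == "NA":
        return None
    return float(cell)


def read_numeric_column(csv_text, column):
    """Return one column of a CSV file as a list of floats (or None for missing)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [parse_value(row[column]) for row in reader]
```

For example, with hypothetical columns `stalk_lodging` and `ear_length`, a blank lodging cell becomes `None` while a recorded `0` stays `0.0`. By default, pandas `read_csv` likewise treats both empty fields and the string ‘NA’ as missing.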

In some instances, data were retained as reported rather than edited for consistency. These decisions were made to minimize misinterpretation that could lead to incorrect documentation or measurements.

For weather data, raw sensor files are not provided because the sensor data were calibrated against information from nearby weather stations to ensure accuracy (e.g., if a wind vane was set improperly, a calibration correction was required).

Field locations are not always identical year-to-year, primarily due to crop rotation management practices. Each field’s GPS coordinates are reported annually to enable data aggregation in keeping with specific research objectives.
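Since a site’s field can move between years, one way to aggregate field-years by location is to compare the reported GPS coordinates by great-circle distance. The sketch below uses the standard haversine formula with a mean Earth radius of 6371 km; the 5 km threshold in the hypothetical `same_site` helper is an arbitrary illustration that should be chosen to suit the research question.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def same_site(coord_a, coord_b, max_km=5.0):
    """Treat two field-years as one site if their fields lie within `max_km`.

    `coord_a` and `coord_b` are (latitude, longitude) tuples; the default
    threshold is a hypothetical choice, not a G2F convention.
    """
    return haversine_km(*coord_a, *coord_b) <= max_km
```

A stricter threshold keeps rotated fields separate; a looser one pools them into a single site for multi-year analyses.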

Germplasm used and reported is specific to the project and is held by the participating researchers; it does not derive directly from national public genebanks. Seed is available directly from cooperating researchers, subject to availability.

Abbreviations

G2F: Genomes to Fields
G × E: genotype by environment interaction
GBS: genotyping by sequencing
RGB: red/green/blue
DOI: Digital Object Identifier

References

1. Elshire RJ, et al. A robust, simple genotyping-by-sequencing (GBS) approach for high diversity species. PLoS ONE. 2011;6(5):e19379.
2. Bradbury PJ, et al. TASSEL: software for association mapping of complex traits in diverse samples. Bioinformatics. 2007;23:2633–5.
3. Sornapudi T, Nayak R, Uppada V, Guthikonda PK, Kethavath S, Yellaboina S, Pasupulati AK, Kurukuti S. NCBI Sequence Read Archive; 2018. PRJNA385022.
4. Pauli D, et al. The quest for understanding phenotypic variation via integrated approaches in the field environment. Plant Physiol. 2016;172:622–34.
5. Genomes to Fields. Phenotyping handbook. https://www.genomes2fields.org/about/project-overview/#standards-and-methods. Accessed 1 Mar 2018.
6. Miller ND, et al. A robust, high-throughput method for computing maize ear, cob, and kernel attributes automatically from images. Plant J. 2017;89:169–78.
7. Leinonen R, Sugawara H, Shumway M. The sequence read archive. Nucleic Acids Res. 2011;39(Database issue):D19–21.
8. Merchant N, et al. The iPlant collaborative: cyberinfrastructure for enabling data to discovery for the life sciences. PLoS Biol. 2016;14:e1002342.
9. Lawrence-Dill C. Genomes To Fields 2014. CyVerse Data Commons; 2016. https://doi.org/10.7946/p2v888.
10. Lawrence-Dill C. Genomes To Fields 2015. CyVerse Data Commons; 2017. https://doi.org/10.7946/p24s31.
11. Spalding E. Genomes to fields inbred ear imaging 2017. CyVerse Data Commons; 2017. https://doi.org/10.7946/p2c34p.
12. Gage JL, et al. The effect of artificial selection on phenotypic plasticity in maize. Nat Commun. 2017;8:1348.


Authors’ contributions

NA, DAC, CMF, JMG, NDM, MCR, RW, RW, CTY: data management team; MB, JB, ESB, IC, SFG, MAG, CG, CH, JBH, DH, SK, JK, NL, ECL, AL, JPL, SPM, SCM, RN, TR, OR, JCS, BS, MS, NS, PT, MT, RJW, WX: data contributors; DE, PSS, NDL, EPS, JE, CJLD: communication. The data management team aggregated, curated, and made available data resources. Contributors advised on data collection methods, collected the data, and reviewed data collection and curation methods as well as datasets. Communicating authors wrote the manuscript and guided data collection, curation, and distribution. All authors reviewed, read, and approved the final manuscript.

Acknowledgements

We gratefully acknowledge contributions from many field managers and data collectors including: Lisa Coffey (Schnable lab); Dustin Eilert, Marina Borsecnik, Emily Rothfusz, and Jane Petzoldt (De Leon lab); Nick Lepak, Josh Budka, and Nicholas Kaczmar (Cornell University); Miriam Lopez, Grace Kuehne, and Sarah Weirich (Lauter lab); Teclemariam Weldekidan (Wisser lab); Jacob Garfin and Amanda Gilbert (Hirsch lab); Pete Hermanson (Springer lab); Jacob Pekar (Texas A&M University); and Susan Melia-Hancock (USDA-ARS, Columbia, MO). We also benefitted from data management discussions with Nicole Hopkins and Jeremy DeBarry (formerly with CyVerse); Kate Dreher, Clarissa Pimental, Julian Pietragalla, Jean-Marcel Ribaut, and Sarah Hearne (CIMMYT); Jan Erik Backlund and Kelly Robbins (Cornell University); and Matthew Berrigan (LeafNode).

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

The data described in this Data Note can be freely and openly accessed at the NCBI Sequence Read Archive via the identifier PRJNA385022 and at CyVerse via the following Digital Object Identifiers (DOIs): https://doi.org/10.7946/p2v888, https://doi.org/10.7946/p24s31, and https://doi.org/10.7946/p2c34p. See Table 1 and reference list for details and links to the data.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Not applicable.

Funding

We gratefully acknowledge support from: USDA Hatch program funds to multiple PIs in this project; the USDA Agricultural Research Service; the Iowa State University Plant Sciences Institute; the Ontario Ministry of Agriculture, Food, and Rural Affairs; the Illinois Corn Marketing Board; the Iowa Corn Promotion Board; the Kansas Corn Commission; the Minnesota Corn Research and Promotion Council; the Nebraska Corn Board; the Ohio Corn Marketing Program; the Texas Corn Producers Board; and the National Corn Growers Association. We also acknowledge funding from the National Science Foundation under Grant Numbers #DBI-0735191 and #DBI-1265383 to support CyVerse (http://www.cyverse.org) and USDA-NIFA 2011-67003-30342 to SFG, JH, NL, SM, RW, WX, and NDL.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Naser AlKhalifah
Present address: University of Wisconsin, Madison, WI, 53706, USA
Jack M. Gardiner
Present address: University of Missouri, Columbia, MO, 65211, USA
Aaron Lorenz
Present address: University of Minnesota, St. Paul, MN, 55108, USA

Authors and Affiliations

Iowa State University, Ames, IA, 50011, USA
Naser AlKhalifah, Darwin A. Campbell, Jack M. Gardiner, Renee Walton, Cheng-Ting Yeh, Nick Lauter, Patrick S. Schnable, Jode Edwards & Carolyn J. Lawrence-Dill
University of Wisconsin, Madison, WI, 53706, USA
Celeste M. Falcon, Nathan D. Miller, Shawn Kaeppler, Natalia De Leon & Edgar P. Spalding
Cornell University, Ithaca, NY, 14853, USA
Maria Cinta Romay, Edward S. Buckler, Michael A. Gore, Rebecca Nelson & Margaret Smith
University of Arizona, Tucson, AZ, 85721, USA
Ramona Walls
University of Illinois at Urbana-Champaign, Urbana, IL, 61801, USA
Martin Bohn, Jessica Bubert & Stephen P. Moose
USDA-ARS, Beltsville, MD, USA
Edward S. Buckler, Sherry Flint-Garcia, James B. Holland, Joseph Knoll, Nick Lauter, Brian Scully & Jode Edwards
Kansas State University, Manhattan, KS, 66502, USA
Ignacio Ciampitti
University of Missouri, Columbia, MO, 65211, USA
Sherry Flint-Garcia
South Dakota State University, Rapid City, SD, 57702, USA
Christopher Graham
University of Minnesota, St. Paul, MN, 55108, USA
Candice Hirsch & Nathan Springer
North Carolina State University, Raleigh, NC, 27695, USA
James B. Holland
University of Guelph, Ridgetown, ON, Canada
David Hooker
University of Guelph, Guelph, ON, Canada
Elizabeth C. Lee
University of Nebraska, Lincoln, NE, 68583, USA
Aaron Lorenz, Oscar Rodriguez & James C. Schnable
Pennsylvania State University, University Park, PA, 16802, USA
Jonathan P. Lynch
Texas A&M University, College Station, TX, 77843, USA
Seth C. Murray
Purdue University, West Lafayette, IN, 47907, USA
Torbert Rocheford & Mitchell Tuinstra
University of Florida, Gainesville, FL, 32611, USA
Brian Scully
Ohio State University, Columbus, OH, 43210, USA
Peter Thomison
University of Delaware, Newark, DE, 19716, USA
Randall J. Wisser
Texas A&M AgriLife Research, Lubbock, TX, 79403, USA
Wenwei Xu
Iowa Corn Growers Association, Johnston, IA, 50131, USA
David Ertl


Corresponding authors

Correspondence to David Ertl, Patrick S. Schnable, Natalia De Leon, Edgar P. Spalding, Jode Edwards or Carolyn J. Lawrence-Dill.

Additional information

Naser AlKhalifah, Darwin A. Campbell, Celeste M. Falcon, Jack M. Gardiner, Nathan D. Miller, Maria Cinta Romay, Ramona Walls, Renee Walton, and Cheng-Ting Yeh are joint first authors.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.


About this article

Cite this article

AlKhalifah, N., Campbell, D.A., Falcon, C.M. et al. Maize Genomes to Fields: 2014 and 2015 field season genotype, phenotype, environment, and inbred ear image datasets. BMC Res Notes 11, 452 (2018). https://doi.org/10.1186/s13104-018-3508-1


Received: 14 February 2018
Accepted: 18 June 2018
Published: 09 July 2018
DOI: https://doi.org/10.1186/s13104-018-3508-1
