
Genomes to Fields 2022 Maize genotype by Environment Prediction Competition

Abstract

Objectives

The Genomes to Fields (G2F) 2022 Maize Genotype by Environment (GxE) Prediction Competition aimed to develop models for predicting grain yield for the 2022 Maize GxE project field trials, leveraging the datasets previously generated by this project and other publicly available data.

Data description

This resource used data from the Maize GxE project within the G2F Initiative [1]. The dataset included phenotypic and genotypic data for the hybrids evaluated at 45 locations from 2014 to 2022, together with soil, weather, and environmental covariate data and metadata for all environments (each environment being a combination of year and location). Competitors also had access to ReadMe files describing all the files provided. The Maize GxE project is collaborative, and all the data it generates become publicly available [2]. The dataset used in the 2022 Prediction Competition was curated and lightly filtered for quality and to ensure naming uniformity across years.


Objective

The Maize GxE project is a collaborative effort that involves researchers from diverse areas of study. The datasets collected by the project are some of the largest public data of their kind and are therefore of broad interest to communities from genetics to agronomy to computer science and beyond. The competition was organized to connect these communities and others with interest in dissecting and exploring genotypic, environmental, and GxE information to predict hybrid maize performance in different environments across the US. The competition started on November 15, 2022, and ended on January 15, 2023. All the participants had access to the same curated data set, containing information collected on over 180,000 maize field plots and involving 4,683 hybrids. Participants were asked to create predictive models for maize grain yield for the 2022 Maize GxE project field trials, utilizing the existing Maize GxE project dataset and any other publicly available data. The trait of interest was grain yield, and the competitors were asked to submit absolute grain yield (Mg ha⁻¹) adjusted to 15.5% moisture for each hybrid in each location where data had been collected during the 2022 field season. The winner of the competition was the model with the lowest average root mean squared error (RMSE) across locations when compared with the actual yield data obtained in 2022.
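The scoring rule described above (RMSE computed within each location, then averaged across locations) can be illustrated with a minimal sketch. The function name, the record layout, and the location labels below are hypothetical, and details such as how missing observations were handled in the official scoring are not stated in this note.

```python
import math
from collections import defaultdict


def mean_rmse_across_locations(records):
    """Return (mean of per-location RMSE, dict of per-location RMSE).

    `records` is an iterable of (location, observed_yield, predicted_yield)
    tuples, with yields in Mg/ha. This layout is an assumption made for
    illustration, not the competition's submission format.
    """
    squared_errors = defaultdict(list)
    for location, observed, predicted in records:
        squared_errors[location].append((observed - predicted) ** 2)
    if not squared_errors:
        raise ValueError("no records to score")

    # RMSE is computed separately within each location...
    per_location = {
        loc: math.sqrt(sum(errs) / len(errs))
        for loc, errs in squared_errors.items()
    }
    # ...and then averaged, so every location carries equal weight
    # regardless of how many hybrids were evaluated there.
    score = sum(per_location.values()) / len(per_location)
    return score, per_location


# Toy data with made-up location labels and yields.
records = [
    ("LOC_A", 10.0, 11.0),
    ("LOC_A", 12.0, 11.0),
    ("LOC_B", 8.0, 8.0),
    ("LOC_B", 9.0, 12.0),
]
score, per_location = mean_rmse_across_locations(records)
print(per_location, score)
```

Averaging per-location RMSE values is not the same as pooling all plots into a single RMSE: the pooled version would give more weight to locations with more hybrids.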

Data description

The Prediction Competition data are publicly available via CyVerse/iPlant. This dataset contains training and testing set data and has been structured according to the specifications outlined in Table 1.

  • Training_data: includes phenotypic, genotypic, soil, weather (downloaded from https://power.larc.nasa.gov), environmental covariate data, and metadata information from 2014 to 2021 for use in developing and training models.

  • Testing_data: includes genotypic, soil, weather, environmental covariate data, and metadata information for 2022 locations, along with a submission template listing the environments and hybrids for which participants submitted yield predictions.
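To make the train/test split concrete, the following sketch fills a submission template with a deliberately naive baseline: the mean training yield of each hybrid, falling back to the overall training mean for hybrids not seen in training. The tuple layouts, environment labels, and hybrid names are hypothetical and do not reflect the actual file or column names in the dataset, and this baseline is not a method used by the organizers or any competitor.

```python
from collections import defaultdict


def baseline_predictions(training_rows, template_rows):
    """Predict yield for each (environment, hybrid) pair in the template.

    training_rows: iterable of (environment, hybrid, yield_mg_ha), where
                   yield may be None for missing observations.
    template_rows: iterable of (environment, hybrid) pairs to predict.
    Both layouts are assumptions made for this illustration.
    """
    yields_by_hybrid = defaultdict(list)
    all_yields = []
    for _env, hybrid, grain_yield in training_rows:
        if grain_yield is None:
            continue  # skip plots with missing yield
        yields_by_hybrid[hybrid].append(grain_yield)
        all_yields.append(grain_yield)
    if not all_yields:
        raise ValueError("training data contains no observed yields")

    overall_mean = sum(all_yields) / len(all_yields)
    predictions = []
    for env, hybrid in template_rows:
        observed = yields_by_hybrid.get(hybrid)
        # Hybrids absent from the training years get the overall mean.
        pred = sum(observed) / len(observed) if observed else overall_mean
        predictions.append((env, hybrid, pred))
    return predictions


# Toy data with made-up environments, hybrids, and yields.
training = [
    ("LOC_A_2020", "H1", 10.0),
    ("LOC_B_2021", "H1", 12.0),
    ("LOC_A_2021", "H2", 8.0),
    ("LOC_B_2021", "H2", None),
]
template = [("LOC_A_2022", "H1"), ("LOC_A_2022", "H3")]
predictions = baseline_predictions(training, template)
print(predictions)
```

A baseline like this ignores the genotypic, soil, and weather data entirely, so it is useful mainly as a reference point against which models that use environmental and GxE information can be compared.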

Maize is cultivated as a hybrid crop, typically resulting from the cross of two inbred parents. Consequently, the phenotypic data in both the training and testing sets are reported for hybrids. The genotypic data likewise contain hybrid genotypes, which were generated in silico from the genotypic data of the inbred parents.
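One common way to derive a hybrid genotype in silico is to combine the marker calls of its two inbred parents. The sketch below assumes markers are coded as alternate-allele counts (0, 1, or 2, with None for missing calls) and takes the expected hybrid dosage as the mean of the parental dosages. Both the coding and the rule are assumptions made for illustration; this note does not describe the procedure actually used to build the competition's genotype files.

```python
def hybrid_dosage(parent1, parent2):
    """Expected alternate-allele dosage of an F1 hybrid at each marker.

    parent1, parent2: equal-length lists of dosages (0, 1, 2) or None for
    a missing call. Each parent passes on one allele per marker, so the
    expected hybrid dosage is the mean of the two parental dosages.
    """
    if len(parent1) != len(parent2):
        raise ValueError("parents must be genotyped at the same markers")
    hybrid = []
    for a, b in zip(parent1, parent2):
        if a is None or b is None:
            hybrid.append(None)  # propagate missing calls
        else:
            hybrid.append((a + b) / 2)
    return hybrid


# Toy inbred genotypes at five markers (hypothetical values).
inbred_a = [0, 2, 2, None, 0]
inbred_b = [0, 0, 2, 2, 2]
hybrid_ab = hybrid_dosage(inbred_a, inbred_b)
print(hybrid_ab)  # → [0.0, 1.0, 2.0, None, 1.0]
```

For fully homozygous parents the result is always 0, 1, or 2. Residual heterozygosity in an inbred (a dosage of 1) yields a fractional expected dosage, which is one reason in-silico hybrid genotypes may be unsuitable for some analyses.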

Table 1 Overview of Genomes to Fields 2022 Maize Genotype by Environment Prediction Competition data files

Limitations

These datasets contain missing data. When working with large agricultural datasets, missing data is a common occurrence due to various factors such as data collection limitations, measurement errors, plot losses, and environmental events. The genotypic data provided contains hybrid information derived from inbred genotypic data, a common practice. However, depending on the study goals, this may pose limitations for specific types of analysis. In instances where precise GPS coordinates were not available for certain environments (i.e., a location in a particular year), field coordinates were estimated. Depending on the research objective, the unavailability of accurate GPS coordinates could impact the reliability of the results.

Data Availability

The data described in this Data note can be freely and openly accessed on CyVerse under https://doi.org/10.25739/tq5e-ak26 [3]. Please see Table 1 for details and links to the data.

Abbreviations

G2F:

Genomes to Fields

GxE:

Genotype by Environment

References

  1. Genomes to Fields. 2023. https://www.genomes2fields.org.

  2. Genomes to Fields resources. 2023. https://www.genomes2fields.org/resources.

  3. G2F Consortium. Genomes to Fields 2022 Maize Genotype by Environment Prediction Competition. CyVerse Data Commons. 2023. https://doi.org/10.25739/tq5e-ak26.


Acknowledgements

We gratefully acknowledge contributions from the National Corn Growers Association, the Iowa Corn Promotion Board, and USDA-ARS. The weather data were obtained from the National Aeronautics and Space Administration (NASA) Langley Research Center (LaRC) Prediction of Worldwide Energy Resource (POWER) Project, funded through the NASA Earth Science/Applied Science Program.

Funding

We gratefully acknowledge support from the National Corn Growers Association, the Iowa Corn Promotion Board, and USDA-ARS.

Author information

Contributions

DCL, JDW, JIV, QC, JLG, MCR, JH, DE, MLC, FMA, GDLC, SK, TB, MB, EB, JE, SFG, MAG, CNH, JEK, JM, RM, SCM, OAO, JCS, RSS, MPS, EES, AT, MT, JW, TW, WX, NDL were responsible for advising on data collection methods, collecting the data, reviewing data collection and curation methods, and the resulting datasets for the 2022 season. DCL, JDW, JIV, QC, JLG, MCR, JH, DE, NDL organized the Genomes to Fields (G2F) 2022 Maize Genotype by Environment Prediction Competition.

Corresponding author

Correspondence to Dayane Cristina Lima.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Lima, D.C., Washburn, J.D., Varela, J.I. et al. Genomes to Fields 2022 Maize genotype by Environment Prediction Competition. BMC Res Notes 16, 148 (2023). https://doi.org/10.1186/s13104-023-06421-z


  • DOI: https://doi.org/10.1186/s13104-023-06421-z
