Authentication of Aspergillus parasiticus strains in the genome database of the National Center for Biotechnology Information
BMC Research Notes volume 14, Article number: 111 (2021)
Using genome sequences from strains authenticated to the correct species level is a prerequisite for confidently exploring the evolutionary relationships among related species. Aspergillus strains erroneously curated as Aspergillus oryzae and Aspergillus fumigatus have been found in the National Center for Biotechnology Information (NCBI) genome database. Aspergillus parasiticus is one of several aspergilli that produce aflatoxin, the most potent carcinogenic mycotoxin known to date. To ensure that researchers draw valid conclusions from their genomics-related studies, molecular analyses were carried out to authenticate the identities of A. parasiticus strains in the NCBI genome database.
Two of the nine supposed A. parasiticus strains, E1365 and NRRL2999, were found to be misidentified. They turned out to be Aspergillus flavus based on genome-wide single nucleotide polymorphisms (SNPs) and genetic features associated with production of aflatoxin and cyclopiazonic acid. NRRL2999 lacked the additional partial aflatoxin gene cluster known to be present in its equivalent strain, designated as SU-1, and shared a very low total SNPs count specifically with A. flavus NRRL3357 but not with other A. flavus isolates. Therefore, the mislabeled NRRL2999 strain is actually a clonal strain of A. flavus NRRL3357, whose genome was first sequenced in 2005.
Aspergillus parasiticus is a saprophytic fungus found in soil and decayed plant materials. It was first isolated from dead mealy bugs in Hawaiian sugarcane and characterized by Speare in 1912. A. parasiticus was originally classified as a subspecies of Aspergillus flavus because of its morphological resemblance to A. flavus. Nonetheless, A. parasiticus can be distinguished from A. flavus by other characteristics, such as darker green conidial heads and rougher conidium surface ornamentation.
Aflatoxin B1 is the most carcinogenic mycotoxin known. Nearly all A. parasiticus isolates are highly aflatoxigenic, but the abilities of A. flavus isolates to produce aflatoxin vary greatly. Evidence of the aflatoxin biosynthesis gene cluster, as well as its complete characterization, was first obtained in A. parasiticus. Every contributing enzyme known to date and the majority of the biosynthesis genes were first characterized in A. parasiticus. Since the early 2000s, A. flavus has gradually replaced A. parasiticus in studies dealing with population and genetic diversity, pathogen-host interactions, and biological control; however, A. parasiticus remains an important research subject because of its potential to co-infect crops such as corn and peanuts.
Rapid advancements in massively parallel sequencing in the genomic era have accelerated the resolution of genome information. Genome sequences of over 100 A. flavus isolates and nine A. parasiticus strains have been made publicly available from NCBI (https://www.ncbi.nlm.nih.gov/genome). These genome assemblies, along with those from closely related aspergilli such as A. oryzae and A. sojae, are invaluable resources for comparative genomics studies that investigate chromosomal structure, evolutionary relationships, and genetic variations in relation to the aflatoxin-producing capabilities of aspergilli [6,7,8].
In the course of analyzing genome sequences deposited in the NCBI genome database, discrepancies were revealed for identities of some of the strains designated as A. parasiticus, necessitating a closer examination to (1) uncover the misidentified strains, (2) reveal their correct species identities, and (3) notify NCBI so that researchers would not risk erroneous conclusions from genomics studies involving these misidentified strains.
Materials and methods
Determination of genome-wide single nucleotide polymorphisms
Differentiation of A. parasiticus and A. flavus based on unique genetic features
Analyses of genetic features of genome sequences were performed via CoGe (Comparative Genomics, https://genomevolution.org/coge/), an online platform for retrieval and comparison of genomic information. For the determination of the norB-cypA deletion patterns in the aflatoxin gene cluster [10, 11], sequences corresponding to that region from A. parasiticus SU-1, and A. flavus NRRL3357 (L-morphotype) and AF12 (S-morphotype) were used as alignment templates for Aspergillus genome sequences. For the determination of aflatoxin and cyclopiazonic acid (CPA) gene clusters, respective accessioned gene clusters from A. flavus AF36 (GenBank Accession numbers: AY510455 and JN712209) were used in sequence alignment.
Results and discussion
Discrepancies in total SNPs counts among A. parasiticus strains
Ten A. parasiticus genome sequences, which supposedly were derived from eight independent isolates, are available from NCBI (Table 1). These sequences include five from Ethiopian peanut isolates, one from a Georgia, USA peanut isolate 68–5, one from an isolate (CBS117618) collected from the leaf of an Argentinian wild peanut species (Arachis correntina) and used for an Aspergillus whole-genus sequencing project, and three from the same isolate (SU-1 = NRRL2999) that had been independently sequenced by three groups because of its aforementioned significance in aflatoxin biosynthesis research [15, 16]. Resolved genome sizes of these A. parasiticus strains range from 30.0 to 41.5 Mb, and most fall within a range of 38.0–40.0 Mb. The extraordinarily small size of strain 68–5 (> 20% less than others) likely resulted from poor library construction and/or inadequate sequencing read coverage. Total SNPs among Ethiopian isolates, with the exception of E1365, ranged from 227,929 to 303,881 (Fig. 1). In contrast, total SNPs from E1365 compared with the others were nearly six- to eightfold higher, ranging from 1,851,399 to 1,858,062, which suggests that E1365 is not an A. parasiticus strain. The two SU-1 genome sequences, produced independently by two research groups and here designated as MSU and JCVI, had a total SNPs count of 3,202 (Fig. 1). This low number of nucleotide variations likely arises from a combination of sequencing errors, mutations accumulated over time, and subculturing. The total SNPs count from SU-1 compared with CBS117618 was comparable to those observed among the four Ethiopian isolates. A. parasiticus SU-1/NRRL2999 was originally isolated from a Ugandan peanut in 1961. Over the past decades, this isolate has been given other strain designations, such as Austwick strain V. 3734/10, Hodges M-3, SYS-4, ATCC56775, ATCC26692, CMI91019b, NRRL5862, and SRRC143, depending on whether it was transferred to another laboratory or deposited in a culture collection center [18, 19]. Although SU-1 and NRRL2999 are supposedly the same strain, total SNPs counts from comparisons of the NRRL2999 sequence with the SU-1 sequences were 1,829,933 and 1,845,365. Total SNPs count is a good indicator for assigning isolates to species level. For example, total SNPs between A. flavus L-morphotype and S-morphotype isolates are about 300,000, while counts between isolates of the same morphotype are much lower ([20], also Fig. 2). Additionally, total SNPs for three A. nidulans strains were around 34,000 (data not shown). For 18 A. fumigatus strains, total SNPs counts varied widely but all were below 170,000 (Additional file 1: Fig. S1). Therefore, these extraordinarily high total SNPs counts, which correspond to only approximately 95.3% genome sequence identity, indicate that the deposited NRRL2999 is a completely different species from SU-1.
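The arithmetic behind the identity figure and the fold difference quoted above can be checked with a short calculation. The following is a minimal sketch, not the author's analysis pipeline: the function names are hypothetical, and the 39 Mb genome size is an assumption taken as the midpoint of the 38.0–40.0 Mb range reported above. It treats each SNP as one mismatched base and ignores indels and unaligned regions.

```python
def percent_identity(total_snps: int, genome_size_bp: int) -> float:
    """Approximate genome-wide identity (%) implied by a total SNP count.

    Simplification: each SNP counts as one mismatched base; indels and
    unaligned regions are ignored.
    """
    if genome_size_bp <= 0:
        raise ValueError("genome_size_bp must be positive")
    return 100.0 * (1.0 - total_snps / genome_size_bp)


def fold_range(high_counts, low_counts):
    """Smallest and largest fold difference between two sets of SNP counts."""
    smallest = min(high_counts) / max(low_counts)
    largest = max(high_counts) / min(low_counts)
    return smallest, largest


# Assumption: midpoint of the 38.0-40.0 Mb range given in the text.
ASSUMED_GENOME_SIZE_BP = 39_000_000

# NRRL2999 compared with the SU-1 sequences (counts from the text).
for snps in (1_829_933, 1_845_365):
    identity = percent_identity(snps, ASSUMED_GENOME_SIZE_BP)
    print(f"{snps:,} SNPs -> about {identity:.1f}% identity")

# E1365 versus the range among the other Ethiopian isolates (counts from the text).
low, high = fold_range((1_851_399, 1_858_062), (227_929, 303_881))
print(f"E1365 fold difference: {low:.1f}x to {high:.1f}x")
```

Under these assumptions both NRRL2999 comparisons round to about 95.3% identity, and the E1365 counts are roughly six to eight times those among the other Ethiopian isolates, in line with the figures stated in the text.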
Lack of typical A. parasiticus genetic features in strains E1365 and NRRL2999
A. parasiticus generally can be differentiated from A. flavus based on macro- and micro-morphological characteristics. Moreover, A. parasiticus is unlike A. flavus in that it produces aflatoxins G1 and G2, in addition to B1 and B2, but it does not produce CPA. Molecular events underlying these differences in mycotoxin production have been well characterized. In A. flavus, the CPA biosynthesis gene cluster resides next to the aflatoxin biosynthesis gene cluster in a subtelomeric region on chromosome III, while in A. parasiticus the CPA gene cluster is mostly deleted. Additionally, two early pathway genes required for formation of G1 and G2, norB and cypA, are intact in A. parasiticus, but in A. flavus S- and L-morphotype strains, there are deletions in these genomic regions that render the strains incapable of G-type aflatoxin production [22, 23]. A schematic representation of the norB-cypA region in the aflatoxin gene clusters of A. parasiticus and A. flavus S- and L-morphotype isolates is shown in Additional file 1: Fig. S2. Both E1365 and NRRL2999 have the deletion unique to L-morphotype strains; that is, the type II norB-cypA deletion (Table 1). Additionally, a complete 17-kb CPA gene cluster was located on contig SJFF01000023.1 of strain E1365 from nucleotides 1,918,903 to 1,935,762 and on chromosome III of NRRL2999 (CP051029.1) from nucleotides 5,182,875 to 5,199,734. In contrast, no contigs containing large portions homologous to the CPA gene cluster were found in either of the SU-1 genome sequences (Additional file 1: Fig. S3).
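The two genetic features described above can be combined into a simple decision rule. The sketch below is illustrative only and is not the procedure used in this study: the class and function names are hypothetical, and the rule deliberately returns "inconclusive" whenever the two features disagree rather than guessing. The mapping of type I and type II deletions to S- and L-morphotype A. flavus follows the description of Fig. S2.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterFeatures:
    """Hypothetical summary of two features read from a genome alignment."""
    norb_cypa: str               # "intact", "type I", or "type II"
    complete_cpa_cluster: bool   # True if a complete CPA gene cluster is found


def suggest_species(features: ClusterFeatures) -> str:
    """Return the species call that both features agree on, if any."""
    if features.norb_cypa == "intact" and not features.complete_cpa_cluster:
        return "consistent with A. parasiticus"
    if features.norb_cypa == "type I" and features.complete_cpa_cluster:
        return "consistent with A. flavus S-morphotype"
    if features.norb_cypa == "type II" and features.complete_cpa_cluster:
        return "consistent with A. flavus L-morphotype"
    # Features disagree or are unrecognized: do not force a call.
    return "inconclusive"


# Feature states as reported in the text for three genome sequences.
strains = {
    "E1365": ClusterFeatures("type II", True),
    "NRRL2999": ClusterFeatures("type II", True),
    "SU-1": ClusterFeatures("intact", False),
}

for name, features in strains.items():
    print(f"{name}: {suggest_species(features)}")
```

A rule of this kind only summarizes the evidence; in the study itself, the species assignment also rests on the genome-wide total SNPs counts.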
NRRL2999 lacks the partial duplicate aflatoxin gene cluster present in SU-1 and shares low total SNPs count with A. flavus NRRL3357
Another line of evidence against NRRL2999 being an A. parasiticus strain is that it lacks the SU-1 partial duplicate aflatoxin gene cluster, which contains homologs of aflR-aflJ-adhA-estA-norA-ver1 and omtB. A near-duplicate copy of ver1 in A. parasiticus NRRL2999 (= SYS-4) also has been reported [25, 26]. Sequence alignment showed that, in addition to the complete aflatoxin gene cluster, a large (14.6 kb) portion of the 17.4-kb genomic fragment that contains the partial aflatoxin gene cluster (GenBank Accession number: AF452809) was located on two SU-1(JCVI) contigs, JZEE01000205.1 from nucleotides 1 to 11,696 and JZEE01000720.1 from nucleotides 5 to 2,915. Similarly, various contigs of SU-1(MSU), totaling 16.7 kb, were found in addition to its complete aflatoxin gene cluster. However, the NRRL2999 genome sequence, which was assembled at the chromosome level, contained no sequence homologous to the partial aflatoxin gene cluster other than the complete aflatoxin gene cluster on chromosome III. NRRL2999, like E1365, shared similar total SNPs counts with other A. flavus S- and L-morphotype isolates; the exception was NRRL3357, with which NRRL2999 shared a very low count (Fig. 2). This shared low total SNPs count indicates that the misidentified NRRL2999 strain is indeed a clone of A. flavus NRRL3357.
Ugandan and Argentinian isolates of A. parasiticus, SU-1 and CBS117618, shared a total SNPs count of approximately 290,000, well within the range observed for A. flavus. However, the four Ethiopian isolates and the American isolate 68–5 (taking into consideration its genome size, more than 20% smaller than average) had, in comparison, total SNPs twice that count (approximately 580,000). Strikingly, the Ethiopian and American isolates shared total SNPs counts three times that of the Ugandan and Argentinian isolates. Whether the observed evolutionary distance in terms of total SNPs resulted from geographic separation and niche adaptation, or whether the Ethiopian and American isolates are not A. parasiticus but very closely related aspergilli, is not clear. With regard to NRRL2999 and the genome sequence associated with it, the source of error cannot be pinpointed: it may have been an erroneously provided stock culture, a mix-up of sequencing samples of Aspergillus isolates, or a mislabeling of sequence read datasets.
Availability of data and materials
All major data generated or analyzed during this study are included in this article. Those not presented are available from the author on reasonable request. The following are links to NCBI Aspergillus genome databases used in this study. Aspergillus parasiticus genome database. https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/12976/. Aspergillus flavus genome database. https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/360/. Aspergillus nidulans genome database. https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/17/. Aspergillus fumigatus genome database https://www.ncbi.nlm.nih.gov/genome/browse/#!/eukaryotes/18/. The GenBank Accession number for the norB-cypA region in A. parasiticus SU-1 is AY371490.1 (https://www.ncbi.nlm.nih.gov/nuccore/AY371490.1). The GenBank Accession numbers for aflatoxin and cyclopiazonic acid gene clusters of A. flavus AF36 are AY510455 (https://www.ncbi.nlm.nih.gov/nuccore/AY510455) and JN712209 (https://www.ncbi.nlm.nih.gov/nuccore/JN712209), respectively.
NCBI: The National Center for Biotechnology Information
SNP: Single nucleotide polymorphism
JCVI: J. Craig Venter Institute
MSU: Michigan State University
ATCC: American Type Culture Collection
NRRL: ARS (Agricultural Research Service) Culture Collection
SRRC: Southern Regional Research Center Culture Collection
CBS: Dutch Centraalbureau voor Schimmelcultures
CMI: Commonwealth Mycological Institute
Speare AT. Fungi parasitic upon insects injurious to sugar cane. Pathol Physiol Series. 1912;12:62.
Klich MA, Pitt JI. Differentiation of Aspergillus flavus from A. parasiticus and other closely related species. Trans Brit Mycol Soc. 1988;91(1):99–108.
EFSA Panel on Contaminants in the Food Chain (CONTAM), Schrenk D, Bignami M, Bodin L, Chipman JK, Del Mazo J, Grasl-Kraupp B, Hogstrand C, Hoogenboom LR, Leblanc JC, Nebbia CS, Nielsen E, Ntzani E, Petersen A, Sand S, Schwerdtle T, Vleminckx C, Marko D, Oswald IP, Piersma A, Routledge M, Schlatter J, Baert K, Gergelova P, Wallace H. Risk assessment of aflatoxins in food. EFSA J. 2020;18(3):e06040.
Yu J, Chang P-K, Cary JW, Wright M, Bhatnagar D, Cleveland TE, Payne GA, Linz JE. Comparative mapping of aflatoxin pathway gene clusters in Aspergillus parasiticus and Aspergillus flavus. Appl Environ Microbiol. 1995;61(6):2365–71.
Yabe K, Nakajima H. Enzyme reactions and genes in aflatoxin biosynthesis. Appl Microbiol Biotechnol. 2004;64(6):745–55.
Chang P-K, Ehrlich KC. What does genetic diversity of Aspergillus flavus tell us about Aspergillus oryzae? Int J Food Microbiol. 2010;138(3):189–99.
Sato A, Oshima K, Noguchi H, Ogawa M, Takahashi T, Oguma T, Koyama Y, Itoh T, Hattori M, Hanya Y. Draft genome sequencing and comparative analysis of Aspergillus sojae NBRC4239. DNA Res. 2011;18(3):165–76.
Umemura M, Koike H, Yamane N, Koyama Y, Satou Y, Kikuzato I, Teruya M, Tsukahara M, Imada Y, Wachi Y, Miwa Y, Yano S, Tamano K, Kawarabayasi Y, Fujimori KE, Machida M, Hirano T. Comparative genome analysis between Aspergillus oryzae strains reveals close relationship between sites of mutation localization and regions of highly divergent genes among Aspergillus species. DNA Res. 2012;19(5):375–82.
Darling AC, Mau B, Blattner FR, Perna NT. Mauve: multiple alignment of conserved genomic sequence with rearrangements. Genome Res. 2004;14(7):1394–403.
Chang P-K, Ehrlich KC, Hua SS. Cladal relatedness among Aspergillus oryzae isolates and Aspergillus flavus S and L morphotype isolates. Int J Food Microbiol. 2006;108(2):172–7.
Ehrlich KC, Yu J, Cotty PJ. Aflatoxin biosynthesis gene clusters and flanking regions. J Appl Microbiol. 2005;99(3):518–27.
Arias RS, Mohammed A, Orner VA, Faustinelli PC, Lamb MC, Sobolev VS. Sixteen draft genome sequences representing the genetic diversity of Aspergillus flavus and Aspergillus parasiticus colonizing peanut seeds in Ethiopia. Microbiol Resour Announc. 2020;9(30):e00591-20.
Faustinelli PC, Wang XM, Palencia ER, Arias RS. Genome sequences of eight Aspergillus flavus spp. and one A. parasiticus sp., isolated from peanut seeds in Georgia. Genome Announc. 2016;4(2):e00278-16.
Kjærbølling I, Vesth T, Frisvad JC, Nybo JL, Theobald S, Kildgaard S, Petersen TI, Kuo A, Sato A, Lyhne EK, Kogle ME, Wiebenga A, Kun RS, Lubbers RJM, Mäkelä MR, Barry K, Chovatia M, Clum A, Daum C, Haridas S, He G, LaButti K, Lipzen A, Mondo S, Pangilinan J, Riley R, Salamov A, Simmons BA, Magnuson JK, Henrissat B, Mortensen UH, Larsen TO, de Vries RP, Grigoriev IV, Machida M, Baker SE, Andersen MR. A comparative genomics study of 23 Aspergillus species from section Flavi. Nat Commun. 2020;11(1):1106.
Fountain JC, Clevenger JP, Nadon B, Wang H, Abbas HK, Kemerait RC, Scully BT, Vaughn JN, Guo B. Draft genome sequences of one Aspergillus parasiticus isolate and nine Aspergillus flavus isolates with varying stress tolerance and aflatoxin production. Microbiol Resour Announc. 2020;9(37):e00478-20.
Linz JE, Wee J, Roze LV. Aspergillus parasiticus SU-1 genome sequence, predicted chromosome structure, and comparative gene expression under aflatoxin-inducing conditions: evidence that differential expression contributes to species phenotype. Eukaryot Cell. 2014;13(8):1113–23.
Hesseltine CW, Shotwell OL, Ellis JJ, Stubblefield RD. Aflatoxin formation by Aspergillus flavus. Bacteriol Rev. 1966;30(4):795–805.
Bennett JW, Fernholz FA, Lee LS. Effect of light on aflatoxins, anthraquinones, and sclerotia in Aspergillus flavus and A. parasiticus. Mycologia. 1978;70(1):104–16.
Smith JS, Williams WP, Windham GL. Aflatoxin in maize: a review of the early literature from “moldy-corn toxicosis” to the genetics of aflatoxin accumulation resistance. Mycotoxin Res. 2019;35(2):111–28.
Chang P-K. Genome-wide nucleotide variation distinguishes Aspergillus flavus from Aspergillus oryzae and helps to reveal origins of atoxigenic A. flavus biocontrol strains. J Appl Microbiol. 2019;127(5):1511–20.
Chang P-K, Horn BW, Dorner JW. Clustered genes involved in cyclopiazonic acid production are next to the aflatoxin biosynthesis gene cluster in Aspergillus flavus. Fungal Genet Biol. 2009;46(2):176–82.
Ehrlich KC, Chang P-K, Yu J, Cotty PJ. Aflatoxin biosynthesis cluster gene cypA is required for G aflatoxin formation. Appl Environ Microbiol. 2004;70(11):6518–24.
Ehrlich KC, Scharfenstein LL Jr, Montalbano BG, Chang P-K. Are the genes nadA and norB involved in formation of aflatoxin G1? Int J Mol Sci. 2008;9(9):1717–29.
Chang P-K, Yu J. Characterization of a partial duplication of the aflatoxin gene cluster in Aspergillus parasiticus ATCC 56775. Appl Microbiol Biotechnol. 2002;58(5):632–6.
Kusumoto K, Mori K, Nogata Y, Ohta H, Manabe M. Homologs of an aflatoxin biosynthetic gene ver-1 in the strains of Aspergillus oryzae and its related species. J Ferment Bioeng. 1996;82(2):161–4.
Liang SH, Skory CD, Linz JE. Characterization of the function of the ver-1A and ver-1B genes, involved in aflatoxin biosynthesis in Aspergillus parasiticus. Appl Environ Microbiol. 1996;62(12):4568–75.
Nierman WC, Yu J, Fedorova-Abrams ND, Losada L, Cleveland TE, Bhatnagar D, Bennett JW, Dean R, Payne GA. Genome sequence of Aspergillus flavus NRRL 3357, a strain that causes aflatoxin contamination of food and feed. Genome Announc. 2015;3(2):e00168-15.
The author is grateful to the anonymous reviewer for carefully reading and editing the manuscript.
The author declares no conflict of interest.
Additional file 1: Fig. S1. Total SNPs from paired genome sequence comparisons among 18 A. fumigatus isolates. Fig. S2. Schematic representation of the norB-cypA region in the aflatoxin gene cluster of A. parasiticus and A. flavus. Arrows indicate direction of gene transcription. Dashed lines indicate deleted sequences. Deletion patterns, type I and type II, correspond to A. flavus S- and L-morphotype isolates. Fig. S3. CoGeBlast of the CPA gene cluster sequence of A. flavus AF36 against genome sequences of E1365, NRRL2999, and SU-1.
Chang, PK. Authentication of Aspergillus parasiticus strains in the genome database of the National Center for Biotechnology Information. BMC Res Notes 14, 111 (2021). https://doi.org/10.1186/s13104-021-05527-6