Fig. 1 | BMC Research Notes

Fig. 1

From: Shortcomings of SARS-CoV-2 genomic metadata


The number of samples produced by each (a) “originating lab” and (b) “submitting lab”, and the corresponding number of errors (or inconsistencies) for that lab. Color encodes the number of data points at each position on the plot: positions with fewer points are shaded blue and positions with more points are shaded red. (c) Some observed examples of misspellings, inconsistent naming conventions, and highly ambiguous entries. (d) A hypothetical phylogenetic tree illustrating how errors in “originating lab” metadata might impede association studies of SARS-CoV-2 genomic data. True mutations are denoted by black dots and ambiguous mutations by red dots on the phylogeny. In this example, ambiguous “N” alleles occur multiple times across the phylogeny at a given site, and all stem from the same lab. Metadata errors (shown in red) make this ambiguous “N” allele appear to be associated with four different labs rather than one. Such a site could impair phylogenetic inference and should be flagged in the SARS-CoV-2 masking recommendations, but it could be overlooked as a result of these errors [20, 24].
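To make the mechanism in panel (d) concrete, here is a minimal Python sketch on a toy data set. It counts the distinct labs whose samples carry an “N” at one site, first using the raw lab strings and then after a simple name normalization. All lab names, sequences, and the helpers `normalize_lab` and `labs_with_ambiguous_allele` are hypothetical. The normalization rule (strip accents, lowercase, collapse punctuation and whitespace) is an assumed cleanup step, not the procedure used in the article. It only fixes case and punctuation variants: genuine misspellings like those in panel (c) would need fuzzy matching or a curated lookup table.

```python
import re
import unicodedata


def normalize_lab(name: str) -> str:
    """Hypothetical cleanup: strip accents, lowercase, collapse punctuation/whitespace."""
    # Decompose accented characters and drop the combining marks (e.g. "é" -> "e").
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_only = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Lowercase, then replace every run of non-alphanumeric characters with one space.
    cleaned = re.sub(r"[^0-9a-z]+", " ", ascii_only.lower())
    return cleaned.strip()


def labs_with_ambiguous_allele(samples, site, normalize=False):
    """Return the set of lab labels whose samples carry an 'N' at `site`.

    `samples` is an iterable of (lab_name, sequence) pairs; `site` is a
    0-based Python index (genomic coordinates are usually 1-based).
    """
    labs = set()
    for lab, sequence in samples:
        if sequence[site] == "N":
            labs.add(normalize_lab(lab) if normalize else lab)
    return labs


# Toy data (hypothetical): one lab entered four different ways, plus a second lab.
samples = [
    ("Central Lab", "ACNT"),
    ("central lab", "ACNT"),
    ("Central-Lab", "ACNT"),
    ("Central Lab.", "ACNT"),
    ("North Lab", "ACGT"),
]

raw = labs_with_ambiguous_allele(samples, site=2)
cleaned = labs_with_ambiguous_allele(samples, site=2, normalize=True)
print(len(raw))        # 4 apparent labs
print(sorted(cleaned))  # ['central lab']
```

With the raw strings, the ambiguous allele looks spread across four labs; after normalization it collapses to a single lab, which is the pattern that would suggest a lab-specific artifact worth masking.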
