From: Protein subfamily assignment using the Conserved Domain Database

Domain assignment among closely related domains. (A) Schematic of an NCBI-curated hierarchy of domain models from the Ribokinase/pfkB superfamily (cl00192). Each domain model is pictured as a multiple sequence alignment. By definition, a parent model and its child model share at least one overlapping sequence (blue, purple, and green lines). The root domain of this hierarchy represents the whole superfamily and provides information about the conserved core regions and sequence variation within the superfamily. Its many subfamilies include the ribokinase-like subgroups A and D and KdgK. Ideally, a query sequence is labelled with the most specific domain that matches it, and that domain yields the most significant hit. (B) A partial list of domain hits to query protein [Entrez:BAA97341], showing domain accessions, names, and the E-values of their RPS-BLAST alignments to the query sequence. Previously, BAA97341 would have been assigned the domain with the lowest E-value, cd01942. Our proposed method instead assesses the hit to cd01942 against a pre-computed, domain-specific threshold and determines that even this lowest-E-value hit is not significant enough to be a confident match. We therefore label the sequence generically with the superfamily of the best-matching domain, cl00192.
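The decision rule described in panel (B) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the names `DomainHit` and `assign_label` are hypothetical, the threshold is assumed to be expressed as a maximum acceptable E-value (the actual method may use a different statistic), and all numeric values and the accession `cd_other` are made up for the example, since the caption does not give them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class DomainHit:
    """One RPS-BLAST domain hit (hypothetical container for this sketch)."""
    accession: str    # specific domain model, e.g. "cd01942"
    superfamily: str  # superfamily the model belongs to, e.g. "cl00192"
    evalue: float     # E-value of the alignment to the query


def assign_label(hits: List[DomainHit], thresholds: Dict[str, float]) -> str:
    """Return the specific domain if its hit passes that domain's threshold,
    otherwise the superfamily of the best-matching domain.

    Assumption: each threshold is the largest E-value still accepted as a
    confident match. A domain without a threshold is never assigned
    specifically and falls back to its superfamily.
    """
    if not hits:
        raise ValueError("no domain hits to assign")
    # The best-matching domain is the hit with the lowest E-value.
    best = min(hits, key=lambda h: h.evalue)
    threshold = thresholds.get(best.accession)
    if threshold is not None and best.evalue <= threshold:
        return best.accession
    # Best hit is not confident enough: label generically by superfamily.
    return best.superfamily


# Illustrative values only; the real E-values and thresholds are not
# stated in the caption.
hits = [
    DomainHit("cd01942", "cl00192", 1e-30),
    DomainHit("cd_other", "cl00192", 1e-25),  # hypothetical second hit
]
thresholds = {"cd01942": 1e-60}

label = assign_label(hits, thresholds)
print(label)  # cl00192
```

With these illustrative numbers the lowest-E-value hit (cd01942) fails its domain-specific threshold, so the query receives the superfamily label rather than the subfamily label, mirroring the outcome described for BAA97341.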

