Table 2 Dataset impact on model accuracy

From: DivA: detection of non-homologous and very divergent regions in protein sequence alignments

MSA dataset        TPR        FDR        PPV
50 only-bird       0.7970402  0.6267327  0.3732673
100 only-bird      0.812071   0.4976526  0.5023474
200 only-bird      0.810281   0.3775697  0.6224303
200 all species    0.469429   0.5588211  0.4411789

The table shows the results of the efficiency tests on datasets of different sizes (50, 100, and 200 MSAs) and different divergence (birds only, and birds plus distant species). True positives (TP) are alignment positions that DivA included in outlier windows and that the manual annotation also marked as outliers. False positives (FP) are positions that lie within DivA outlier windows but were not marked as outliers in the manual annotation. False negatives (FN) are positions that were manually annotated as outliers but were not detected as such by DivA. True negatives (TN) are positions that fall outside the outlier windows in both the manual annotation and the DivA output. TPR: true positive rate; FDR: false discovery rate; PPV: positive predictive value.
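As a minimal sketch of how the three reported rates relate to the counts defined above, the snippet below uses the standard definitions TPR = TP / (TP + FN), FDR = FP / (TP + FP), and PPV = TP / (TP + FP). The function name `classification_rates` and the example counts are illustrative assumptions; the source does not report raw counts. Because PPV = 1 - FDR under these definitions, the last two columns of each row should sum to 1, and the final loop checks this for the reported values.

```python
def classification_rates(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (TPR, FDR, PPV) computed from per-position counts."""
    if tp + fn == 0 or tp + fp == 0:
        raise ValueError("rates are undefined when a denominator is zero")
    tpr = tp / (tp + fn)  # fraction of manually annotated outlier positions that were detected
    fdr = fp / (tp + fp)  # fraction of detected positions absent from the manual annotation
    ppv = tp / (tp + fp)  # fraction of detected positions confirmed by the manual annotation
    return tpr, fdr, ppv


# Hypothetical counts for illustration only (not taken from the paper).
tpr, fdr, ppv = classification_rates(tp=80, fp=20, fn=20)
print(f"TPR={tpr:.3f} FDR={fdr:.3f} PPV={ppv:.3f}")

# Reported (FDR, PPV) pairs from the table; each pair should sum to 1.
reported = {
    "50 only-bird": (0.6267327, 0.3732673),
    "100 only-bird": (0.4976526, 0.5023474),
    "200 only-bird": (0.3775697, 0.6224303),
    "200 all species": (0.5588211, 0.4411789),
}
for dataset, (fdr_reported, ppv_reported) in reported.items():
    assert abs(fdr_reported + ppv_reported - 1.0) < 1e-6, dataset
```

True negatives do not enter any of the three rates, which is why the sketch takes only TP, FP, and FN as inputs.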