Table 1 Summary of the datasets

From: Machine learning on normalized protein sequences

| dataset | # sequences | positive samples | negative samples | length (mean ± SD) |
|---|---|---|---|---|
| APV | 768 | 61% | 39% | 99.70 ± 1.24% |
| ATV | 329 | 48% | 52% | 99.59 ± 1.06% |
| IDV | 827 | 51% | 49% | 99.68 ± 1.23% |
| LPV | 517 | 45% | 55% | 99.73 ± 1.22% |
| NFV | 844 | 40% | 60% | 99.67 ± 1.22% |
| RTV | 795 | 49% | 51% | 99.71 ± 1.24% |
| SQV | 826 | 60% | 40% | 99.69 ± 1.23% |
| 3TC | 633 | 31% | 69% | 240.87 ± 2.33% |
| ABC | 628 | 29% | 71% | 240.54 ± 4.20% |
| AZT | 630 | 52% | 48% | 240.87 ± 2.33% |
| d4T | 630 | 54% | 46% | 240.54 ± 4.20% |
| ddI | 632 | 49% | 51% | 240.87 ± 2.33% |
| TDF | 353 | 67% | 33% | 240.72 ± 1.88% |
| DLV | 732 | 64% | 36% | 241.28 ± 1.49% |
| EFV | 734 | 62% | 38% | 241.32 ± 1.49% |
| NVP | 746 | 57% | 43% | 241.30 ± 1.48% |
| BVM | 155 | 28% | 72% | 20.77 ± 2.07% |
| GTP | 1435 | 46% | 54% | 232.18 ± 22.37% |
| MIP | 49 | 39% | 61% | 261.41 ± 21.47% |

  1. The table summarizes the number of sequences in each dataset, the percentages of positive and negative samples, and the average lengths ± standard deviations in percent.