Table 5. The dataset

From: Mutual information and variants for protein domain-domain contact prediction

Dataset
Protein D1-start D1-end D2-start D2-end Sequences Species
1A45 1 82 83 173 160 E(146)N(14)
1BIB 67 270 271 317 236 A(12)B(201)N(23)
1BKS 1 188 189 268 478 A(21)B(401)E(10)N(46)
1FNB 19 152 153 314 58 B(22)E(34)N(2)
1G8A 1 51 52 227 75 A(47)E(20)N(8)
1G8P 18 216 261 350 230 A(10)B(143)E(49)N(28)
1I39 1 158 159 200 688 A(32)B(538)E(7)V(1)U(1)N(109)
1J5X 2 169 170 319 252 A(9)B(183)E(5)N(55)
1LAP 1 147 148 484 454 A(2)B(331)E(84)N(37)
1LLD 7 148 149 319 709 A(33)B(389)E(221)N(66)
1MRI 1 162 163 246 68 B(2)E(65)N(1)
1PII 1 255 256 452 75 B(65)N(10)
1RHD 1 156 157 293 505 A(26)B(365)E(57)U(1)N(56)
1THM 1 127 128 208 106 A(1)B(62)E(34)N(9)
1W98 88 227 228 357 70 E(64)N(6)
1WRU 3 176 177 346 64 B(58)V(2)N(4)
1X2G 1 246 247 337 224 A(2)B(155)E(42)N(25)
2AAA 1 376 377 484 245 B(141)E(74)N(30)
2AHE 16 108 109 253 144 B(25)E(100)N(19)
2D3V 3 95 96 195 77 E(71)N(6)
2D8N 16 97 102 189 240 E(195)N(45)
2E64 1 188 189 235 294 A(9)B(231)E(4)U(1)N(49)
2I00 10 300 301 406 116 A(2)B(80)N(34)
2IU5 1 71 72 180 65 B(56)N(9)
2NPO 3 76 77 188 224 A(3)B(182)U(1)N(38)
2NRC 1 247 261 480 188 A(9)B(96)E(68)N(15)
2OF7 17 67 68 207 204 B(135)N(69)
2OI8 8 86 87 216 215 B(151)N(64)
2PGD 1 172 178 433 317 B(211)E(78)N(28)
2PGE 3 136 137 368 138 A(6)B(102)E(1)N(29)
2PGX 2 56 57 250 102 B(87)N(15)
2PHZ 20 142 143 296 420 A(4)B(343)N(73)
2QY9 201 284 285 495 471 A(32)B(344)E(15)N(80)
2REB 23 268 269 328 482 B(434)E(12)N(36)
2TS1 1 220 248 319 598 B(512)E(34)N(52)
4ENL 1 126 127 436 649 A(32)B(448)E(122)N(47)
4MDH 1 154 155 333 339 A(6)B(173)E(134)N(26)
5FBP 1 201 202 335 355 A(3)B(213)E(112)N(27)
6GST 1 82 90 217 374 B(10)E(312)N(52)
8TLN 1 135 136 316 44 A(1)B(36)E(2)N(5)
The “Protein” column lists the PDB identifiers [40]. The D1 and D2 columns give the start and end PDB residue numbers of domains 1 and 2, respectively. For all PDB entries listed, the start and end residues are located in chain A of the structure, except for 1W98, where the domains are in chain B, and 8TLN, where they are in chain E. The “Sequences” column gives the number of sequences in the multiple sequence alignment (MSA). The final column gives the distribution of the sequences in each MSA across taxonomic groups: eukaryotes (E); archaea (A); bacteria (B); viruses (V); unclassified (U); and not found (N), i.e. sequences that could not be found in the NCBI Taxonomy Database. This dataset was taken from Hamer et al. [12].
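The row layout described in the table note can be read programmatically. The following is a minimal sketch, not part of the original study: the function name `parse_row` and the dictionary keys are hypothetical, and it assumes each row is whitespace-separated with the species distribution written as one token of `LETTER(count)` pairs, as in the table above.

```python
import re

# Matches one "LETTER(count)" pair in the species column, e.g. "E(146)".
SPECIES_RE = re.compile(r"([A-Z])\((\d+)\)")


def parse_row(line):
    """Parse one table row into a dictionary (hypothetical helper)."""
    fields = line.split()
    pdb_id = fields[0]
    # Four residue boundaries followed by the MSA sequence count.
    d1_start, d1_end, d2_start, d2_end, n_seq = map(int, fields[1:6])
    # Species codes: A, B, E, V, U, N as defined in the table note.
    species = {code: int(count) for code, count in SPECIES_RE.findall(fields[6])}
    return {
        "pdb_id": pdb_id,
        "d1": (d1_start, d1_end),
        "d2": (d2_start, d2_end),
        "sequences": n_seq,
        "species": species,
    }


row = parse_row("1A45 1 82 83 173 160 E(146)N(14)")
# Consistency check: the per-group counts should add up to the MSA size.
assert sum(row["species"].values()) == row["sequences"]
```

The final assertion doubles as a sanity check when transcribing the table, since the counts in the species column of a row should sum to its “Sequences” value.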