Table 4 Homopolymer exact match occurrences

From: Empirical assessment of sequencing errors for high throughput pyrosequencing data

Newbler
Base         1        2        3       4       5       6      7     8    9  10  11  12
A     10112299  3780060  1394445  591882  250066   94484  27007  4569  498  61  10   9
C      9275676  1932907   270496   54226    8510    1239    175     6    0   0   0   0
G      9325002  1918715   273077   51062    7796    1026    115    10    0   0   0   0
T     10114600  3747686  1369516  584450  260939  101704  33958  6960  802  88  37  18
Base  13  14  15  16  17  18  19  20  21  22  23  24
A     10   4   3   7   1   2   0   1   0   0   0   1
C      0   0   0   0   0   0   0   0   0   0   0   0
G      0   0   0   0   0   0   0   0   0   0   0   0
T     21  11   4   2   4   1   0   2   0   0   0   0
Total match positions: 80,998,946
Celera
Base        1        2        3       4       5      6      7     8    9  10  11  12
A     8266460  3097760  1128372  492925  215277  82702  26304  4552  651  79  38  24
C     7527051  1547142   217848   43078    6786    876    113     0    0   0   0   0
G     7443144  1495949   212552   41896    6191    882    147    20    0   0   0   0
T     8322275  3084911  1146793  502731  224735  89882  29349  5634  423  55  32  25
Base  13  14  15  16  17  18  19  20  21  22
A     15  12  16   5   1   1   3   0   0   1
C      0   0   0   0   0   0   0   0   0   0
G      0   0   0   0   0   0   0   0   0   0
T     48  12  11   8   6   2   2   1   0   0
Total match positions: 66,248,219
Homopolymer exact match occurrences. Each row gives the base of the homopolymer and each column its length. For example, there were 7796 exact match alignments of GGGGG in the Newbler data set, and only 11 of TTTTTTTTTTTTTTT (length 15) in the Celera assembler data set.
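To make the caption's reading rule concrete (row = base, column = length), the counts can be transcribed into a simple lookup structure. The following is a minimal Python sketch, not code from the paper: the names NEWBLER, CELERA, homopolymer_count and count_by_length are hypothetical, and the values are copied by hand from Table 4, with the Celera rows stopping at length 22 as they do in the table.

```python
"""Illustrative lookup over the Table 4 homopolymer counts (hypothetical names)."""

# Index i of each list holds the count for homopolymer length i + 1.
# Newbler covers lengths 1-24; Celera covers lengths 1-22, as tabulated.
NEWBLER = {
    "A": [10112299, 3780060, 1394445, 591882, 250066, 94484, 27007, 4569,
          498, 61, 10, 9, 10, 4, 3, 7, 1, 2, 0, 1, 0, 0, 0, 1],
    "C": [9275676, 1932907, 270496, 54226, 8510, 1239, 175, 6] + [0] * 16,
    "G": [9325002, 1918715, 273077, 51062, 7796, 1026, 115, 10] + [0] * 16,
    "T": [10114600, 3747686, 1369516, 584450, 260939, 101704, 33958, 6960,
          802, 88, 37, 18, 21, 11, 4, 2, 4, 1, 0, 2, 0, 0, 0, 0],
}

CELERA = {
    "A": [8266460, 3097760, 1128372, 492925, 215277, 82702, 26304, 4552,
          651, 79, 38, 24, 15, 12, 16, 5, 1, 1, 3, 0, 0, 1],
    "C": [7527051, 1547142, 217848, 43078, 6786, 876, 113] + [0] * 15,
    "G": [7443144, 1495949, 212552, 41896, 6191, 882, 147, 20] + [0] * 14,
    "T": [8322275, 3084911, 1146793, 502731, 224735, 89882, 29349, 5634,
          423, 55, 32, 25, 48, 12, 11, 8, 6, 2, 2, 1, 0, 0],
}


def homopolymer_count(table: dict[str, list[int]], base: str, length: int) -> int:
    """Return the count in row `base`, column `length` of the given table."""
    counts = table[base]
    if not 1 <= length <= len(counts):
        raise ValueError(f"length {length} is not tabulated for base {base!r}")
    return counts[length - 1]


def count_by_length(table: dict[str, list[int]], length: int) -> int:
    """Sum one column (a single homopolymer length) over the four bases."""
    return sum(homopolymer_count(table, base, length) for base in "ACGT")


if __name__ == "__main__":
    # The two worked examples from the table caption.
    print(homopolymer_count(NEWBLER, "G", 5))   # 7796
    print(homopolymer_count(CELERA, "T", 15))   # 11
```

Because the table does not state how the "Total match positions" figures are derived, the sketch makes no attempt to reproduce them from the per-length counts.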