
Table 4 Homopolymer exact match occurrences

From: Empirical assessment of sequencing errors for high throughput pyrosequencing data

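Newbler

| Base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 10112299 | 3780060 | 1394445 | 591882 | 250066 | 94484 | 27007 | 4569 | 498 | 61 | 10 | 9 |
| C | 9275676 | 1932907 | 270496 | 54226 | 8510 | 1239 | 175 | 6 | 0 | 0 | 0 | 0 |
| G | 9325002 | 1918715 | 273077 | 51062 | 7796 | 1026 | 115 | 10 | 0 | 0 | 0 | 0 |
| T | 10114600 | 3747686 | 1369516 | 584450 | 260939 | 101704 | 33958 | 6960 | 802 | 88 | 37 | 18 |

| Base | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 | 23 | 24 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 10 | 4 | 3 | 7 | 1 | 2 | 0 | 1 | 0 | 0 | 0 | 1 |
| C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T | 21 | 11 | 4 | 2 | 4 | 1 | 0 | 2 | 0 | 0 | 0 | 0 |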

Total match positions: 80,998,946

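Celera

| Base | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | 11 | 12 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 8266460 | 3097760 | 1128372 | 492925 | 215277 | 82702 | 26304 | 4552 | 651 | 79 | 38 | 24 |
| C | 7527051 | 1547142 | 217848 | 43078 | 6786 | 876 | 113 | 0 | 0 | 0 | 0 | 0 |
| G | 7443144 | 1495949 | 212552 | 41896 | 6191 | 882 | 147 | 20 | 0 | 0 | 0 | 0 |
| T | 8322275 | 3084911 | 1146793 | 502731 | 224735 | 89882 | 29349 | 5634 | 423 | 55 | 32 | 25 |

| Base | 13 | 14 | 15 | 16 | 17 | 18 | 19 | 20 | 21 | 22 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| A | 15 | 12 | 16 | 5 | 1 | 1 | 3 | 0 | 0 | 1 |
| C | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| G | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| T | 48 | 12 | 11 | 8 | 6 | 2 | 2 | 1 | 0 | 0 |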

Total match positions: 66,248,219

Homopolymer exact match occurrences. Each row gives the base of the homopolymer and each column gives its length. For example, there were 7796 exact match alignments of GGGGG in the Newbler data set, but only 11 of TTTTTTTTTTTTTTT in the Celera assembler data set.
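
The base-by-length layout described in the table note can be illustrated with a small tally. The following is a minimal sketch only: it counts maximal homopolymer runs in a single sequence string, which is a simplification and not the exact-match alignment counting used to produce the table. The function names `homopolymer_counts` and `as_table` and the example sequence are hypothetical, chosen purely for illustration.

```python
from collections import Counter
from itertools import groupby


def homopolymer_counts(sequence):
    """Tally maximal homopolymer runs as {(base, length): occurrences}.

    Assumption: runs are counted in one plain sequence string, not in
    read-to-assembly alignments as in the table.
    """
    counts = Counter()
    # groupby yields one group per maximal run of identical characters
    for base, run in groupby(sequence.upper()):
        if base in "ACGT":  # ignore ambiguity codes such as N
            counts[(base, sum(1 for _ in run))] += 1
    return counts


def as_table(counts):
    """Lay the tally out like the table: one row per base, one column per length."""
    max_len = max((length for _, length in counts), default=0)
    return {
        base: [counts[(base, n)] for n in range(1, max_len + 1)]
        for base in "ACGT"
    }


# Hypothetical toy sequence with runs AAA, C, GG, TTTTT, A, C
example = homopolymer_counts("AAACGGTTTTTAC")
table = as_table(example)
for base, row in table.items():
    print(base, row)
```

In this layout, `table["T"][4]` holds the number of runs of exactly five T's, mirroring how the cell in row T, column 5 is read in the table.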