Skip to main content
Figure 3 | BMC Research Notes

Figure 3

From: HMM-ModE: implementation, benchmarking and validation with HMMER3

Figure 3

Schematic flow of how HMM-ModE works on a set of pre-classified protein family sequences. The figure shows how the method HMM-ModE works, a set of pre-classified sequences (functionally classified) are used which are clustered using MCL in order to obtain clusters of similar sequences. These clusters are aligned separately and HMM profiles are built using ‘hmmbuild’ from HMMER package, these are known as true positive (TP) profiles. The TP profiles are scanned against all the sequences, ideally the profile should pick sequences belonging to the same family but it always picks up sequences belonging to other families as well due to fold specific signals shared across families. We call these as false positive (FP) sequences and generate FP HMM profiles from them. If the number of FPs is greater than 200 then we perform random sampling and then pick a representative set of 200 sequences to generate the FP profile. Both the TP and FP profiles are then aligned using profile-profile alignment from MUSCLE and this alignment is then used to identify the discriminating residues and modify the corresponding emission probabilty of the TP profile. A 10-fold cross validation is also done to identify a discriminating threshold and we use the modified profiles, known as the HMM-ModE profiles, with modified emission probability and the defined discriminating threshold.

Back to article page