Acceleration of RMSD calculations. Four CPU implementations of different RMSD calculation algorithms, Quaternion Characteristic Polynomial (Q-CP), Quaternion Power (Q-P), Quaternion Jacobi (Q-J) and Rotational (Rot), were compared against our GPU implementation of the Q-J algorithm (GPU-Q-J). Six datasets from NRW, comprising predicted protein structures ranging from 70 to 140 residues in size, were used as the test set. The time required to calculate the RMSD for every pair of structures in the ensemble is displayed relative to the time required by the GPU implementation. Numbers on the right indicate the average over the six datasets. The results show that for ensembles of more than 1000 structures, the overhead of setting up the GPU calculation becomes negligible. The GPU implementation is 260-fold faster than the fastest CPU implementation. This speed-up allows large ensembles of structures to be clustered quickly and relieves a major bottleneck in processing the results from NRW.
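
For orientation, the quaternion-based RMSD calculation that the Q-J label refers to can be sketched as follows. This is a minimal, serial, pure-Python illustration and not the GPU-Q-J implementation benchmarked here. It assumes the standard formulation in which the minimum RMSD follows from the largest eigenvalue of a 4x4 symmetric matrix built from the coordinate correlations, with that eigenvalue found by cyclic Jacobi rotations. The function names (`centered`, `max_eigenvalue_jacobi`, `rmsd_qj`, `pairwise_rmsd`) and the tolerance and sweep limits are hypothetical choices made for this example.

```python
import math


def centered(coords):
    """Translate a list of (x, y, z) points so that their centroid is at the origin."""
    n = len(coords)
    c = [sum(p[k] for p in coords) / n for k in range(3)]
    return [(p[0] - c[0], p[1] - c[1], p[2] - c[2]) for p in coords]


def max_eigenvalue_jacobi(m, max_sweeps=50, tol=1e-20):
    """Largest eigenvalue of a small symmetric matrix via cyclic Jacobi rotations."""
    a = [row[:] for row in m]
    n = len(a)
    for _ in range(max_sweeps):
        # Stop once the squared off-diagonal mass is negligible.
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes the (p, q) element.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return max(a[i][i] for i in range(n))


def rmsd_qj(a, b):
    """Minimum RMSD between two equally sized coordinate sets after optimal superposition."""
    x, y = centered(a), centered(b)
    n = len(x)
    # Sum of squared norms of both centered structures.
    g = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    # 3x3 correlation matrix S[i][j] = sum over atoms of x_i * y_j.
    s = [[sum(p[i] * q[j] for p, q in zip(x, y)) for j in range(3)] for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max_eigenvalue_jacobi(key)
    # Clamp at zero to absorb rounding error for identical structures.
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


def pairwise_rmsd(ensemble):
    """RMSD for every unordered pair of structures, keyed by index pair (i, j), i < j."""
    m = len(ensemble)
    return {(i, j): rmsd_qj(ensemble[i], ensemble[j])
            for i in range(m) for j in range(i + 1, m)}
```

In this sketch each pair is evaluated independently of all others, so the all-pairs loop in `pairwise_rmsd` is the natural unit to distribute across parallel threads; how the GPU implementation actually organizes that work is not specified here.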