The health informatics cohort enhancement project (HICE): using routinely collected primary care data to identify people with a lifetime diagnosis of psychotic disorder
© Economou et al; licensee BioMed Central Ltd. 2012
Received: 21 October 2011
Accepted: 14 February 2012
Published: 14 February 2012
We have previously demonstrated that routinely collected primary care data can be used to identify potential participants for trials in depression. Here we demonstrate how patients with psychotic disorders can be identified from primary care records for potential inclusion in a cohort study. We discuss the strengths and limitations of this approach, assess its potential value and report the challenges encountered.
We designed an algorithm to search for patients with a lifetime diagnosis of psychotic disorder within the Secure Anonymised Information Linkage (SAIL) database of routinely collected health data. The algorithm was validated against the "gold standard" of a well-established operational criteria checklist for psychotic and affective illness (OPCRIT). Case notes of 100 patients from a community mental health team (CMHT) in Swansea were studied, of whom 80 had matched GP records.
The algorithm had favourable test characteristics, with a very good ability to detect patients with psychotic disorders (sensitivity > 0.7) and an excellent ability not to falsely identify patients with psychotic disorders (specificity > 0.9).
With certain limitations, our algorithm can be used to search general practice data and reliably identify patients with psychotic disorders. This may be useful in identifying candidates for potential inclusion in cohort studies.
The expanding area of health informatics looks to make the best use of the rich sources of clinical information housed in electronic databases. In the area of mental health, recruitment to trials and cohorts can be particularly challenging, but we have previously shown that routinely collected, digitally stored, clinical data from primary care can be used to identify potential participants for trials in depression. We now seek to extend this technique to psychiatric cohort studies by identifying patients with psychotic disorders. The design of an electronic cohort of patients with psychotic disorders in tandem with a traditional cohort of patients could lead to more powerful longitudinal studies and a more complete study of the aetiology, prognostic indicators and treatment response of psychotic disorders.
To examine the algorithm's ability to correctly identify patients with psychotic disorders compared to the 'gold standard' diagnosis generated by OPCRIT; and
To determine whether anonymised routinely collected primary care data can be used to accurately identify patients with psychotic disorders for participation in a cohort study.
The patient sample was taken from the population of a Community Mental Health Team (CMHT) in Swansea. JM generated a list of 200 random numbers, in the range 1 to 500, using SPSS software. ST, VP and AE used the random numbers to select individual paper case notes for inclusion in the study.
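The selection step can be illustrated with a short sketch. The study used SPSS; the Python below is only an illustration. It assumes that the case notes were numbered 1 to 500 and that numbers were drawn without replacement, neither of which is stated above. The function name and the seed are hypothetical.

```python
import random


def draw_case_note_numbers(n_draws=200, low=1, high=500, seed=2011):
    """Draw n_draws distinct case-note numbers from low..high (inclusive).

    Assumptions (not from the study): sampling is without replacement,
    and the seed is arbitrary, used only to make the sketch reproducible.
    """
    rng = random.Random(seed)
    return sorted(rng.sample(range(low, high + 1), n_draws))


numbers = draw_case_note_numbers()
print(len(numbers), numbers[:5])
```

Fixing the seed means the same list can be regenerated later, which makes the selection auditable.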
The SAIL database is run by the Health Informatics Research Unit (HIRU) at Swansea University. HIRU has a protocol in place with the National Health Service Wales Informatics Service (NWIS) to ensure that all data are anonymised. This is achieved through a split file approach to data management. The demographic data are separated from the clinical data by the source organisation and a system linking field is used to ensure that the data can be rejoined later. The demographic data are sent to NWIS and the clinical data are sent to HIRU. NWIS uses encryption technology for pseudo-anonymisation, replacing the personal data in each record with an Anonymous Linking Field (ALF). This product is then transferred to HIRU, where it is joined to the clinical data via the system linking field. As a final safeguard, HIRU further encrypts the ALF, thus ensuring that no single organisation can decrypt the records. This split file method ensures that anonymity and confidentiality are maintained, while preserving the facility of data linkage at the individual level. The data are then ready for research applications. Only the source organisation (i.e. the treating physician) has access to both personal and clinical data. The data are provided to the SAIL database on the understanding that they are never de-anonymised and therefore patient records can never be traced back to individual patients. SAIL is a growing databank of linked data used to support research. It currently contains anonymised GP data on about a million people from 150 practices. The OPCRIT data were anonymised and linked to GP data.
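The split file flow described above can be sketched as follows. This is a minimal illustration, not the SAIL implementation: the text does not specify the encryption used, so a keyed hash (HMAC-SHA-256) stands in for it. Deriving the ALF from an NHS number, and all keys, field names and values, are hypothetical.

```python
import hashlib
import hmac


def keyed_hash(value: str, key: bytes) -> str:
    """Keyed SHA-256 digest; a stand-in for the unspecified encryption step."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


# Hypothetical record held by the source organisation.
record = {"name": "Jane Doe", "dob": "1970-01-01", "nhs_number": "0000000000",
          "read_code": "EXAMPLE", "event_date": "2008-07-01"}

# 1. The source organisation splits the record and adds a system linking field.
link_id = "row-0001"
demographic = {"link_id": link_id, "name": record["name"],
               "dob": record["dob"], "nhs_number": record["nhs_number"]}
clinical = {"link_id": link_id, "read_code": record["read_code"],
            "event_date": record["event_date"]}

# 2. NWIS sees only the demographic file and replaces it with an ALF.
NWIS_KEY = b"nwis-secret"  # hypothetical key
alf = keyed_hash(demographic["nhs_number"], NWIS_KEY)
from_nwis = {"link_id": link_id, "alf": alf}

# 3. HIRU joins the ALF to the clinical file via the linking field,
#    then encrypts the ALF again with its own key.
HIRU_KEY = b"hiru-secret"  # hypothetical key
assert from_nwis["link_id"] == clinical["link_id"]
sail_row = {"alf_e": keyed_hash(from_nwis["alf"], HIRU_KEY),
            "read_code": clinical["read_code"],
            "event_date": clinical["event_date"]}

print(sail_row)
```

In this sketch, recomputing the stored identifier from an NHS number requires both keys, which mirrors the point that no single organisation can reverse the linkage alone. The same person always maps to the same identifier, so linkage at the individual level is preserved.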
The SAIL project conforms to the HIRU Data Anonymisation Policy and Process (DAPP), which takes account of the requirements of the Data Protection Act (1998), the Principles of the Caldicott report (1997) and measures that embody good information governance. The DAPP has been endorsed by Informing Healthcare and the Corporate Health Information Programme (CHIP) and has been reviewed by Caldicott Guardians and Information Governance Officers in the NHS and Local Government. The HICE project was exempted from further ethical approval by South West Wales Research Ethics Committee in July 2008.
Quality and Outcomes Framework (QOF) Read codes used for diagnosis by general practitioners
QOF Read code
Manic disorder, single episode
Recurrent manic episodes
Single major depressive episode, severe, with psychotic disorders
Recurrent major depressive episodes, severe, with psychotic disorders
Bipolar affective disorder, currently manic
Bipolar affective disorder, currently depressed
Mixed bipolar affective disorder
Unspecified bipolar affective disorder
Other and unspecified manic-depressive psychoses
Unspecified manic-depressive psychoses
Atypical manic disorder
Other mixed manic-depressive psychoses
Other and unspecified manic-depressive psychoses NOS
Other and unspecified affective psychoses
Unspecified affective psychoses NOS
Other affective psychotic disorders NOS
Other nonorganic psychoses
Reactive depressive psychotic disorders
Acute hysterical psychotic disorders
Acute paranoid reaction
Psychogenic paranoid psychotic disorders
Other reactive psychoses
Brief reactive psychotic disorders
Other reactive psychoses NOS
Nonorganic psychotic disorders NOS
[X]Schizophrenia, schizotypal and delusional disorders
[X]Bipolar affective disorder
Severe depressive episode with psychotic symptoms
[X]Recurrent depressive disorder, current episode severe with psychotic symptoms
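A lifetime-diagnosis search over a code list like the one above can be sketched as follows. This is an illustration in Python, not the authors' query. The code strings are placeholders rather than real Read codes, because the codes themselves do not appear in the list above. The patient identifiers and the function name are also hypothetical.

```python
# Placeholder strings, NOT real Read codes.
PSYCHOSIS_CODES = {"PSY_CODE_1", "PSY_CODE_2", "PSY_CODE_3"}


def flag_lifetime_psychosis(events, code_set=PSYCHOSIS_CODES):
    """Return the ids of patients with at least one listed code.

    events: iterable of (patient_id, read_code) pairs. Dates are ignored,
    so a single matching code at any time flags the patient.
    """
    return {patient_id for patient_id, code in events if code in code_set}


# Made-up GP events for three anonymised patients.
events = [
    ("alf_001", "PSY_CODE_2"),
    ("alf_001", "OTHER_1"),
    ("alf_002", "OTHER_2"),
    ("alf_003", "PSY_CODE_1"),
]

flagged = flag_lifetime_psychosis(events)
print(sorted(flagged))  # ['alf_001', 'alf_003']
```

Ignoring event dates is what makes this a lifetime rather than a current diagnosis.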
The operational criteria checklist for psychotic and affective illness (OPCRIT) was used to provide a 'gold standard' diagnosis for the 100 patients whose case notes were examined. OPCRIT is a diagnostic system comprising a checklist of 90 items, constructed from operational criteria for the major psychiatric classifications, and a suite of computer programs that allows data to be entered from patients' case notes. Once the data have been loaded into OPCRIT, diagnoses are generated according to different classification systems.
International Classification of Diseases 10th Revision codes used for OPCRIT diagnosis
Persistent delusional disorders
Acute and transient psychotic disorders
Induced delusional disorder
Other non-organic psychotic disorders
Unspecified non-organic psychotic disorders
Bipolar affective disorder
Severe depressive episode with psychotic symptoms
Recurrent depressive disorder, current episode severe with psychotic symptoms
Persistent mood [affective] disorders
Other mood [affective] disorders
Unspecified mood [affective] disorder
In order to check the reliability of the algorithm in identifying patients with psychotic disorders, the diagnoses generated by OPCRIT in our patient sample were compared with the diagnoses produced by running the algorithm for the same group of patients in the SAIL database.
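The comparison described above amounts to cross-tabulating two labels per patient. A minimal sketch, assuming each source is reduced to a mapping from patient id to a yes/no psychotic disorder label; the ids and labels below are made up.

```python
from collections import Counter


def two_by_two(gold, test):
    """Cross-tabulate paired boolean labels keyed by patient id.

    Only patients present in both mappings are compared, mirroring the
    restriction to patients with matched records.
    """
    counts = Counter()
    for patient_id in gold.keys() & test.keys():
        g, t = gold[patient_id], test[patient_id]
        if t and g:
            counts["tp"] += 1  # algorithm positive, gold standard positive
        elif t and not g:
            counts["fp"] += 1  # algorithm positive, gold standard negative
        elif not t and g:
            counts["fn"] += 1  # algorithm negative, gold standard positive
        else:
            counts["tn"] += 1  # both negative
    return {cell: counts[cell] for cell in ("tp", "fp", "fn", "tn")}


# Made-up labels: p5 and p6 appear in only one source and are dropped.
gold = {"p1": True, "p2": True, "p3": False, "p4": False, "p5": True}
algo = {"p1": True, "p2": False, "p3": False, "p4": True, "p6": True}

print(two_by_two(gold, algo))  # {'tp': 1, 'fp': 1, 'fn': 1, 'tn': 1}
```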
The data were analysed using the Statistical Package for the Social Sciences (SPSS) version 19. In assessing the reliability of the algorithm, the characteristics assessed were: sensitivity (true positive rate), specificity (true negative rate), prevalence (pre-test likelihood of disease), predictive value of a positive test (post-test likelihood of disease), predictive value of a negative test (post-test likelihood of no disease), likelihood ratio of a positive result, likelihood ratio of a negative result and the diagnostic odds ratio.
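These characteristics all follow from the four cells of a two-by-two table. The sketch below uses the standard definitions. The counts are made up for illustration and are not the study's results, and confidence intervals are not computed.

```python
def diagnostic_characteristics(tp, fp, fn, tn):
    """Standard test characteristics from two-by-two counts.

    Raises ZeroDivisionError if a required denominator is zero
    (for example, fp == 0 for the diagnostic odds ratio).
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "prevalence": (tp + fn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "diagnostic_odds_ratio": (tp * tn) / (fp * fn),
    }


# Hypothetical counts, chosen only to exercise the formulas.
result = diagnostic_characteristics(tp=30, fp=2, fn=10, tn=28)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

The diagnostic odds ratio equals the positive likelihood ratio divided by the negative likelihood ratio, which gives a quick consistency check on any reported set of values.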
OPCRIT results for 51 patients with psychotic disorders
(n = 21)
(n = 5)
Bipolar affective disorder (n = 6)
Persistent delusional disorder (n = 3)
Severe depressive episode with psychotic disorders (n = 2)
(n = 2)
Other non-organic psychotic disorder (n = 12)
Of the remaining 49 patients, 33 met ICD-10 criteria for non-psychotic mental disorders; the remaining 16 had insufficient clinical information in their case notes to complete all 90 items in OPCRIT in order to generate a diagnosis. These 16 were omitted from the analysis.
Clinical information was stored in the general practice database (GPDB) in SAIL for 80 of the above 100 patients. The 20 patients who belonged to practices that were not currently supplying SAIL with data were omitted from the analysis.
Two-by-two table comparing diagnosis of psychotic disorder using the algorithm derived from general practice data with the gold standard
Psychotic disorder diagnosis
Health Informatics Cohort Enhancement (HICE) Algorithm characteristics
Value (95% Confidence Interval)
Predictive value of positive test
Predictive value of negative test
Likelihood ratio of positive test
Likelihood ratio of negative test
Diagnostic odds ratio
Further analysis was undertaken to investigate the reasons for the incorrect cases.
One false positive was identified: the patient had a QOF psychotic disorders code in their GP data along with a number of other mental health diagnoses. In the false negative group, none of the patients had a psychotic disorders code of any description.
In this study, we built an algorithm and subsequently examined its performance in identifying patients with psychotic disorders, by searching primary care data.
We were able to construct an algorithm to search electronic databases of routinely collected primary care clinical data. The algorithm had very promising characteristics when evaluated against the 'gold standard' of OPCRIT diagnosis. It combined a very good ability to detect patients with psychotic disorders (true positives) with an excellent ability not to incorrectly identify patients who do not have psychotic disorders (true negatives). The other test characteristics included an excellent ability to minimise the number of patients without psychotic disorders who tested positive (false positives) and a very good ability to minimise the number of patients identified as not having psychotic disorders when in fact they did (false negatives). The study suggests that routinely collected primary care data can be used to accurately identify patients with psychotic disorders for participation in a cohort study.
Comparison with previous research
Previous research has demonstrated that general practitioners accurately document psychotic illness in their computer records and that general practice computer records are reliable for research purposes [7, 8]. We have previously shown that routinely collected data in primary care can be used to identify patients suffering with depression for potential inclusion in a clinical trial, and described how those data can then be de-anonymised by the treating team without compromising patient confidentiality. The present study demonstrates that an electronic algorithm built to search databanks of clinical information, entered by general practitioners during patient consultations, performs well in identifying patients with a lifetime diagnosis of a psychotic disorder.
Twenty of the original 100 patients whose case notes were assessed using OPCRIT did not have clinical information stored in the GP data within SAIL, as they were registered to practices that were not currently supplying data to SAIL. This reduced the sample size and limits the precision of the findings.
The algorithm used the Quality and Outcomes Framework (QOF) Read codes with which general practitioners document a diagnosis of psychotic disorder. The QOF list of Read codes for psychotic disorders appears to be fairly comprehensive; all that are omitted are organic psychoses, psychotic disorders with origins in childhood, seasonal affective disorder, rebound mood swings and some depression codes. Codes that explicitly state depression with psychotic symptoms were included in the QOF.
A modified algorithm could have identified patients using further Read codes, including those for prescription of psychotropic medication used in the treatment of psychotic disorders, such as antipsychotics. However, antipsychotic medication is prescribed for a variety of clinical presentations and not only for patients with psychotic disorders. Such an alteration would likely have improved sensitivity at the expense of specificity and positive predictive value, because some patients prescribed psychotropic medication for other presentations would have been falsely identified as having psychotic disorders.
The possibility that OPCRIT diagnosis may be sub-optimal, and hence not a gold standard, must also be considered. Sixteen of the 100 paper case notes examined did not include enough clinical information for all items in OPCRIT to be completed. However, this is a limitation inherent in the comprehensiveness of clinical notes rather than a limitation of OPCRIT. We also acknowledge that the prevalence of psychosis in the CMHT population is higher than in the community population, and this may affect the positive predictive value of our algorithm. Further research is therefore needed.
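The dependence of positive predictive value on prevalence can be made concrete with Bayes' theorem. In the sketch below, the sensitivity of 0.7 and specificity of 0.9 are the lower bounds quoted in the abstract, not the exact estimates. The prevalence values are hypothetical, chosen only to contrast a specialist service with a general population.

```python
def ppv_from_prevalence(sensitivity, specificity, prevalence):
    """Positive predictive value via Bayes' theorem."""
    true_positive_rate = sensitivity * prevalence
    false_positive_rate = (1 - specificity) * (1 - prevalence)
    return true_positive_rate / (true_positive_rate + false_positive_rate)


# Hypothetical prevalences, from a high-prevalence clinic to a community setting.
for prevalence in (0.5, 0.1, 0.01):
    ppv = ppv_from_prevalence(0.7, 0.9, prevalence)
    print(f"prevalence {prevalence:.2f} -> PPV {ppv:.3f}")
```

With sensitivity and specificity held fixed, the positive predictive value falls as prevalence falls. This is why a value estimated in a CMHT sample cannot be carried over unchanged to a community population.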
The algorithm, designed to search routinely collected data in UK general practice databases (GPDB), can be used to reliably identify patients with psychotic disorders. This will enable researchers to easily identify a large number of patients with psychotic disorders and may be an important tool in trial recruitment. It is also a promising development in the efforts to create a population-based electronic cohort of patients with psychotic disorders. Further research is needed to test this approach in other disorders.
ALF: Anonymous linking field
CHIP: Corporate health information programme
CHIRAL: Centre for health information research and evaluation
CMHT: Community mental health team
DAPP: Data anonymisation policy and process
GPDB: General practice database
HIRU: Health informatics research unit
ICD-10: International classification of diseases 10th revision
NISCHR: National institute for social care and health research
NWIS: National health service Wales informatics service
OPCRIT: Operational criteria checklist for psychotic and affective illness
QOF: Quality and outcomes framework
SAIL: Secure anonymised information linkage
SPSS: The statistical package for the social sciences
SQL: Structured query language.
This study was funded by a grant from the Welsh Government's National Institute for Social Care and Health Research (NISCHR). The study makes use of anonymised data held in the Secure Anonymised Information Linkage (SAIL) system which is part of the national e-health records research infrastructure for Wales. We would like to acknowledge all the data providers who make anonymised data available for research.
- McGregor J, Brooks C, Chalasani P, Chukwuma J, Hutchings H, Lyons RA, Lloyd K: The health informatics trial enhancement project (HITE): using routinely collected primary care data to identify potential participants for a depression trial. Trials. 2010, 11: 39. 10.1186/1745-6215-11-39.
- Ford DV, Jones KH, Verplancke JP, Lyons RA, John G, Brown G, Brooks CJ, Thompson S, Bodger O, Couch T, Leake K: The SAIL databank: building a national architecture for e-health research and evaluation. BMC Health Serv Res. 2009, 9: 157. 10.1186/1472-6963-9-157.
- Lyons RA, Jones KH, John G, Brooks CJ, Verplancke JP, Ford DV, Brown G, Leake K: The SAIL databank: linking multiple health and social care datasets. BMC Med Inform Decis Mak. 2009, 9: 3. 10.1186/1472-6947-9-3.
- Structured Query Language. [http://db.grussell.org/sql1.html]
- Quality and outcomes framework. [http://www.nhsemployers.org/PayAndContracts/GeneralMedicalServicesContract/QOF/Pages/QualityOutcomesFramework.aspx]
- McGuffin P, Farmer A, Harvey I: A polydiagnostic application of operational criteria in studies of psychotic illness. Development and reliability of the OPCRIT system. Arch Gen Psychiatry. 1991, 48: 764-770.
- Nazareth I, King M, Haines A, Rangel L, Myers S: Accuracy of diagnosis of psychotic disorders on general practice computer system. Br Med J. 1993, 307: 32-34. 10.1136/bmj.307.6895.32.
- Jick H, Jick SS, Derby LE: Validation of information recorded on general practitioner based computerised data resource in the United Kingdom. Br Med J. 1991, 302: 766-768. 10.1136/bmj.302.6779.766.
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.