One consensual depression diagnosis tool to serve many countries: a challenge! A RAND/UCLA methodology

Objective From a systematic literature review (SLR), it became clear that European General Practitioner (GP) researchers needed a consensually validated tool to allow multi-centred collaborative research, in daily practice, throughout Europe. Which diagnostic tool for depression, validated against psychiatric examination according to the DSM, would GPs select as the best for use in clinical research, taking into account the combination of effectiveness, reliability and ergonomics? The RAND/UCLA appropriateness method, which combines the qualities of the Delphi process and of the nominal group, was used. GP researchers from different European countries were selected. The tools extracted by the SLR had been validated against the DSM. The Youden index was used as an effectiveness criterion and Cronbach's alpha as a reliability criterion. Ergonomics data were extracted from the literature, and ergonomics were tested face-to-face.

Results The SLR extracted 7 tools. Two instruments were considered sufficiently effective and reliable for use: the Hospital Anxiety and Depression Scale (HADS) and the Hopkins Symptoms Checklist-25 (HSCL-25). After face-to-face testing, the HSCL-25 was selected. A multicultural consensus on one diagnostic tool for depression was obtained for the HSCL-25. This tool will provide the opportunity to select homogeneous populations for European collaborative research in daily practice.

• Provide a simple and effective diagnostic tool that allows medical research in daily practice.
• Gain consensus on the tool's use irrespective of nationality.
For medical research, there are common selection criteria: efficiency, reliability and ergonomics. The tool must be consensually accepted by researchers and have face validity. It must be validated to indicate when psychiatric referral is required and should be accepted by both psychiatrists and General Practitioners (GPs) [6,7]. Under the auspices of the European General Practice Research Network (EGPRN), European GP researchers decided to find such a tool. Experts representing different cultures, languages and health systems sought consensus [6,8].
Seven tools were found using a systematic literature review. They needed to be validated against a psychiatric examination using the DSM's major depression criteria, usable in primary care research and conceptually understandable by GPs and psychiatrists [9]. Consequently, this method of selection excluded tools such as the PHQ, which are not validated against the DSM [10]. It was then necessary to select the most reliable, efficient and ergonomic tool.

BMC Research Notes, Open Access. *Correspondence: patrice.nabbe@univ-brest.fr. 1 EA 7479 SPURBO, Department of General Practice, Université de Bretagne Occidentale, Brest, France. Full list of author information is available at the end of the article.
Based on these criteria, the research question was: which diagnostic tool for depression would GP researchers select as the most efficient, reliable and ergonomic for use in clinical research?

Criteria to compare
The psychometric properties of the tools (sensitivity, specificity, positive and negative predictive values) were extracted [9]. Because the study populations were different, these values could not be compared statistically. Subsequently, a narrative review was undertaken to extract the reliability data (Cronbach's alpha, Cohen's kappa). The ergonomics were also important, but comparing this aspect of the tools was complex because of differences in the number of items, test duration, method of inquiry, score range, etc. A consensus based on a European expert panel, taking into account quantitative and qualitative criteria, was the only alternative to ensure comparison [11].
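The psychometric properties listed above, together with the Youden index used as the effectiveness criterion, can all be derived from a 2x2 table of tool result against the reference examination. The following is a minimal Python sketch under that assumption; the class name, function name and the counts in the example are hypothetical and are not taken from any of the reviewed studies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """2x2 table of tool result versus reference diagnosis (hypothetical container)."""
    tp: int  # tool positive, reference positive
    fp: int  # tool positive, reference negative
    fn: int  # tool negative, reference positive
    tn: int  # tool negative, reference negative


def diagnostic_metrics(c: ConfusionCounts) -> dict:
    """Return sensitivity, specificity, PPV, NPV and the Youden index."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    ppv = c.tp / (c.tp + c.fp)
    npv = c.tn / (c.tn + c.fn)
    # Youden index: J = sensitivity + specificity - 1
    youden = sensitivity + specificity - 1
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": ppv,
        "npv": npv,
        "youden": youden,
    }


# Illustrative counts only (not data from the article).
example = ConfusionCounts(tp=40, fp=10, fn=10, tn=140)
metrics = diagnostic_metrics(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that the predictive values depend on the prevalence of depression in each study population, which is one reason why such figures are difficult to compare across studies.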

Consensus procedure
The RAND/UCLA appropriateness method (RAM) is approved by major institutes, such as NICE (National Institute for Health and Clinical Excellence) in the United Kingdom or the HAS (Haute Autorité de Santé) in France. It was the most appropriate consensus method [12,13].
Developed in the mid-1980s, it is an instrument to enable the measurement of the overuse and underuse of medical and surgical procedures. It allows a consensual choice in the comparison of complex processes [11].
RAND/UCLA is a "two-round modified Delphi process" which includes a nominal group. The Delphi rounds avoid the influence of opinion leaders; the panel meeting creates the opportunity to discuss ratings and judgments face to face [14] (Fig. 1).
The RAM was initially based on the results of a narrative review; its quality level is increased when the results of a systematic review are used [11,14].
The RAM is one of several methods that were developed to identify the collective opinion of experts [11]. With the RAM, repeated assessment is used by all experts to rank relevance, objectivity and homogeneity [13]. The RAM produces appropriateness criteria and quality indicators with face, construct and predictive validity [15].

Experts' panel
The experts' panel was purposively selected from primary care, based on research expertise, academic expertise, level of English, gender, practice, native culture and language [16].

First step
The study started with a Delphi procedure to eliminate the less efficient and keep the more reliable tools. The comments took into account only validity data, not ergonomics.
Each expert received the study flow-chart, the study method, the efficiency, sample and reliability data, and a consent form. They had to rate the efficiency and reliability of each tool on a 9-point Likert scale [17]:
• Is this tool efficient for the diagnosis of depression in primary care?
• Is this tool reliable for the diagnosis of depression in primary care?
Consensus was defined as at least 70% of the experts rating questions at 7 or above [13]. A tool was considered appropriate if it scored higher than 70% on each question. Comments were collected in order to structure the experts' panel meeting.
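The consensus rule can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names and the ratings are hypothetical, and the sketch assumes the "at least 70%" reading of the threshold (the text also says "higher than 70%", which would differ only when exactly 70% of experts rate 7 or above).

```python
def has_consensus(ratings, cutoff=7, threshold_percent=70):
    """True if at least threshold_percent of experts rated at or above cutoff.

    ratings: list of integers on the 9-point scale (1 to 9), one per expert.
    Integer arithmetic avoids floating-point edge cases at exactly 70%.
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 9 for r in ratings):
        raise ValueError("ratings must lie on the 9-point scale (1 to 9)")
    agreeing = sum(1 for r in ratings if r >= cutoff)
    return agreeing * 100 >= threshold_percent * len(ratings)


def tool_is_appropriate(ratings_by_question):
    """A tool is kept only if consensus is reached on every question."""
    return all(has_consensus(r) for r in ratings_by_question.values())


# Hypothetical ratings from 11 experts (not data from the article).
tool_a = {
    "efficient": [9, 8, 8, 7, 7, 7, 7, 7, 6, 5, 4],  # 8 of 11 rate >= 7
    "reliable": [9, 9, 8, 8, 7, 7, 7, 7, 7, 6, 5],   # 9 of 11 rate >= 7
}
tool_b = {
    "efficient": [9, 8, 7, 7, 7, 7, 7, 6, 6, 5, 4],  # 7 of 11 rate >= 7
    "reliable": [9, 9, 8, 8, 7, 7, 7, 7, 7, 6, 5],
}

print(tool_is_appropriate(tool_a))
print(tool_is_appropriate(tool_b))
```

With 11 experts, the assumed rule requires at least 8 ratings of 7 or above on each question, so the second hypothetical tool is rejected on efficiency alone.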

Second step
The 2nd step (panel meeting) had to confirm the results of the 1st step and allow debate, without voting, resulting in a presentation of the selected tools. The following resources were provided to experts: methodology reminder, first-round results including all comments, ergonomic features, bibliography data and three 9-point Likert scale notation forms. The forms were completed at the beginning, after testing tools, and at the end of the experts' meeting.
The experts were invited to discuss the results of the first round and whether they agreed with them. If more than 70% of the experts agreed with the results, the first Delphi round was considered successful.
The experts were invited to rate the following statements:
• "This tool is easy to use in general practice".
• "This tool could easily be introduced during a consultation".
• "This tool could be understood by patients".
• "I like this tool".
• "Patients could be surprised by this tool".
Experts were invited to evaluate before and after testing the tools face-to-face in pairs. This was undertaken to assess whether testing tools had modified their judgment. Then the ergonomics were discussed. The meeting ended with final evaluations. The entire meeting was recorded in both video and audio format for ultimate quality control.
No final consensus was required at the end of the meeting [11].

Third step
The goal was to select one tool. At the end of the experts' meeting, all discussions were transcribed. Each expert received the transcript independently.
The final question was: "Which is the most appropriate tool for the diagnosis of depression in adult patients, in General Practice, in Europe, in terms of Efficiency, Reproducibility and Ergonomics?" The experts were asked to vote on each tool and to comment on their responses.

Results
Eleven experts from 8 European countries participated. They were all GPs, fluent in English. The panel was composed of 9 women and 2 men. Of the 11 experts, 9 practised in urban areas of more than 5000 inhabitants and [...] (Table 1).
The tools selected by the literature review were: the GDS-5, GDS-15 and GDS-30 (Geriatric Depression Scale with 5, 15 and 30 items), the HSCL-25 (Hopkins Symptoms Checklist with 25 items), the HADS (Hospital Anxiety and Depression Scale), the PSC-51 (Physical Symptom Checklist with 51 items), and the CES-DR (Center for Epidemiologic Studies Depression Scale-Revised).

First step results
The PSC-51, GDS-30 and CES-DR: eliminated for lack of efficiency.
The GDS-15 and GDS-5: eliminated for lack of reliability.
The HADS and the HSCL-25: considered efficient and reliable (Table 2).

Second step results
Eight experts participated and confirmed that HSCL-25 and HADS were the best-validated tools in terms of efficiency and reliability.
Before the ergonomics test, the experts had favoured the HADS. Their individual opinions were modified after testing the HSCL-25 face-to-face (Table 3). Consensus was not sought at the end of the meeting. All comments were collected and returned to the experts in the document they were sent for the 3rd phase.

Third step results
The 8 experts who participated in the whole procedure were asked to vote: "Which is the most appropriate tool to diagnose depression in adult patients in General Practice, in Europe, in terms of its efficiency, its reliability and its ease of use?"
• 6 answered, "In my opinion, the HSCL-25 is the most appropriate tool to diagnose depression in Primary Care practice."
• 2 answered, "In my opinion, the HADS is the most appropriate tool to diagnose depression in Primary Care practice."
The experts also gave final comments.

Discussion
The HSCL-25 appeared to be the most interesting tool for diagnosing depression in terms of the combination of its efficiency, reliability and ergonomics. It is a self-rating scale derived from the SCL-90, which is a multidimensional psychological test instrument for the assessment of psychological symptoms and distress [18][19][20]. It has robust efficiency and reliability scores [21][22][23]. This RAM study was based on a systematic literature review [9], which is of higher quality than the original RAM with its non-systematic literature review. The ergonomic factor was an important criterion in maintaining a relationship between patients and GPs. Through this process, the researchers demonstrated how decisive ergonomics were in choosing a tool suitable for future research [24]. The HSCL-25 has been widely used for evaluation among traumatised populations and has been used many times in primary care [25][26][27][28][29]. The HADS has been widely used over a long period for clinical and research purposes [30], has been translated into several languages [31] and has been validated for use in primary care. Nevertheless, the HADS seemed complicated for research purposes in daily practice [32][33][34].
In conclusion, the HSCL-25 best combined efficiency, reliability and ergonomics for the diagnosis of depression within European primary care practice from a research perspective. It will allow multi-centred collaborative research throughout Europe. The HSCL-25 could also allow transversal research between psychiatrists and GPs. The group will remain vigilant, as a self-administered questionnaire must be easily understood by the general population. Its translation into several European languages allows collaborative research. Application in practice must be demonstrated for each national translation.

Limitations
The quality of the panel was important for the overall quality level. The panel conformed to the requirements of variability in culture, language and practice. Four language families were represented: Germanic, Slavic, Hellenic and Romance. The panel size was sufficient (7-15 experts) [11]. The deadlines for the Delphi rounds were short. Each judgment was performed blind [42]. To reduce information bias, each expert received a record of all the bibliographic sources of the data provided.
The reliability data were mainly based on Cronbach's alpha values. Those values were extracted using an additional literature review [43].
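For readers less familiar with the reliability criterion, Cronbach's alpha for a questionnaire with k items is k/(k-1) multiplied by (1 minus the sum of the item variances divided by the variance of the total score). The following Python sketch computes it from raw item scores; the function name and the data are hypothetical, and the values used in this study were extracted from the literature rather than computed in this way.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a respondents-by-items table of scores.

    scores: list of rows, one per respondent, each row holding k item scores.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(scores[0])
    if k < 2 or len(scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    if any(len(row) != k for row in scores):
        raise ValueError("every respondent must answer the same k items")
    items = list(zip(*scores))  # transpose: one tuple of scores per item
    item_variance_sum = sum(variance(item) for item in items)
    total_variance = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


# Hypothetical answers from 4 respondents to a 2-item scale.
answers = [
    [1, 1],
    [2, 3],
    [3, 2],
    [4, 4],
]
alpha = cronbach_alpha(answers)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Alpha approaches 1 when the items vary together across respondents, which is why it is read as a measure of internal consistency.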
The tools found in the literature were not anonymised. The judgment of each expert could possibly take his/her prior knowledge into account. Nevertheless, the experts' opportunity for debate during meetings controlled this possible confusion bias.
A systematic literature review creates the possibility of original selection bias. From the outset, the gold standard was the psychiatric examination based on the DSM's major depression criteria. Tools with a high level of validity but which did not use this gold standard as their starting point, such as the PHQ [44], could not be selected. The objective of the SLR was to focus on the tools; the list was not exhaustive. It could be worthwhile to initiate a study using another gold standard, such as the Hamilton test [45], and compare results.

Authors' contributions
NP made substantial contributions to conception and design, acquisition of data, analysis and interpretation of data. He has been involved in drafting the manuscript and also agreed to be accountable for all aspects of the work by ensuring that questions related to the accuracy and integrity of any part of the work were appropriately investigated and resolved. LRJY made substantial contributions to conception and design, acquisition of data, analysis and interpretation of data. He has been involved in drafting the manuscript and revising it critically for important intellectual content. GLM made substantial contributions to conception and design and has been involved in revising it critically for important intellectual content. LD, SSS, HM, LH, CA, FSMMI, SA, SA, LC, CS and DC made substantial contributions to acquisition, analysis and interpretation of data and have been involved in revising it critically for important intellectual content. LFB made substantial contributions to conception and design and has been involved in drafting the manuscript. MT has been involved in revising it critically for important intellectual content and has given final approval for the version to be published. VMH and VRP made substantial contributions to conception and design, have been involved in revising it critically for important intellectual content and have given final approval for the version to be published.
All authors read and approved the final manuscript.