

Evaluation of clinical practice guidelines using the AGREE instrument: comparison between data obtained from AGREE I and AGREE II



The Appraisal of Guidelines for Research and Evaluation (AGREE) is a representative, quantitative evaluation tool for evidence-based clinical practice guidelines (CPGs). Recently, AGREE was revised (AGREE II). The continuity of evaluation data obtained from the original version (AGREE I) has not yet been demonstrated. The present study investigated the relationship between data obtained from AGREE I and AGREE II to evaluate the continuity between the two measurement tools.


An evaluation team consisting of three trained librarians evaluated 68 CPGs issued in 2011–2012 in Japan using AGREE I and AGREE II. The correlation coefficients for the six domains were: (1) scope and purpose 0.758; (2) stakeholder involvement 0.708; (3) rigor of development 0.982; (4) clarity of presentation 0.702; (5) applicability 0.919; and (6) editorial independence 0.971. The item “Overall Guideline Assessment” was newly introduced in AGREE II. This global item had a correlation coefficient of 0.628 using the six AGREE I domains, and 0.685 using the 23 items. Our results suggest that data obtained from AGREE I can be transferred to AGREE II, and the “Overall Guideline Assessment” data can be determined with high reliability using a standardized score of the 23 items.


Clinical practice guidelines (CPGs) are “statements that include recommendations intended to optimize patient care that are informed by a systematic review of evidence and an assessment of the benefits and harms of alternative care options” [1]. CPGs are a representative tool for standardizing medical interventions and improving healthcare quality. In Japan, CPG development using evidence-based medicine (EBM) began in the late 1990s with government support. Currently, 30–40 CPGs are developed per year, mainly by academic societies.

With the spread of CPGs in Japan, infrastructure to promote their use is also being developed. This includes clearing houses and standard manuals for developing CPGs. The Toho University Medical Media Center and the Medical Information Network Distribution Service Guideline Center of the Japan Council for Quality Health Care both operate CPG clearing houses [2, 3].

The Appraisal of Guidelines for Research and Evaluation (AGREE) instrument, developed by the AGREE Enterprise, is a quantitative method for evaluating CPGs. The AGREE instrument specifies items that CPGs must satisfy, and is expected to facilitate cost-effective CPG development and improve CPG quality [4]. In 2010, the original version (AGREE I) was revised and published as AGREE II [5,6,7]. Several studies have evaluated CPGs using AGREE I or AGREE II [8,9,10]. However, the continuity of the data obtained from AGREE I and AGREE II has not yet been demonstrated. AGREE I was widely used and there is a large amount of associated data; investigating the continuity and conversion of data between AGREE I and AGREE II is necessary to make full use of AGREE I data.

We investigated the continuity of AGREE I and AGREE II data, and the conversion method from AGREE I data to AGREE II data.

Main text


A team of three experienced librarians evaluated 68 EBM-based CPGs issued in Japan in 2011–2012 using AGREE I [11] and AGREE II [12]. Expert librarians checked the content of each CPG and judged whether it had been prepared using EBM methodology. The librarians who evaluated the CPGs had knowledge of CPG preparation and experience using the AGREE tool. They conducted their evaluations independently and did not adjust their results afterward; the results were aggregated into standardized scores. Correlation coefficients were calculated for the domains and items of the two instruments.

AGREE I comprised one overall assessment item and six domains totaling 23 items: (1) scope and purpose; (2) stakeholder involvement; (3) rigor of development; (4) clarity of presentation; (5) applicability; and (6) editorial independence. Each item is rated on a 4-point Likert scale (1 = “Strongly Disagree” to 4 = “Strongly Agree”). A standardized score for each domain was calculated according to the formula shown below:

$$\frac{\text{obtained score} - \text{minimum possible score}}{\text{maximum possible score} - \text{minimum possible score}} \times 100\% .$$

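For example, the scope and purpose domain consists of three items; with three appraisers, the sum of the maximum possible score is 4 (strongly agree) × 3 (items) × 3 (appraisers) = 36, and the sum of the minimum possible score is 1 (strongly disagree) × 3 (items) × 3 (appraisers) = 9 [11].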
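The calculation can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function name `standardized_score` and the sample ratings are hypothetical, and the sketch assumes that every appraiser rates every item in the domain.

```python
def standardized_score(ratings, scale_min=1, scale_max=4):
    """Return the standardized domain score as a percentage.

    ratings: one list per appraiser, each holding that appraiser's
    rating for every item in the domain (hypothetical input layout).
    scale_min / scale_max: ends of the Likert scale (1-4 for AGREE I,
    1-7 for AGREE II).
    """
    n_appraisers = len(ratings)
    n_items = len(ratings[0])
    if any(len(r) != n_items for r in ratings):
        raise ValueError("every appraiser must rate every item")

    obtained = sum(sum(r) for r in ratings)
    minimum = scale_min * n_items * n_appraisers
    maximum = scale_max * n_items * n_appraisers
    return (obtained - minimum) / (maximum - minimum) * 100.0


# Hypothetical ratings: three appraisers, three items, 4-point scale.
scope_and_purpose = [[3, 4, 2], [4, 3, 3], [2, 3, 3]]
score = standardized_score(scope_and_purpose)
print(round(score, 1))
```

Passing `scale_max=7` gives the corresponding score on the AGREE II scale, so both versions are expressed on the same 0–100% range.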

AGREE II is based on AGREE I, incorporating four distinct changes. First, the rating scale was changed from a 4-point to a 7-point Likert scale (1 = “Strongly Disagree” to 7 = “Strongly Agree”). Second, a second overall guideline assessment item was added: “Rate the overall quality of this guideline”. Third, the wording or expression of several items was changed, although the meaning of the items was preserved. Finally, Q7 (AGREE I), “The guideline has been piloted among end users”, was removed and incorporated into Q19 (AGREE II), “The guideline describes facilitators and barriers to its application”, and a new item, Q9 (AGREE II), “The strengths and limitations of the body of evidence are clearly described”, was added. Because neither item has a direct counterpart in the other version, Q7 (AGREE I) and Q9 (AGREE II) were excluded from analysis in the present study. A comparison of AGREE I and AGREE II items is shown in Table 1.

Table 1 Comparison between the AGREE I and AGREE II

As there was no item in AGREE I that corresponded with the new overall guideline assessment item in AGREE II, we attempted to calculate this value using two approaches. First, we calculated the average of the standardized score using results of the six AGREE I domains. Second, we calculated the standardized score using the results of the 23 AGREE I items. We examined the correlation between “Overall Guideline Assessment” in the AGREE II and the results of the two approaches described above.
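The difference between the two approaches can be illustrated with a short Python sketch. The item counts per domain follow the AGREE I structure (3, 4, 7, 4, 3, and 2 items); the ratings, variable names, and the simplification that all appraisers give the same rating within a domain are hypothetical and serve only to keep the example short.

```python
N_APPRAISERS = 3
SCALE_MIN, SCALE_MAX = 1, 4  # AGREE I 4-point scale

ITEMS_PER_DOMAIN = {
    "scope and purpose": 3,
    "stakeholder involvement": 4,
    "rigor of development": 7,
    "clarity of presentation": 4,
    "applicability": 3,
    "editorial independence": 2,
}

# Hypothetical data: every appraiser gives the same rating to every
# item within a domain (an assumption made only for brevity).
hypothetical_rating = {
    "scope and purpose": 4,
    "stakeholder involvement": 2,
    "rigor of development": 3,
    "clarity of presentation": 4,
    "applicability": 1,
    "editorial independence": 2,
}


def standardized(obtained, n_ratings):
    """Standardized score (%) for a pool of n_ratings individual ratings."""
    minimum = SCALE_MIN * n_ratings
    maximum = SCALE_MAX * n_ratings
    return (obtained - minimum) / (maximum - minimum) * 100.0


domain_counts = {d: n * N_APPRAISERS for d, n in ITEMS_PER_DOMAIN.items()}
domain_totals = {d: hypothetical_rating[d] * domain_counts[d] for d in ITEMS_PER_DOMAIN}

# Approach 1: average the six domain-level standardized scores.
domain_scores = [standardized(domain_totals[d], domain_counts[d]) for d in ITEMS_PER_DOMAIN]
by_domains = sum(domain_scores) / len(domain_scores)

# Approach 2: one standardized score over all 23 items pooled together.
by_items = standardized(sum(domain_totals.values()), sum(domain_counts.values()))

print(round(by_domains, 2), round(by_items, 2))
```

In this sketch the two values differ because approach 1 weights each domain equally, whereas approach 2 weights each domain by its number of items, so the seven-item rigor of development domain contributes more.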

We used t-tests to compare standardized scores, and calculated correlation coefficients for each AGREE I and AGREE II item and domain. p values < 0.05 were considered statistically significant. All analyses were performed using SPSS, version 20.0 (IBM SPSS Statistics for Windows, Version 20.0. Armonk, NY: IBM Corp.).
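For readers without SPSS, the correlation step can be approximated as follows. This is a sketch under stated assumptions: the text does not say which correlation coefficient was used, so Pearson's r is assumed here, and the function name `pearson_r` and the paired scores are hypothetical rather than study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical standardized scores (%) for one domain, five guidelines,
# each guideline scored once with AGREE I and once with AGREE II.
agree1_scores = [55.6, 70.4, 33.3, 88.9, 61.1]
agree2_scores = [50.0, 72.2, 38.9, 83.3, 66.7]

r = pearson_r(agree1_scores, agree2_scores)
print(round(r, 3))
```

The same function can be applied per domain or per item by pairing each guideline's AGREE I score with its AGREE II score.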


The results of the AGREE I and AGREE II evaluations are shown in Fig. 1. Correlation coefficients are shown in Table 2. High correlations were observed in all domains: scope and purpose = 0.758; stakeholder involvement = 0.708; rigor of development = 0.982; clarity of presentation = 0.702; applicability = 0.919; and editorial independence = 0.971. The correlation coefficients of the domains thus ranged from 0.702 to 0.982.

Fig. 1

Evaluation of clinical practice guidelines, published between 2011 and 2012, using the AGREE I and AGREE II (n = 68). t test; *p < 0.05, **p < 0.01

Table 2 Correlation between the AGREE I and AGREE II domains

Correlation coefficients for the 22 items ranged from 0.694 to 0.995; 16 items had a correlation coefficient of 0.9 or more, three items were 0.8–0.9, and three items were 0.6–0.8. A high overall correlation was observed for all items (Additional file 1: Table S1).

The newly introduced overall assessment item “Overall Guideline Assessment” (AGREE II) has no AGREE I counterpart and therefore had to be estimated from AGREE I data. The average standardized score of the six AGREE I domains had a correlation coefficient of 0.628 with this item, whereas the standardized score based on the 23 items had a correlation coefficient of 0.685, suggesting that the latter gives the closer estimate (Table 2).


Since its publication in 2003, the AGREE instrument has been widely used and has produced a large amount of evaluation data. Following the revision of the instrument, the relationship between data obtained from AGREE I and AGREE II, and the conversion of AGREE I data to AGREE II, are high priorities for research on time trends in CPG quality.

For the 68 CPGs issued in 2011–2012, our results demonstrated that AGREE I and AGREE II were highly correlated at both the domain and item levels, and the newly introduced overall rating item “Overall Guideline Assessment” could be calculated more precisely using the 23 AGREE I items, rather than domain-level data.

Increasing attention is being directed to safety and quality issues, and CPGs based on EBM are a representative method for standardizing and improving the quality and safety of healthcare procedures. The AGREE instrument is widely used to measure CPG quality. Our results suggest that the AGREE instrument, although now revised (AGREE II), remains a highly consistent measurement tool and enables long-term, comprehensive CPG evaluation. The Japanese government has promoted CPG preparation since 1996. Our study may help evaluate the underlying policy guidelines.


Data obtained from AGREE I can be transferred to the AGREE II, and the data for “Overall Guideline Assessment” can be calculated with high reliability using a standardized score of the 23 items.


Our evaluation team did not include any researchers or clinicians. However, the expert librarians had extensive knowledge about CPG preparation and had experience evaluating CPGs using the AGREE measure.



Abbreviations

AGREE: Appraisal of Guidelines for Research and Evaluation

CPGs: clinical practice guidelines

EBM: evidence-based medicine


  1. Institute of Medicine (IOM). Clinical practice guidelines we can trust. Accessed 15 July 2017.

  2. Toho University Medical Media Center. Toho University and Japan medical abstracts society clinical practice guideline information database. Accessed 15 July 2017.

  3. Japan Council for Quality Health Care. Medical information network distribution service (Minds). Accessed 15 July 2017.

  4. The AGREE Collaboration. Development and validation of an international appraisal instrument for assessing the quality of clinical practice guidelines: the AGREE project. Qual Saf Health Care. 2003;12:18–23.

  5. Brouwers MC, Kho ME, Browman GP, et al. Development of the AGREE II, part 1: performance, usefulness and areas for improvement. CMAJ. 2010;182(10):1045–52.

  6. Brouwers MC, Kho ME, Browman GP, et al. Development of the AGREE II, part 2: assessment of validity of items and tools to support application. CMAJ. 2010;182(10):E472–8.

  7. Brouwers MC, Kho ME, Browman GP. AGREE II: advancing guideline development, reporting, and evaluation in health care. Prev Med. 2010;51(5):421–4.

  8. Burgers JS, Fervers B, Haugh M, et al. International assessment of the quality of clinical practice guidelines in oncology using the appraisal of guidelines and research and evaluation instrument. J Clin Oncol. 2004;22(10):2000–7.

  9. Henig O, Yahav D, Leibovici L, et al. Guidelines for the treatment of pneumonia and urinary tract infections: evaluation of methodological quality using the appraisal of guidelines, research and evaluation II instrument. Clin Microbiol Infect. 2013;19(12):1106–14.

  10. Smith CAM, Toupin-April K, Jutai JW, et al. A systematic critical appraisal of clinical practice guidelines in juvenile idiopathic arthritis using the appraisal of guidelines for research and evaluation II (AGREE II) instrument. PLoS ONE. 2015;10(9):e0137180.

  11. Toho University Medical Media Center. AGREE I Japanese version. Accessed 15 July 2017.

  12. Japan Council for Quality Health Care. AGREE II Japanese version. Accessed 15 July 2017.


Authors’ contributions

KS participated in the design of this study, performed the data collection and analysis and drafted the manuscript. KM, SF, TK and SH participated in the design of this study and performed the data analysis. TH conceived the study, participated in its design, and helped to draft the manuscript. All authors read and approved the final manuscript.


The authors thank the evaluation team for their sincere efforts within the context of this study. The authors would like to thank Enago for the English language review.

Competing interests

The authors declare that they have no competing interests.

Availability of data and materials

Please contact the corresponding author for data requests.

Consent for publication

Not applicable.

Ethics approval and consent to participate

Our research data did not involve human subjects, human material, or human data.


Health and Labor Sciences Research Grants in Japan.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Author information

Correspondence to Tomonori Hasegawa.

Additional file


Additional file 1: Table S1. Correlation between the AGREE I and AGREE II items. *p < 0.05, **p < 0.01.

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated.



  • Clinical practice guidelines
  • AGREE (Appraisal of Guidelines for Research and Evaluation) instrument
  • Data transfer
  • Data mapping