Valid comparisons and decisions based on clinical registers and population based cohort studies: assessing the accuracy, completeness and epidemiological relevance of a breast cancer query database

Background Data accuracy and completeness are crucial for ensuring both the correctness and epidemiological relevance of a given data set. In this study we evaluated a clinical register in the administrative district of Marburg-Biedenkopf, Germany, for these criteria. Methods The register contained data gathered from a comprehensive integrated breast-cancer network from three hospitals that treated all included incident cases of malignant breast cancer in two distinct time periods from 1996–97 (N=389) and 2003–04 (N=488). To assess the accuracy of this data, we compared distributions of risk, prognostic, and predictive factors with distributions from established secondary databases to detect any deviations from these “true” population parameters. To evaluate data completeness, we calculated epidemiological standard measures as well as incidence-mortality-ratios (IMRs). Results In total, 12% (13 of 109) of the variables exhibited inaccuracies: 9% (5 out of 56) in 1996–97 and 15% (8 out of 53) in 2003–04. In contrast to raw, unstandardized incidence rates, (in-) directly age-standardized incidence rates showed no systematic deviations. Our final completeness estimates were IMR=36% (1996–97) and IMR=43% (2003–04). Conclusion Overall, the register contained accurate, complete, and correct data. Regional differences accounted for detected inaccuracies. Demographic shifts occurred. Age-standardized measures indicate an acceptable degree of completeness. The IMR method of measuring completeness was inappropriate for incidence-based data registers. For the rising number of population-based health-care networks, further methodological advancements are necessary. Correct and epidemiologically relevant data are crucial for clinical and health-policy decision-making.


Background
Today 0 s information technologies make it possible to collect and process comprehensive population-based public-health data. Cohort studies and data register collect, check, and analyze medical data in regular time intervals. This procedure is intended to provide highquality data on which crucial health-policy decisions can be based. The goal of this so-called "public-disclosure strategy" is to increase health care system transparency of and for all stakeholders related to public health [1,2]. Increased transparency is important for various stakeholders (patients, physicians, hospital management, health insurers, clinical-trial coordinators, policymakers, etc.) in different settings (primary care, in-home care, biodatabases, etc.) and on various levels (micro, meso and macro) with regard to aspects of health care quality (structures, processes, outcomes) [3,4]. In all of the cases above, it is the quality of the data, as well as its verification and assessment that determine the quality of the subsequent information and decisions. Thus, decisions can never be better than the information retrieved from the data they are based on.

The common conception of data quality
But what does "data quality" really mean? The scientific community has established standards such as objectivity, reliability, and validity of measurement instruments to ensure reproducible and comparable research results [5][6][7]. There have been, for example, various exploratory data-analysis methodologies in common use, each offering an independent method for achieving these standards [8,9]. In an effort to standardize these various methods, cancer registers have published extensive manuals of these numerous methods [10] to improve the validity of the comparable epidemiological indices used for health-monitoring reports. However, these manuals do not make theoretical distinctions between levels of data quality and they do not offer an integrative framework for explorative methods. To overcome these shortcomings the "GUIDELINES FOR THE ADAPTIVE MANAGEMENT OF DATA QUALITY FOR COHORT STUDIES AND REGISTERS" (GAMOQ) [6,7,11] were developed.
The level-based conception of data quality These GAMOQ were developed in 2006 by the technology and methodology platform for networked medical research (TMF) to facilitate the independent assessment of data quality as well as its subsequent improvement. An extensive literature review and expert interviews were part of the development process [7]. The GAMOQ differentiated the term "data quality" into three distinct levelsdata plausibility, data organization, and data correctnesscorresponding to the already-existing assessment approaches for quality of medical care (structure, process, and outcome quality) [4,12]. Each of these data levels encompasses specific data-quality indicators (DQIs). In total there are 24 DQIs which assess different aspects of a data set or register. A general proof of feasibility, usefulness, and practicability of this methodological framework has already been reported elsewhere [13,14].

Data accuracy and completeness as objectives
Here we aim to present the most important results concerning data-level correctness of a population-based clinical breast-cancer registry from the longitudinal point of view. Of particular interest are the DQIs accuracy and completeness, both of which allow classifying a data set0s epidemiological relevance. In the following, accuracy refers to the degree to which the primary data differ from the population0s "true" parameters determined by using established secondary data sources [7].
Completeness refers to the degree to which the primary data have captured all relevant patients in accordance with the inclusion criteria [7]. The leading hypothesis assumes that the current population-based breast-cancer register is accurate and complete and that it contains epidemiologically relevant data.

Primary database
The primary data to be assessed was collected in three hospitals in the administrative district of Marburg-Biedenkopf, Germany. They included all females that were treated in these hospitals for the first time for malignant breast cancer (ICD-10: C.50). Recruiting was carried out within the context of two prospective cohort studies that took place in the periods from 1996-97 and 2003-04. The population surveys were conducted independently of one another, and thus the data was not processed according to a uniform standard. It was saved in different file formats (Excel, SPSS). In total, 1,389 (1996-97) and 150 variables (2003-04) related to demographic, socioeconomic, and medical issues were collected. The variable names and value codes of the two data sets0 variables were synchronized. The data sets were transferred to a clinical register called the Breast Cancer Query Database (BCQDB). The accuracy and completeness of the BCQDB was assessed. The study was approved and conducted according to the guidelines of the local ethics committee of the Philipps University Marburg (Germany).

Secondary databases
Additional secondary data sources with high data-quality standards were used to estimate the epidemiological accuracy and completeness of the BCQDB. The "National Field Study for the Assessment of the Quality of Breast Cancer Care" [15] provided distributions and parameter estimates for risk, prognostic, and predictive factors for breast cancer, as did the common cancer registers of the German federal states of Berlin, Brandenburg, Mecklenburg-Vorpommern, Saxony-Anhalt, Saxony, and Thuringia [16,17], as well as the Munich Tumor Register [18]. For comparison purposes, we also integrated raw and age standardized incidence rates from the epidemiological cancer register of Saarland, Germany, and the German Center for Cancer Registry Data located at the Robert Koch Institute (RKI). To estimate mortality-incidence ratios, we integrated breast-cancer-specific mortality rates kept by the regional statistic authorities of the German federal state of Hesse.

Measures of data accuracy
The central idea of the GAMOQ is to assess data quality, if necessary by source data verification techniques, by comparing the dataset for a cohort study or a data register (in this case, the BCQDB) with the original data files (in this case, the Excel and SPSS files). However, the latter can also be substituted by reliable external data if it is the data0s accuracy and completeness that are being assessed. Therefore, we measured the accuracy of the BCQDB register by comparing all available risk, prognostic, and predictive factors to distributions available from external databases included in this study (see above). These factors included patients0 age at breastcancer diagnosis, tumor location, tumor size (pT), node status (pN), stage of the disease according to the Union for International Cancer Control (UICC), grading, and hormone-receptor status [19][20][21].

Measures of data completeness
To critically appraise the BCQDB register0s completeness, established epidemiological measures were used [5,[22][23][24]. We calculated raw age-specific incidence rates and cumulative incidence curves in order to identify any possible systematic deviations from external databases. We also calculated direct age-standardized incidence rates for the standard populations of both Europe and the world in order to account for any possible age-related or demographic differences. In addition, we estimated indirect agestandardized incidence rates, taking the population of the German federal state of Saarland as the standard population. This allowed the direct comparison between Saarland and the administrative district of Marburg-Biedenkopf without using neutral standard populations (e.g., Europe, the world). This approach allowed us to account for the demographic particularities of the Marburg-Biedenkopf administrative district, which is the catchment area of the BCQDB register. Finally, incidence-mortality ratios (IMR) were calculated for each year to obtain further insights into the BCQDB0s completeness [16,25,26].

Statistics
Rates and ratios were calculated for the BCQDB and then compared to the corresponding figures for the various external databases. This statistical comparison was performed using an exact binomial test with the specified "true" parameters of the selected secondary databases. The probability of error of alpha was 0.05 [8,27]. The corresponding 95% confidence intervals were derived using Pearson-Clopper values which were based on F-distributions and which are appropriate for small samples [7] characteristically surveyed in rural areas (small-area analysis) [28]. Incidencebased parameters and distributions of risk, prognostic, and predictive factors were compared using overlapping 95% confidence intervals based on Gaussian normal distributions [23,24].

Results
The survey recruited a total of N=877 patients (1996-97: N=389, 2003-04: N=488) who were primarily treated in the German administrative district of Marburg-Biedenkopf. Of these incident patients N=577 (1996-97: N=266, 2003-04: N=311) or 67% were registered as residents of that district and belonged to the epidemiologically relevant sample. The comparison of epidemiological parameters and distributions with external databases was only valid for residents. The BCQDB0s accuracy for the 1996-97 cohort is given in Table 1.
Comparison with the external data sources revealed some isolated differences. These differences were randomly distributed over the included parameters and did not show any systematic patterns. Additional differences were also detected for the 2003-04 cohort (see Table 2). Table 2 indicates that the BCQDB register included more women under the age of 49 years as well as fewer patients in the 50-69 year range than the external data sources. Our stratified analyses of women0s age at breast-cancer diagnosis depending on the stage of the disease suggest that older women were underrepresented in the BCQDB data. We checked this assumption using the raw, age-specific, and cumulative incidence curves. Figures 1, 2, 3, 4 show these results.
In the period from 1996-97, the age-specific incidence rates ( Figure 1) already began to show deviations at age 50 years. These tended to scatter arbitrarily around the benchmark parameters. In the period from 2003-04, similar deviations occurred from the age of 65 years ( Figure 2). Both BCQDB cohorts exhibited lower incidence rates than in the epidemiological cancer register of Saarland that begins approximately from the age of 75 years. The cumulative incidence rates (CIR), which showed greater deviations for the 1996-97 cohort than for the 2003-04 cohort (Figures 3, 4) conveyed a similar impression. The "kinks" at age 55 years in the 1996-97 cohort and at age 65 years in the 2003-04 cohort were especially revealing. The cumulative incidence rates were CIR (BCQBD, 1996-97) = 10.0 and CIR (BCQBD, 2003-04) = 11.6 in comparison to CIR (SAAR, 1996-97) = 11.9 and CIR (SAAR, 2003-04)= 12.5. However these differences were not apparent in the directly agestandardized incidence rates (SIR*, SIR**) or the indirectly standardized incidence ratios (SIR***). Table 3 shows these results.
As expected, we found no systematic differences in data completeness when using incidence rates that were age-standardized. In contrast, our estimations using incidence-mortality ratios (IMR) estimated completeness to be low: IMR (1996-97) = 36.8 and IMR (2003-04) = 43.8. Detailed results on these incidence-mortality ratios are given in Table 4.

Discussion
A wide and solid base of evidence is needed for rational health policy decisions. New information technologies can help by collecting all available data and making information relevant to health-policy decisions accessible. Technology may help increase the level of transparency in these decisions. The breadth of evidence considered is especially relevant for age-related diseases such as cancer because incidence rates continue to rise in aging societies [29][30][31]. This growing trend is also apparent in the regional sample of breast-cancer patients from the clinical register (BCQDB) evaluated in this study. However, this rather crude trend is not sufficient to justify calling the BCQDB an epidemiologically relevant database. Therefore, its accuracy and completeness was assessed.

Data accuracy
Comparison with external data sources of proven data quality revealed that the BCQDB did not deviate substantially from these proven sources in terms of accuracy. In the period from 1996-97, 5 of 56 indicators The increase in deviations between the two time periods may be due to a selection bias. Therefore, the verification of the register0s completeness was crucial.

Data completeness
The raw age-specific incidence curves we calculated showed some fluctuations typical for surveys in rural areas with small numbers of patients [28]. This is especially true for the higher age groups, which include fewer patients as in this study. A more detailed inspection of the cumulative-incidence curves (lifetime-prevalence) also showed possible under-representations for patients older than 75 years (for 1996-97) and 65 years (for 2003-04). This could be interpreted as a sign of selection bias. However, some other details also make this interpretation doubtful. First, the cumulative-incidence curve for the 1996-97 cohort had a "kink" at age 55 years and then ran almost straight and parallel to the comparator up to the higher age groups. This same "kink" was also seen in the 2003-04 cohort, although there it occurred at age 65 years instead of 55 years. This phenomenon points to demographic shifts which are particularly associated with age structure. Therefore, it was necessary to standardize the raw incidence rates to control for age effects.

Unstandardized and standardized measures
The direct age-standardized incidence rates offset the BCQDB0s deviations from the external data sources that were observed in the raw non-standardized indicators. Particularly for the period from 2003-04, the agestandardized incidence rates exhibited a strong convergence to the reference parameters. This result was confirmed by indirect age-standardization methods, which related raw age-specific incidence rates from the BCQDB to the corresponding demographic population parameters for Saarland. Therefore, it seems reasonable to conclude that the regions0 differing age distributions contributed substantially to the deviations observed in the non-standardized rates. For this reason, estimates of incidence-mortality ratios based on raw age-specific incidence rates and official mortality figures cannot serve as reliable indicators for the assessment of the BCQDB register0s completeness.

Epidemiological relevance
Based on the assessments of all calculated parameters mentioned above, we conclude that the BCQDB register contains sufficiently accurate, complete, and therefore correct data. It seems safe to assume that the BCQDB is of sufficient epidemiological relevance to allow for further comparisons (e.g. adherence to clinical guidelines, analyses of survival, quality of life, and costeffectiveness). It can be used for further comparisons and decisions.

Regional risk adjustments
These conclusions do not contradict the fact that the variables of the data-quality indicator accuracy became more divergent over time (increasing to 8 divergent indicators out of 58 for the 2003-04 cohort). The systematic deviations of certain surrogate parameters (e.g. fewer nodal negatives, more small tumors under 2cm) indicate that these increasing divergences may have been caused by the health-care services themselves becoming more effective in the intervening time interval. This interpretation is supported by the increasing use of high quality assured early-detection programs reported elsewhere [32]. To actually measure the regional effects caused by these early-detection programs, high utilization rates of these programs among the target populations are crucial. However, since the civil-register-based invitation systems, and thus their resulting utilization rates, vary  greatly between the different German federal states [33], it is methodologically impossible to make a fair and valid comparison of the effectiveness of different regions0 health-care systems and corresponding registers. It is, however, possible to compare different regions0 risk profiles, but any further conclusions regarding a region0s effectiveness would be biased [34]. Overcoming this limitation would require additional datacollected at the individual level and of sufficient quality. This would allow valid comparisons, conclusions, and decisions to be made.

Harmonization of nomenclature and statistical definitions
Most countries, including Germany, have already taken the first step toward comprehensive (cancer) registers using and offering more epidemiological data by widely implementing information technology in the health-care field [35][36][37]. The next step is to assess and increase the quality of data in a way that is guided by a theoretical framework such as the GAMOQ. This approach would help minimize doubts concerning the accuracy and completeness of databases. The regional differences displayed by the various external data sources used for comparison in this study highlight how much room there is for interpretation. In particular, harmonizing nomenclatures, statistical distinctions, and reporting certain key parameters of registers or study characteristics could help initiate this process [38], because problems stemming from a data set0s definitional subtleties are difficult to detect. To this end, a number of crucial indicators for survival analysis reports (e.g. year of incidence, years of follow-up etc.) have been proposed in addition to the  known pitfalls of cross-country survival analyses [34]. Overall, all these efforts attempt to facilitate valid comparisons and rational decisions.

Methodological strengths
The strength of this study was the approach that allowed us to integrate the many different available data sources and use them to estimate the epidemiological relevance of the BCQDB. Doing so is especially advantageous for small-scale population-based databases or for clinical registers of rural areas. The reason we chose breast cancer as the exemplary disease with which to demonstrate this procedure was because there are many publicly available sources of breast-cancer data. However, the procedure used in this study which is only one part of the GAMOQs is also applicable to any other disease. It goes beyond the usual comparison of epidemiological measures and gives a deeper understanding of the relationship between a data set0s accuracy and completeness. At this point it is worth emphasizing the analytic value of non-standardized measures, which in this case allowed regional differences and other irregularities to  be identified [22] before beginning the statistical analysis. Such qualitative insights are of particular interest before data has been analyzed and interpreted as they can inform the analysis. In the case of the BCQDB, the assessments of accuracy and completeness show that future analyses should carefully conclude and interpret data results particularly if patients are older than 65 or 75 years (depending on the cohort in question).

Methodological shortcomings
A general weakness of clinical registers working with complete surveys that are administered only to utilizers of regional breast cancer networks is the fact that selection bias can occur [9]. Therefore, completeness-estimation techniques such as the incidence-mortality ratios (IMR) were developed to quantify to what extent the clinical register did not capture missing cases. The IMR approach belongs to the historical methods and a common threshold of 90% or more has been established to consider a register as complete [16,26]. However, in the context of a small-area analysis using time-interval-bounded incidence data as in this study, it is not possible to use incidence mortality ratios (IMR) in the usual way to estimate the size of target population not included in the data. This is because the already ill prevalent patients who died in the observed time intervals (1996-97, 2003-04) were excluded from the BCQDB-register by definition, and were thus not captured in the mortality rates. These cases are however counted in the official mortality statistics taken from the official statistical authorities, and thus figure in to the "expected cases" factor in the denominator of the IMRs. This leads to overestimated IMRs and thus to a higher percentage of patients dying in a given time interval. Therefore, our data-incompleteness estimateswhich estimated the number of cases missing from the data at 64% and 57%were unrealistically high. Finally, the IMR approach also entails all of the usual drawbacks associated with disease-specific cause-of-death statistics and postmortem examinations [39,40].

Future potential of benchmarks
Further methodological developments for estimating the correctness (e.g. accuracy, completeness) of data sets and cohort studies will be necessary in the future, both in order to allow researchers to gain access to the information that remains hidden in the growing amount of data and to allow this data to be exploited for epidemiologically and healtheconomically valid comparisons, conclusions, and decisions. There is demand for such new methodologies due to infrastructural changes (such as integrated health-care networks, organ centers, and clinic chains) which are shifting the focus to multidisciplinary oriented health care for chronic diseases. Furthermore, these new organizational developments are population-based and cover countries, states, districts, and counties. This means that demographic distributions are available from statistical authorities, which facilitates the estimation of incidence rates. Therefore, the completeness estimation for a clinical register in question is restricted to patients with permanent residence of the corresponding area and is not valid for non-resident patients with the same disease. Under these circumstances, healthcare providers can be compared to external benchmarks. This approach is superior to anonymous benchmarks based on averages, because it allows much more than the usual comparison of a few abstract quality indicators [41,42]. Specifically, it allows the best performers on a given quality indicator to be used as the benchmark, enabling conclusions to be drawn regarding how the benchmark organization achieved its results as well as how similar results could be achieved at the lower-performing organization. However to attain such external quality assurance mechanisms, it is essential to identify comparable organizations or institutions which can be used as peer groups.

Conclusion
In this study, we proposed accuracy statistics that rely on patients0 basic characteristics and common risk, prognosis, and predictive factors. However, the list of indicators could be more extensive to 1) describe the general conditions of health care institutions and their patients which were exposed to new performanceenhancing quality tools (such as quality-management systems) and 2) to compare and decide whether new structural or process-related changes lead to expected improvements. This approach helps to reproduce the results gained in some (specialized) health care institutions. It fosters the understanding why some innovations have worked and others have not and how they are applicable to non-specialized institutions [43][44][45]. Therefore, only the comparison between "equal" institutions will allow for valid comparisons, conclusions, and sound decisions that will lead to further improvements in the quality and transparency of public health care.