Compensation Research Database: population-based injury data for surveillance, linkage and mining

Background: Compensation health research aims to study the influence of compensation systems, processes and practices on health and health-related outcomes. In many jurisdictions, injury compensation authorities collect substantial volumes of case and service level data for the purpose of administering the compensation system. An important secondary use of such data is research and analysis to explore interactions between individuals and organisations in compensation systems, and between compensation and other systems including healthcare and legal systems, in order to understand the role of compensation processes in injury recovery.
Results: The Compensation Research Database (CRD), established at the Institute for Safety Compensation and Recovery Research at Monash University, holds over 20 years of population-based data for transport and workplace injury in the state of Victoria, Australia. The CRD is unique in that it is held independently, at arm’s length from the compensation authorities that collect the data, and its primary purpose is to support research and analyses to develop new insights into system and individual level outcomes. This paper describes the core elements of the database including the design, process and type of information collected. We review some of the research findings that have been published using the CRD, and describe the ongoing program of research utilising the database.
Conclusions: The CRD is a unique administrative database that supports research into compensation health, with the objective of improving understanding of the interaction between injury compensation systems and injury recovery. The availability of the CRD for independent research is leading to substantial advancements in the compensation health research field and in related areas.


Background
In the mid 1980s the state of Victoria, Australia established two population-based, no-fault injury compensation systems. These systems provide payments for healthcare, income replacement and lifetime care costs for Victorians injured in transport accidents via the Transport Accident Commission (TAC) or at work via Worksafe Victoria (WSV). As of June 2015, Victoria had a population of approximately 5.9 million residents, and the two compensation systems accept approximately 50,000 new claims for compensation each year. In 2013-14, the TAC and WSV paid out over $1.1 billion and $2 billion, respectively, in benefits and compensation to injured Victorians [1,2].
In 2014, there were 249 deaths from transport accidents and 20 deaths from work accidents in the state, along with many thousands of serious injuries, and the burden of disease from these injuries is substantial [3][4][5][6]. Compensation systems such as those operated by the TAC and WSV play an important role in the Victorian community. As the state's regulators and insurers of transport and workers' compensation systems, they are responsible for compensation, funding of effective rehabilitation and accident prevention [7,8]. They engage directly with the injured person, their healthcare providers and the injured person's employer. Furthermore, the systems in place in Victoria are similar to compensation systems in many other countries. The Victorian systems of injury compensation bear close resemblance to other systems in Australia, New Zealand, Canada, the United States, and Hong Kong and share some objectives (e.g., return to work after work injury) and design elements with social support systems in Europe and some South American nations [9][10][11][12].
The policies and practices of compensation systems can have a substantial impact on the health and health-related outcomes of those injured. There is growing evidence that those who receive compensation for injury or disease have poorer health and vocational outcomes and slower recovery than those with matched injuries who do not receive compensation [13,14]. Furthermore, the magnitude of the disability is substantial, with a number of recent meta-analyses reporting moderate effect sizes for poorer outcomes among those receiving compensation for their injuries than among those with matched non-compensable injuries [15][16][17][18]. Conversely, a study conducted by McAllister et al. [19] in New Zealand showed that in a universal no-fault injury compensation system, those with injury who received compensation had better economic and return to work outcomes than a comparison group with non-compensable impairments due to disease. Given the mixed evidence, there is a critical need for greater understanding of how the compensation system itself and individual aspects of the compensation system affect recovery following compensable injury.
One approach to attain greater understanding of the burden and outcomes of compensable injury and disease is to use compensation system data. All compensation authorities collect data for the purposes of administering, monitoring and evaluating the compensation system. In some jurisdictions, compensation data can provide detailed information on the frequency and costs of healthcare and other medical and allied health services, income replacement, and interactions with legal and other administrative systems. These data can be used to assess the impact of policies and practice changes on compensation system relevant outcomes such as cost and length of disability. These data can also be linked to healthcare datasets which enable examination of predictors of compensation system relevant outcomes. Analysis of such data will lead to a greater understanding of the role of compensation processes in injury recovery and potentially lead to improved practices and policy changes within a single jurisdiction's healthcare and employment systems.
In 2009 and 2012, workshops led by the National Institute for Occupational Safety and Health in the United States advocated the use of workers' compensation data to track incidence and costs, to identify priorities and gaps in workplace hazards, and to evaluate injury and illness prevention program effectiveness [20,21]. In addition, leading organisations in the compensation health sector such as the Institute of Work and Health [22], the Partnership for Work Health and Safety in British Columbia Canada [23] and the Department of Environmental and Occupational Health Sciences at Washington University USA [24] are conducting high quality research by using workers' compensation data to address current and emerging issues of work-related health. Overall, compensation data is a unique resource that can enable examination of population-based personal injury claims and payment records, arising from transport, workplace accidents or other compensable conditions. The aim of this paper is to describe the Compensation Research Database (CRD) and summarise the design and processes including the type of information collected, review key research findings that have been published using the CRD, and discuss research opportunities that could be examined with the CRD.

Compensation systems in Victoria, Australia
The state of Victoria in Australia provides no-fault compensation for both transport accidents and workplace injuries and illnesses. Those injured in land-based transport accidents involving a car, motorcycle, tram, bus or train are eligible to claim compensation for treatment, income replacement, rehabilitation and long-term support services via the TAC, regardless of fault. Individuals with mental health conditions that arise subsequent to the transport accident injury are also eligible to claim compensation for mental health services. In addition, the TAC provides compensation for injury and death occurring interstate for individuals travelling in a Victorian-registered motor vehicle in other Australian states. Injuries and deaths occurring on the road but not involving a motorised vehicle (e.g. a collision between a pedal cyclist and a pedestrian) are not eligible for compensation. Compensation benefits cover the reasonable costs of treatment for transport-related injuries. A medical excess is applicable ($623 for accidents between 1st July 2014 and 30th June 2015, indexed annually according to average weekly earnings). There are maximum fees for most services. The TAC provides funding for the following healthcare services: ambulance services (e.g. for transport from the injury location to hospital and, where required, from one hospital to another), hospital services (e.g. treatment at a public, private or rehabilitation hospital), medical services (e.g. visits to a family doctor or specialist doctor), pharmacy items (e.g. medicine prescribed by a doctor and provided by a pharmacist), therapy services (e.g. physiotherapy, chiropractic, podiatry, optometry, osteopathy, and psychology) and nursing services (e.g. home visits after discharge from hospital).
In addition, the TAC provides funding for income replacement and the long-term care needs of severely injured clients, including equipment for activities of daily living, modifications to housing and attendant care. Income replacement benefits are paid in the first 18 months after the transport accident. The amount of income replacement is calculated as the weekly average of the gross earnings during the 12 months immediately before the accident date. After 18 months, if the injured person is still unable to return to work and has a severe injury, loss of earning capacity (LOEC) benefits are payable up to 3 years following the transport accident [7].
In Victoria, WSV provides compensation insurance for the majority of employers (representing approximately 85 % of the Victorian working population). WSV does not provide insurance for the proportion of the Victorian working population who are sole traders, employed at self-insuring agencies or federal government employees (approximately 15 % of the working population). All claims that exceed the financial threshold for health care expenses ($660 from 1st July 2014 and indexed annually) or that require more than 10 days off work must, under state law, be registered with WSV (a requirement in effect since 1997). These cases are then managed by case managers at one of five private sector insurers contracted to WSV. To receive workers' compensation benefits, workers must have their illness or injury certified by an approved medical practitioner. Workers are also eligible to lodge a claim for a mental health condition if there is a demonstrable link between work and the mental health condition. WSV provides the following compensation and services to injured workers: income replacement, medical and allied health treatments, ambulance transport, hospital treatment, personal and household help, impairment lump sums, and common law damages (where certain criteria are met). Income replacement is paid for up to 130 weeks (95 % of the pre-injury average weekly earnings (PIAWE) for the first 13 weeks and 80 % of PIAWE from week 14 to week 130), after which point benefits cease unless the worker is considered to have a very severe and ongoing injury, usually determined via a medical assessment process [8].
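The WSV income replacement step rates described above can be written out as a short worked calculation. The following is a minimal sketch only: the function names are hypothetical, and it assumes the two rates quoted in the text apply to whole weeks, ignoring any statutory caps, indexation, and the exception for very severe and ongoing injuries.

```python
def weekly_income_replacement(piawe: float, week: int) -> float:
    """Weekly benefit for a given week off work (week 1 is the first week).

    Assumes the rates quoted in the text: 95 % of PIAWE for weeks 1-13,
    80 % of PIAWE for weeks 14-130, and no benefit afterwards.
    """
    if piawe < 0:
        raise ValueError("piawe must be non-negative")
    if week < 1:
        raise ValueError("week must be 1 or greater")
    if week <= 13:
        return 0.95 * piawe
    if week <= 130:
        return 0.80 * piawe
    # Simplification: the severe-injury exception is not modelled here.
    return 0.0


def total_income_replacement(piawe: float, weeks_off_work: int) -> float:
    """Sum the weekly benefit over a continuous period off work."""
    return sum(
        weekly_income_replacement(piawe, week)
        for week in range(1, weeks_off_work + 1)
    )
```

For a hypothetical worker with a PIAWE of $1,000 who is off work for 20 weeks, `total_income_replacement(1000, 20)` adds 13 weeks at the 95 % rate to 7 weeks at the 80 % rate.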

The Compensation Research Database
The Compensation Research Database (CRD) is a unique administrative database established by the third author as a platform to support research into compensation health. The CRD is held by the Institute for Safety Compensation and Recovery Research (ISCRR), an institute jointly established by the TAC, WSV and Monash University. The CRD includes data for transport and work-related claims covered by the TAC and WSV, respectively, since the mid 1980s. Together, the two datasets hold over 20 years of population-based data on every transport and workplace injury claim in the state of Victoria, whether accepted or denied. The CRD is available to other researchers under strict guidelines approved by the compensation authorities and the Monash University Human Research Ethics Committee. Sharing the CRD with the scientific community promotes scientific integrity, increases transparency, accelerates the impact of research by facilitating application of reusable data to new study questions, encourages and strengthens collaboration among researchers to share resources, and produces new findings [25,26].

Data collection
The TAC and WSV have maintained administrative databases since their establishment in 1987 and 1986, respectively. These large datasets are primarily used to manage the compensation system and monitor system performance. The data is collected in accordance with the Privacy Policies of the TAC and WSV in order for the authorities to perform their statutory functions under the Accident Compensation Act 1985 [27] and Transport Accident Act 1986 [28].
Consent from TAC clients to use data for research purposes is obtained from the injured person over the telephone when a claim is lodged with the TAC (within 12 months following the date of the accident). The claim form is completed over the telephone and information regarding the transport accident is keyed directly into the TAC system. The injured person is then asked to complete a 'General Authority to Release Information' form which explicitly references the TAC privacy policy [29]. The privacy policy specifies that personal and health information is collected primarily for claims management and also for related purposes including accident research. The TAC also gathers information relevant to the claim from health service providers (e.g. a certificate of capacity in order to assess the client's condition and their capacity for work as a result of the injury), insurers, government agencies and employers. Following the telephone call, the TAC will make a decision to accept or deny the claim (usually within 21 days).
Similarly, consent from injured workers to use data for research purposes is obtained when a claim is lodged with the insurer who manages claims on behalf of WSV (within 30 days following the date of the accident). The injured worker completes the 'worker's injury claim' form and includes a medical certificate of capacity (if claiming weekly income payments). The worker's injury claim form explicitly references the WSV privacy policy which provides for use of information gathered to perform the WSV functions [30]. The employer completes the 'employer injury claim' form. Both the worker's and the employer's injury claim forms are then submitted to a WSV insurer by the employer. The WSV insurer makes a preliminary assessment of the claim and enters the details directly into the WSV system. WSV also gathers information relevant to the claim from health service providers (e.g. a certificate of capacity in order to assess the client's condition and their capacity for work as a result of the injury or disease), insurers, government agencies and employers. A decision to accept or deny the claim is usually made within 28 days of claim lodgement.
The contract establishing ISCRR as an independent research institute at Monash University funded by TAC and WSV included an agreement requiring that the TAC and WSV make de-identified administrative data available to ISCRR to support the research activities of the institute. An overview of the datasets provided by the TAC and WSV as of December 2014 is shown in Fig. 1. TAC's claims and payments datasets are linked by a unique claim identifier. Similarly, WSV's claims, payments, services, medical certificates and hospital admissions datasets are linked by a unique claim or payment identifier. All information received by ISCRR is potentially re-identifiable. Names and all contact details (including addresses, telephone numbers and, in the case of minors, details of legal guardians) of injured individuals are removed from the data before ISCRR receives it. The data also contains details of healthcare services provided to injured individuals; the names and contact details of the service providers are likewise removed from the dataset before ISCRR receives it. TAC and WSV claim numbers are replaced by a 'dummy' identifier for the injured person. The dummy identifier is created by the organisations providing the data to ISCRR and matches each individual in the de-identified dataset to their TAC or WSV claim number. TAC and WSV maintain the key that links the dummy identifier to the claim number, and this linkage key is not available to ISCRR.
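The de-identification step described above can be illustrated with a minimal sketch. This is not the procedure used by the TAC or WSV, which the text does not specify in detail: the field names, the `deidentify` function and the use of random tokens as dummy identifiers are all assumptions made for illustration.

```python
import secrets

# Hypothetical names of fields that directly identify a person or provider.
DIRECT_IDENTIFIERS = {"name", "address", "telephone", "guardian", "provider_name"}


def deidentify(records, key_table=None):
    """Strip direct identifiers and swap claim numbers for dummy identifiers.

    Returns (released_records, key_table). In the arrangement described in
    the text, only the released records would go to the research institute;
    the key table (claim number -> dummy identifier) stays with the custodian.
    """
    key_table = {} if key_table is None else key_table
    released = []
    for record in records:
        claim_number = record["claim_number"]
        if claim_number not in key_table:
            # A random token carries no information about the claim number.
            key_table[claim_number] = secrets.token_hex(8)
        clean = {
            field: value
            for field, value in record.items()
            if field not in DIRECT_IDENTIFIERS and field != "claim_number"
        }
        clean["dummy_id"] = key_table[claim_number]
        released.append(clean)
    return released, key_table
```

Because the same claim number always maps to the same dummy identifier, records for one claim remain linkable across the claims, payments and services datasets without exposing the claim number itself.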

Variables available
The information collected by the TAC, WSV or their authorised insurers includes demographics, injury, payments and treatments. Income payments are automatically recorded in the TAC and WSV systems. Information pertaining to denied claims is retained in the database. Information necessary for claims handling from organisations such as VicRoads [31] and from healthcare providers (e.g. treatment invoices) is also collected. All of this information is collated centrally in the administrative datasets of the organisations. Tables 1 and 2 show examples of variables collected in each of the datasets. Comprehensive TAC and WSV data dictionaries are also available.

Data quality
Data is transferred to ISCRR annually via the Secure File Transfer Protocol (SFTP). Data quality assurance is routinely conducted in-house by ISCRR staff. The CRD is rigorously examined for completeness and accuracy using data profiling. Additional data quality checks applied to the CRD include validation rules (e.g. the service date cannot be prior to the injury date) and investigation of out-of-range values (e.g. age cannot be lower than 0). Any anomalies detected in the CRD are reported back to the TAC and WSV for review.
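The two example checks mentioned above (service date not before injury date, age not below 0) can be sketched as simple record-level rules. The field names and function names below are hypothetical and do not reflect the actual CRD schema or ISCRR's tooling.

```python
from datetime import date


def check_record(record):
    """Return a list of rule violations for one service record."""
    problems = []
    # Validation rule: a service cannot be delivered before the injury occurs.
    if record["service_date"] < record["injury_date"]:
        problems.append("service date precedes injury date")
    # Out-of-range rule: age cannot be lower than 0.
    if record["age"] < 0:
        problems.append("age below 0")
    return problems


def find_anomalies(records):
    """Map the index of each failing record to its list of violations."""
    anomalies = {}
    for index, record in enumerate(records):
        problems = check_record(record)
        if problems:
            anomalies[index] = problems
    return anomalies


sample = [
    {"injury_date": date(2014, 7, 1), "service_date": date(2014, 7, 3), "age": 42},
    {"injury_date": date(2014, 7, 1), "service_date": date(2014, 6, 30), "age": -1},
]
flagged = find_anomalies(sample)  # only the second record is flagged
```

A report built from `flagged` is the kind of anomaly list that could be returned to the data custodians for review.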

Data strengths and limitations
There are key advantages associated with using the CRD in research. The CRD has population coverage of transport and workplace injury in Victoria. It is readily available and therefore cost-effective to use, as no data collection is required. Given the longitudinal nature of the data, the CRD also provides the opportunity to track individuals over time and assess detailed service level information on a daily basis. The CRD, notably the WSV datasets, uses standard coding systems which are consistent with other jurisdictions in Australia. The standard coding system can also be mapped to international classification systems. More recently, the TAC has implemented the use of international classification systems such as the International Classification of Diseases, tenth edition (ICD-10). However, this is not routinely available across all claims. Finally, the CRD can be linked to population surveys and population-based registries, which can greatly enhance the available information about clinical or social characteristics of the population.
Despite the many advantages of using the CRD, a number of limitations must be noted. The information collected by the TAC and WSV is restricted to data required for administrative purposes. Therefore, the variables a researcher is interested in may not be central to the primary record keeping. Changes to administrative procedures could change definitions and make comparison over time problematic. However, source code mapping is applied to ensure consistent coding over time. Cases of transport and workplace injury can be missed if the injured person did not file a claim or the claim was rejected by the TAC or WSV, so the CRD may underestimate the true prevalence. Services can be missed if the healthcare providers did not report them for billing purposes. Services not covered by the TAC or WSV and services accessed outside the compensation system are not included in the CRD. However, linking the CRD with healthcare datasets has allowed examination of services accessed outside of the compensation system. Finally, a number of claims may have missing items, and codes may be applied incorrectly by those recording the data.

Return to work
One area of research where the CRD has been applied is return to work. Return to work is an important step in recovering from an injury, returning to a normal life and reducing the financial and emotional burden on the individual and their family. The CRD has been used to identify predictors of sustained return to work following a work-related injury or disease [36] and to assess the impact of an aging workforce on return to work [37]. In addition, the CRD has been used to explore the role of the general practitioner (GP) in facilitating return to work [38,39]. The CRD has enabled identification of patterns and trends in GP certification behaviour by exploring the types and duration of medical certificates issued by GPs for various injury types. Identifying various individual, organisational and healthcare factors that influence return to work has allowed identification of 'high-risk' groups for return to work interventions, redesign of capacity medical certificates by WSV, and

Healthcare service use
The CRD can identify all individuals seen by a particular healthcare service provider due to the information provided as part of their treatment and billing practices. This enables the CRD to produce a profile of all services received by each individual and their associated costs over any defined time period. The CRD has been previously used to explore healthcare service utilisation following work-related musculoskeletal disorders [45], transport-related injuries [46,47] including whiplash injury [48] and traumatic brain injury [49,50]. The CRD allows detailed examination of the use and costs of healthcare services which are important for the planning of resources and developing public policies.
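As an illustration of the per-individual service profile described above, the following minimal sketch counts services and sums costs for each person and service type within a chosen time window. The record layout (`dummy_id`, `service_type`, `service_date`, `amount`) is assumed for the example and is not the CRD's actual structure.

```python
from collections import defaultdict
from datetime import date


def service_profile(services, start, end):
    """Count services and total cost per (dummy_id, service_type) in [start, end]."""
    profile = defaultdict(lambda: {"count": 0, "cost": 0.0})
    for service in services:
        if start <= service["service_date"] <= end:
            entry = profile[(service["dummy_id"], service["service_type"])]
            entry["count"] += 1
            entry["cost"] += service["amount"]
    return dict(profile)


# Hypothetical service records for one de-identified claimant.
services = [
    {"dummy_id": "a1", "service_type": "physiotherapy",
     "service_date": date(2014, 8, 1), "amount": 80.0},
    {"dummy_id": "a1", "service_type": "physiotherapy",
     "service_date": date(2014, 8, 8), "amount": 80.0},
    {"dummy_id": "a1", "service_type": "medical",
     "service_date": date(2015, 2, 1), "amount": 60.0},
]
first_quarter = service_profile(services, date(2014, 7, 1), date(2014, 9, 30))
```

Changing `start` and `end` gives the profile for any other defined period, which is the property the text highlights for resource planning.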

Data linkages
Aside from investigation of issues unique to each scheme, the CRD provides a platform for linking data to other datasets to enable large-scale epidemiological studies. The CRD is a potentially re-identifiable dataset. It cannot be directly linked to external datasets using common identifiers such as name and date of birth, as these have been removed. However, the TAC and WSV maintain a linkage key that enables data linkage with their active involvement and approval [51]. In this way, it is possible to link external datasets to the CRD on a case-by-case basis, whilst maintaining the de-identified nature of the database for routine analysis and surveillance activity. The CRD has been linked to injury registries already managed by Monash University, including the Victorian Orthopaedic Trauma Outcomes Registry (VOTOR). VOTOR monitors and evaluates the care of patients with orthopaedic injuries in Victoria. Linking the CRD with VOTOR has helped to improve knowledge of the drivers of claim costs, claim durations and the relationship with patient outcomes [52]. The CRD has been linked with Medicare, the Australian universal healthcare system. Medicare provides all Australians with free or low-cost access to medical and hospital care. It is financed by the Australian government, with funding from income taxes and the Medicare levy. An additional Medicare levy surcharge applies to individuals with income above a certain level who do not have private health insurance [54]. Linking the CRD with Medicare has enabled detailed examination of pre-existing health conditions, medications and health service use and how these affect transport injury rates and transport injury outcomes [51,53]. The CRD has recently been linked to hospital datasets including the Victorian Admitted Episodes Dataset (VAED) and the Victorian Emergency Minimum Dataset (VEMD) via the Victorian Department of Human Services and Health.
Linking the CRD with the VAED and VEMD will allow thorough examination of the influence of prior health service utilisation on recovery from work-related injuries. The CRD provides a unique platform that leverages existing data and linkages in order to evaluate the impacts of transport and workers' compensation schemes on outcomes for injured Victorians.
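The custodian-mediated linkage described in this section can be sketched in two steps: the custodian attaches dummy identifiers to external records and strips the identifying fields, and the researcher then joins on the dummy identifier alone. The matching method is not described in the text; exact matching on name and date of birth is a simplifying assumption here, and all function and field names are hypothetical.

```python
def custodian_link(external_records, identity_to_claim, claim_to_dummy):
    """Custodian side: replace identifying fields with the dummy identifier.

    identity_to_claim maps (name, dob) to a claim number and claim_to_dummy
    is the linkage key; in this sketch both are held only by the custodian.
    """
    linked = []
    for record in external_records:
        claim_number = identity_to_claim.get((record["name"], record["dob"]))
        if claim_number is None:
            continue  # person has no claim, so nothing is released
        payload = {k: v for k, v in record.items() if k not in ("name", "dob")}
        payload["dummy_id"] = claim_to_dummy[claim_number]
        linked.append(payload)
    return linked


def researcher_join(crd_records, linked_records):
    """Researcher side: attach external records to CRD rows by dummy identifier."""
    by_dummy = {}
    for record in linked_records:
        by_dummy.setdefault(record["dummy_id"], []).append(record)
    return [
        dict(crd, external=by_dummy.get(crd["dummy_id"], []))
        for crd in crd_records
    ]
```

In this arrangement the researcher never sees names, dates of birth or claim numbers, which is consistent with the text's point that linkage happens case by case while the routine dataset stays de-identified.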

Future activities
More recently ISCRR has developed an enhanced program of data activities that expands on the existing work to date to deliver new insights and innovations to TAC and WSV. The program will contain three key streams of work. The first stream will be the expansion of the existing CRD activities including data management, the second stream will involve linking and comparing compensation data to external sources of information of relevance to the compensation system (data linkage as described above), and the third stream is designed to use innovative methods for predictive mining and analysing data to deliver new insights. These future planned CRD activities, with the exclusion of data linkages are described below. An example of how the CRD can be applied to a 'natural experiment' is also provided below.

Data management
Additional available data sources from the TAC and WSV will be incorporated into the CRD for research use. These include medical certificates issued for TAC clients, and workplace safety inspection reports produced by WSV. Furthermore, existing datasets such as that of the WorkHealth program conducted by WSV [55] and newly developed datasets such as the Victorian Working Population Survey will be added to ISCRR's data holdings. These datasets will be housed and managed by ISCRR and made available for research use under the protocols established for the CRD. WorkHealth is a program which aims to promote the benefits of a healthier workforce, to reduce Victorian workers' risk of chronic preventable diseases such as type 2 diabetes and cardiovascular diseases, and to explore the links between chronic disease and workplace injury. One part of the program involved Victorian workers completing a voluntary 15-minute health risk assessment. To date, 800,000 health check records of Victorian workers are available in the WorkHealth dataset. The Victorian Working Population Survey is currently under development. The project will annually survey eligible members of the Victorian workforce (employed and unemployed persons) to monitor perceptions relating to work-related physical and mental health risks and hazards. In addition to hazard perception, the survey will monitor attitudes to health and safety and the recollection and perception of incidents among individuals, as well as perceptions of 'reasonable' risk, mental and physical health status, job satisfaction, work hours, industry, appointment description, intention to leave, duration of employment, education, and other labour-related and demographic details.

Data analytics
Data analytics is designed to use innovative methods for predictive mining [56] and analysing data to deliver new insights, with a strong focus on risk prediction, and emerging economic, social and demographic issues of concern. Advanced predictive mining and forecasting techniques will be applied to the CRD in order to answer questions that cannot be addressed using conventional statistical techniques. For example, such techniques can examine how the combination of the order, frequency, gaps between, and delays in providing medical and paramedical services may lead to different mental health recovery outcomes. In addition, the large number of claims and the availability of service data at the daily level enable researchers to study claimant profiles and recovery trend clusters. The CRD is also expected to be embedded in dynamic complex models, such as agent-based computational models of claims management, and to be used for high resolution visualization models [57,58]. The high resolution visualization models will allow effective communication of findings and key messages to the compensation health community, decision-makers and the public. Furthermore, the availability of unstructured data such as medical reports held by the compensation authorities will allow application of text analytics to areas not yet examined. Emerging economic, social and demographic issues of concern in Victoria can also be investigated using the CRD. For example, analyses can be conducted to understand the dynamics of recovery and compensation under macro-level changes such as an aging workforce, an aging driver population, economic cycles, a decreasing share of employment in manufacturing, and changing gender composition in the workplace.

Example of CRD application as a natural experiment
One of the key future applications of the CRD will be in designing and studying 'natural experiments'. A natural experiment usually takes the form of an observational study in which the researcher cannot control or withhold the allocation of an intervention to particular areas or communities, but where natural or pre-determined variation in allocation occurs [59]. An example of a natural experiment is examining the social and economic responses to a financial crisis [60,61]. To conclude this section, we briefly present some evidence on how the CRD can be applied to study the impact of the recent global financial crisis (GFC) on work-related injury claims in Victoria, using trend analyses and forecasting. According to the literature [62,63], the impact of the GFC on the number and duration of work-related injury compensation claims has been mixed. The available evidence in Australia, however, suggested that fewer claims were lodged during the time of crisis [62]. Figure 2 presents two trends. The solid line shows the total number of workers employed in Victoria between January 2004 and December 2011 [64] and the dotted line shows the trend of standard claims for injured workers aged 15-65 years in Victoria over the same time period. The number of claims decreased significantly from October 2008 (the start of the GFC in Australia), reached its lowest level by April 2009, and then recovered over the next 12 months. A potential reason behind this pattern is the high level of unemployment during the GFC, which reduced the number of workers in the workforce and in turn resulted in fewer injuries. However, the downward trend observed in the Victorian labour force (solid line) does not appear to be as substantial as the downward trend observed in claims lodgement (dotted line). Figure 3 presents the probability of a work-related injury claim per worker in Victoria by month of injury (solid line).
To highlight the impact of the GFC, the data between January 2004 and December 2007 is used to forecast the trend for the period from January 2008 to December 2010 (dotted line in Fig. 3). As the figure shows, after controlling for the size of the labour force in Victoria, it appears that the GFC was associated with fewer work-related injury claims lodged for compensation. Guthrie et al. [62] suggested that a decrease in claims lodgement may be due to the slower pace of work and workers' fear of job loss. Using the available data in the CRD, this analysis can be broadened to study the impact of the GFC on the dynamics of work-related injury claims across age groups, gender, industries, injury types, income levels and workplace sizes.
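The forecasting comparison described above can be sketched as follows. The text does not state which forecasting model was used, so the straight-line least-squares trend here is an assumption chosen for simplicity, and the function names are hypothetical. The idea is to compute monthly claims per worker, fit a trend to the 48 pre-crisis months (January 2004 to December 2007), project it over the following 36 months, and compare the projection with the observed rate.

```python
def claim_rate(claims, workers):
    """Monthly claims per worker, controlling for labour force size."""
    return [c / w for c, w in zip(claims, workers)]


def fit_linear_trend(values):
    """Ordinary least-squares line through (0, v0), (1, v1), ...

    Returns (intercept, slope). A linear trend is an assumption of this
    sketch, not the model reported in the paper.
    """
    n = len(values)
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def forecast(intercept, slope, start, horizon):
    """Project the fitted line over months start, ..., start + horizon - 1."""
    return [intercept + slope * month for month in range(start, start + horizon)]


def shortfall(observed, predicted):
    """Observed minus predicted; negative values mean fewer claims than forecast."""
    return [o - p for o, p in zip(observed, predicted)]
```

With 48 pre-crisis monthly rates in a list `history`, `forecast(*fit_linear_trend(history), start=48, horizon=36)` gives the counterfactual series against which the observed 2008 to 2010 rates could be compared. A seasonal or time-series model would likely be more appropriate for real monthly claims data.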

Availability and requirements
Researchers, including those based outside of Australia, who are interested in using the CRD can access data on request. Researchers must be able to demonstrate that the data will be securely stored and protected. Information on the process for requesting data is available on ISCRR's website at: http://www.iscrr.com.au. Prior to submitting a data request, a meeting with an ISCRR representative from the CRD data team is necessary to understand the researcher's data needs, provide an overview of the data within the CRD and discuss any data-related issues pertaining to the data request. The provision of the data extract may be subject to a fee-for-service, which will be discussed with the researcher before they submit their data request. The CRD is governed by a steering committee including representatives of ISCRR, Monash University, the TAC and WSV. Steering committee meetings are held quarterly or more often if required. Access to the data is subject to the conditions of the CRD data access policy and approval of the steering committee. The criteria taken into consideration when reviewing the data request include, but are not limited to: (a) demonstrated reasonable need for the requested data to answer the specified research question(s), (b) the use of data is not harmful to the individuals that the information is about, (c) the proposed research is clearly in the public interest, (d) the proposed research is consistent with ISCRR's research strategy, (e) scientific merit (peer- or merit-reviewed), (f) ethical considerations. Once the data request is approved by the CRD steering committee, a de-identified, password-protected dataset (e.g. a CSV, SAS or SPSS data file) will be made available to the researcher via SFTP. Researchers are required to submit all research outputs for review to the CRD steering committee prior to public disclosure.
In Australian compensation health research, the process and governance approach to the CRD is unique. Presently, in many jurisdictions in Australia, researchers are unable to access compensation data freely. In some other jurisdictions internationally, access is provided on a specific request basis to a compensation authority. In contrast, the CRD is held independently and at arm's length from the compensation authority via a third party entity. In line with the growing field of research on open data [25,26], opening access to the CRD and sharing it with researchers worldwide promotes scientific integrity and increases transparency, participation and collaboration.

Conclusions
The CRD was designed to investigate the role of compensation processes in injury recovery and to identify emerging areas in compensation health. Given the limited availability of data in this growing research field, researchers who are interested in compensation health will find the CRD rich in information and invaluable for their future studies.

Authors' contributions
KP drafted the manuscript. BM helped draft the manuscript. AC conceived the study and participated in drafting the manuscript. All authors read and approved the final manuscript.