We compared utility estimates from two well-accepted methods for assessing health state utilities to inform economic analysis of medical interventions. Both the vignette-based and the instrument-based methods are theoretically correct, but evidence on their convergence is limited. An earlier review of health utility measures and elicitation noted that study population characteristics, such as socioeconomic demographics and disease or health states, should be taken into consideration when deriving health utilities from vignettes [15]. Another study, which used randomized controlled trial data to compare community quality of well-being (QWB) and patient-based time trade-off (TTO) preferences for health outcomes, found that QWB preference scores were significantly lower than the corresponding TTO scores, which may have important implications for estimating QALYs gained [16].
Utility score estimates derived from members of the general population may help approximate a societal viewpoint, but the utility assessment involving a hypothetical health state is limited by the accuracy and level of detail in the health state descriptions. Additionally, including a label of the health state being described and the setting in which the vignettes are presented (online or in-person) may affect the respondents’ valuations of the state [17, 18]. Direct utility elicitation using the TTO (i.e., based on vignettes) is considered appropriate/acceptable by some HTA bodies when EQ-5D-3L is not available, and TTO with a 10-year time horizon is the most frequently used approach among the direct techniques, because of greater comparability with the method used to develop the EQ-5D-3L scoring algorithm [19, 20].
Switching to disutility has a potentially important effect on both the estimated impact of clinical events and cost-effectiveness. The calculation of disutility emphasizes the impact of clinical events and the importance of the baseline utility to which they are applied. In comparing the two studies, the large difference in the baseline (no-event) health state utility was enough to reverse the direction of the differences between the studies. Utilities appeared higher with the EQ-5D-3L instrument, but the disutility of events was also higher because of the higher no-event utility.
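The dependence of disutility on the baseline can be illustrated with a minimal sketch. All numbers below are hypothetical and chosen only to show the arithmetic; they are not results from either study, and the helper name `disutility` is introduced here for illustration.

```python
def disutility(no_event_utility: float, event_utility: float) -> float:
    """Utility decrement of an event, measured against the no-event state."""
    return no_event_utility - event_utility


# Hypothetical values for illustration only (not taken from either study).
instrument = {"no_event": 0.85, "event": 0.60}
vignette = {"no_event": 0.70, "event": 0.55}

d_instrument = disutility(instrument["no_event"], instrument["event"])
d_vignette = disutility(vignette["no_event"], vignette["event"])

# The event-state utility is higher for the instrument-based values,
# yet the decrement is also larger because the baseline is higher.
print(f"instrument-based: event utility {instrument['event']:.2f}, "
      f"disutility {d_instrument:.2f}")
print(f"vignette-based:   event utility {vignette['event']:.2f}, "
      f"disutility {d_vignette:.2f}")
```

Under these assumed values, the method with the higher event-state utility also yields the larger decrement, which is the pattern described above.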
We found only a weak positive correlation between the two methods for both the chronic and the acute utility regressions (Fig. 1c). This means that there is substantial unexplained variability between studies measuring the same health states, and that a mapping between the two methods would have only weak predictive value. This is concerning because such large utility differences may affect QALY estimates when calculating the cost-effectiveness of drugs and other interventions. Thus, the choice between the EQ-5D-3L and vignette methods may lead to variability in ICER calculations.
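A minimal sketch of how such differences could propagate to an ICER follows. It assumes a deliberately simplified model in which QALYs gained equal the number of events avoided per patient multiplied by the event disutility and its duration. Every input is hypothetical, and the function names are introduced here for illustration only.

```python
def qalys_gained(events_avoided: float, disutility: float, years: float) -> float:
    """QALYs gained per patient from avoiding events (simplified assumption)."""
    return events_avoided * disutility * years


def icer(incremental_cost: float, incremental_qalys: float) -> float:
    """Incremental cost-effectiveness ratio: extra cost per QALY gained."""
    if incremental_qalys <= 0:
        raise ValueError("incremental QALYs must be positive in this sketch")
    return incremental_cost / incremental_qalys


# Hypothetical inputs for illustration only (not taken from either study).
INCREMENTAL_COST = 5000.0      # extra cost per patient
EVENTS_AVOIDED = 0.10          # events avoided per patient
EVENT_DURATION_YEARS = 2.0     # time spent in the event state
disutility_by_method = {"method_a": 0.25, "method_b": 0.15}

icers = {
    name: icer(INCREMENTAL_COST,
               qalys_gained(EVENTS_AVOIDED, d, EVENT_DURATION_YEARS))
    for name, d in disutility_by_method.items()
}

for name, value in icers.items():
    print(f"{name}: ICER = {value:,.0f} per QALY gained")
```

With identical costs and clinical effects, the smaller assumed disutility produces fewer QALYs gained and therefore a higher ICER, so the choice of utility method alone could move a result across a decision threshold.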
As there is no gold standard for utility measurement, our goal was not to “validate” a method but rather to compare the two, and we found that they were only weakly correlated. Importantly, there were no consistent differences other than utilities being lower for chronic sub-states with the vignette-based method, which may reflect patient adaptation [10].
From a practical standpoint, it is reasonable to use a method that would be accepted by the specific decision-maker. For instance, in a recent assessment of a novel therapy in end-stage renal disease, the UK National Institute for Health & Care Excellence (NICE) accepted EQ-5D-3L-based utilities in a cost-utility model [21]. If a clinical trial of reasonable duration with opportunities for health state measurement is being planned, then including an instrument in that trial is a good strategy, because longitudinal data allow regression models to be applied to estimate health state utilities. Where such trials are not available, or have already been completed without a utility instrument, a bespoke study to collect utilities seems to be the only option. A vignette-based method would allow a detailed description of the health states, but will invariably introduce a level of subjectivity. The same situation may arise where valuation of acute health states is wanted but cannot practically be assessed in actual patients. To reduce the level of subjectivity, one could argue that a standard battery of vignettes ought to be catalogued so that research is consistent across studies.
There are some nuanced differences between the methods that one could describe as “you get what you measure” for the instrument and “you get what you describe” for vignettes. In other words, if one is able to include multiple EQ-5D-3L measurements in a trial, then there are opportunities to explore the data in a way that fits an economic model. Similarly, one can describe the vignettes at a level of detail, in time as well as in content, that helps discern aspects important to patients but not visible through a generic instrument. Conversely, limited instrument measurements in a trial, like limited descriptions within vignettes, would hamper such flexibility on either side.
We recognize that the value set from Dolan et al. has been heavily criticized in the past and has several issues. One particular issue is the use of a linear regression model for utility values [22]. Authorities such as NICE should consider the criticism of existing value sets, create more appropriate ones, and provide strong guidance on which value sets to use. Furthermore, the findings of this analysis suggest that it may be important for all utility values to be quantified using a consistent approach.
The health states compared in these two studies are ones whose consequences would be fairly well understood and appreciated by the medical community and the general population when described as vignettes. In that sense, one would expect the real experience of a health state, measured by an instrument in patients, to be comparable to the hypothetical experience imagined by non-patients from a description of that state, barring the adaptation aspect. A more nuanced health state that requires more detail and is less familiar to non-patients may call for a different approach, but it would challenge both the vignette-based method, on the grounds of complexity and added subjectivity, and the instrument-based approach, on the grounds of sensitivity. These are important considerations for further research and methodological harmonization.