RT used data collected from the Fatality Analysis Reporting System over the years 1975–2006. The primary analysis computed a relative risk by taking the number of persons in fatal crashes during election hours on presidential election days from 1976 through 2004 and comparing it to the average number of individuals in fatal crashes during the same hours one week before and one week after. We henceforth refer to RT’s test statistic, the risk relative to the same time period in the previous and following week, as the RR (relative risk). Throughout, in the interest of replication, we will use RT’s test statistic as our basis for inference.
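As a minimal sketch of the statistic just defined, the RR for a single day can be written as the count on that day divided by the mean of the counts on the two control days. The function name, argument names, and the counts below are hypothetical and chosen only for illustration; they are not values from the Fatality Analysis Reporting System.

```python
def relative_risk(day_count, week_before_count, week_after_count):
    """RR = count on the day of interest / mean count on the two control days.

    All three counts are assumed to cover the same hours of the day.
    """
    control_mean = (week_before_count + week_after_count) / 2
    if control_mean == 0:
        raise ValueError("both control counts are zero; RR is undefined")
    return day_count / control_mean


# Made-up counts for illustration only (not FARS data).
example_rr = relative_risk(120, 100, 100)
```

With these illustrative counts the RR is 120 divided by 100, indicating a 20% higher count on the day of interest than on the control days.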
Our replication extended RT’s methodology with an updated dataset from the Fatality Analysis Reporting System; we examine alternative time periods and use an alternative method to characterize uncertainty. Following a finding reported in a later article by Redelmeier and Tibshirani [2], we estimate RRs during non-election hours as well as during the full 24 h on presidential election days.
To assess statistical significance, we estimated RRs for the 100 Tuesdays before and the 100 Tuesdays after presidential election days. (We restrict ourselves to Tuesdays as all United States presidential elections are held on Tuesdays, thus conditioning on any day-of-week effects.) Using these 200 RRs, we constructed an empirical null distribution, with non-parametric two-tailed P values computed as the proportion of RRs more extreme than that of presidential election days (i.e., greater than the election-day RR or smaller than its reciprocal). This procedure tests against the null hypothesis that election days have an RR consistent with being drawn randomly from the distribution of Tuesdays [3]. We further calculated 95% Wald-type confidence intervals (CIs) under a normal approximation using the standard deviation of the empirical null distribution as an estimate of the standard error. Throughout, we consider only presidential elections from 1980 through 2008, as data were not available before 1975 or after 2012, precluding estimation for all Tuesdays surrounding the 1976 or 2012 elections.
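The procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the code used for the analysis: we assume "more extreme" means strictly greater than the election-day RR or strictly smaller than its reciprocal, and that the Wald interval is formed on the raw RR scale (the text does not say whether a log scale was used). The function names and the five null RRs in the example are hypothetical.

```python
import statistics


def empirical_two_tailed_p(observed_rr, null_rrs):
    """Proportion of null RRs more extreme than the observed RR.

    Assumption: 'more extreme' means strictly above the observed RR or
    strictly below its reciprocal, after folding the observed RR to >= 1.
    """
    folded = max(observed_rr, 1 / observed_rr)
    extreme = sum(1 for rr in null_rrs if rr > folded or rr < 1 / folded)
    return extreme / len(null_rrs)


def wald_ci(observed_rr, null_rrs, z=1.96):
    """95% Wald-type interval using the SD of the null RRs as the SE.

    Assumption: the interval is computed on the raw RR scale.
    """
    se = statistics.stdev(null_rrs)
    return observed_rr - z * se, observed_rr + z * se


# Hypothetical null RRs for illustration; the analysis uses 200 Tuesdays.
null_rrs = [0.8, 0.9, 1.0, 1.1, 1.3]
p_value = empirical_two_tailed_p(1.2, null_rrs)
lower, upper = wald_ci(1.2, null_rrs)
```

In this toy example two of the five null RRs (0.8 and 1.3) are more extreme than an observed RR of 1.2, so the empirical P value is 0.4.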
Our method for characterizing uncertainty differs from RT’s method, which computes P values using a binomial test. The binomial test assumes that, under the null hypothesis, driving fatalities occur with the same probability on election days as on the Tuesdays in the week before and the week after. Since the probability of driving fatalities may differ from week to week for reasons that are unrelated to election days, a binomial test may overstate certainty about the risk posed by election days. To evaluate the properties of RT’s binomial test, we calculated the rate at which it rejected the null hypothesis of no effect across the null distribution of non-election day Tuesdays.
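The evaluation described above can be illustrated with a short sketch. It rests on our reading of the binomial null, which the text does not spell out in detail: with one day of interest and two control days, each person in a fatal crash is assumed to fall on the day of interest with probability 1/3. The two-sided P value sums the probabilities of all outcomes no more likely than the observed one, which is one common convention and not necessarily the one RT used. All function names and counts are hypothetical.

```python
from math import exp, lgamma, log


def binom_two_sided_p(k, n, p=1 / 3):
    """Exact two-sided binomial P value for k successes in n trials.

    Sums the probabilities of all outcomes whose probability does not
    exceed that of the observed outcome. Log-gamma avoids overflow for
    large n.
    """
    def pmf(i):
        log_coef = lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
        return exp(log_coef + i * log(p) + (n - i) * log(1 - p))

    probs = [pmf(i) for i in range(n + 1)]
    threshold = probs[k] * (1 + 1e-7)  # small tolerance for float ties
    return min(1.0, sum(q for q in probs if q <= threshold))


def rejection_rate(day_triples, alpha=0.05):
    """Share of days on which the binomial test rejects at level alpha.

    Each triple is (count on the day, count a week before, count a week
    after), all for non-election Tuesdays.
    """
    rejections = 0
    for day, before, after in day_triples:
        if binom_two_sided_p(day, day + before + after) < alpha:
            rejections += 1
    return rejections / len(day_triples)


# Hypothetical counts for two non-election Tuesdays (illustration only).
rate = rejection_rate([(3, 0, 0), (1, 1, 1)])
```

If the binomial test were well calibrated for these data, the rejection rate across non-election Tuesdays would be close to the nominal level alpha; a substantially higher rate would indicate that the test overstates certainty.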