Estimating the wave 1 and wave 2 infection fatality rates from SARS-CoV-2 in India

There has been much discussion and debate around the underreporting of COVID-19 infections and deaths in India. In this short report we first estimate the underreporting factor for infections from publicly available data released by the Indian Council of Medical Research on reported number of cases and national seroprevalence surveys. We then use a compartmental epidemiologic model to estimate the undetected number of infections and deaths, yielding estimates of the corresponding underreporting factors. We compare the serosurvey based ad hoc estimate of the infection fatality rate (IFR) with the model-based estimate. Since the first and second waves in India are intrinsically different in nature, we carry out this exercise in two periods: the first wave (April 1, 2020–January 31, 2021) and part of the second wave (February 1, 2021–May 15, 2021). The latest national seroprevalence estimate is from January 2021, and thus only relevant to our wave 1 calculations. Both wave 1 and wave 2 estimates qualitatively show that there is a large degree of “covert infections” in India, with model-based estimated underreporting factor for infections as 11.11 (95% credible interval (CrI) 10.71–11.47) and for deaths as 3.56 (95% CrI 3.48–3.64) for wave 1. For wave 2, underreporting factor for infections escalate to 26.77 (95% CrI 24.26–28.81) and to 5.77 (95% CrI 5.34–6.15) for deaths. If we rely on only reported deaths, the IFR estimate is 0.13% for wave 1 and 0.03% for part of wave 2. Taking underreporting of deaths into account, the IFR estimate is 0.46% for wave 1 and 0.18% for wave 2 (till May 15). Combining waves 1 and 2, as of May 15, while India reported a total of nearly 25 million cases and 270 thousand deaths, the estimated number of infections and deaths stand at 491 million (36% of the population) and 1.21 million respectively, yielding an estimated (combined) infection fatality rate of 0.25%. There is considerable variation in these estimates across Indian states. Up to date seroprevalence studies and mortality data are needed to validate these model-based estimates.


Main text
In late August 2020, India was predicted to surpass the United States in terms of reported case counts from SARS-CoV-2 infections. To the surprise of many modelers the curve turned corner in late September with the highest number (97,894) of daily new cases reported on 16 September 2020 [1]. After a steady decline for nearly five months, the curve started rising again, growing into an astronomic second wave. The highest number (414,280) of daily new cases in wave 2 was reported on May 6, 2021. As of May 15, 2021, India has reported 24.7 million cases, the second highest in the world, and nearly 270 thousand deaths, the third highest in the world. In this brief report, we reconcile estimates of the infection fatality rate (IFR) inferred from seroprevalence studies with epidemiologic model-based estimates that account for underreporting of infections and deaths in India for wave 1. We then proceed to compute, compare and combine wave 1 with wave 2 IFR estimates.

Synthesizing evidence from seroprevalence studies
We review available seroprevalence results that vary across states and specifically across rural versus urban areas. Whereas in many major metros and slum areas the seroprevalences were reported to be more than 50%, in rural areas there is a wide variation ( Table 1). The latest national serosurvey (from 17 December 2020 to 8 January 2021) reports 21.4% of all Indians above age 18 have antibodies present that indicate past SARS-CoV-2 infection [2]. Since approximately 59% [3] of India's 1.36 billion citizens are above age 18 and 10.45 million infections were reported as of 8 January 2021, this points to approximately 172.47 million infections, with an implied underreporting factor of 16.5 (172.47/10.45). In other words, only 6% of India's COVID-19 infections are reported, while 94% remained undetected or unreported. We use this estimated number of infections to calculate the IFR. Regional studies based on crematorium data and counting obituaries in India have suggested an underreporting factor in the range of 2 to 5 for COVIDdeaths; this is at best ad hoc and anecdotal in nature and no rigorous quantification of missing death numbers is currently available [4].

Model-based estimates
Using a compartmental epidemiologic model (as explained in the Supplementary Methods) with a compartment for unascertained cases and deaths after accounting for the false negative rates of RT-PCR and rapid antigen tests used in India [5] we estimate the national and state-level IFR in India by inferring underreporting factors for cases and deaths. We assume that the estimated total infections (deaths) are comprised of reported and unreported infections (deaths). The model divides the population into ten disjoint compartments: S (Susceptible), E (Exposed), T (Tested), U (Untested), P (Tested positive), F (Tested False Negative), RR (Reported Recovered), RU (Unreported Recovered), DR (Reported Deaths) and DU (Unreported Deaths), as described in Additional file 1: Figure S1. A set of nine differential equations govern the transmission dynamics, which are approximated by means of discrete recurrence relations. For any compartment X , the instantaneous rate of change at time t (given by dX dt ) is approximated by the difference of counts in that specific compartment on the (t + 1) th day and the (t) th day, i.e., say X(t + 1) − X(t) . Parameters are estimated using Bayesian techniques by generating samples from the posterior distribution using a Metropolis-Hastings algorithm with Gaussian proposal density, with 95% credible intervals (CrI) to quantify uncertainty of the estimates. Additional file 1: Table T4 presents an overview of the parameter descriptions and settings for this model.

Comparing and combining waves 1 and 2
Due to the stress on the healthcare and reporting infrastructure, the fatality and underreporting processes were very different across the two waves. Thus, we consider two separate phases of the pandemic, with wave 1 from April 1, 2020-January 31, 2021 and wave 2 starting on February 1, 2021. This definition is artificial and is guided by the fact that the national effective reproduction number (R eff ) crossed unity for the first time in 2021 on February 14 and we allow a two-week incubation period before that date. Using daily time series of case, death and recovery counts we compare fatality rates and underreporting factors associated with the two time periods using the compartmental models. Further, using observed data from the two waves and the model-based underreporting factor estimates, we compute cumulative case and death counts for the total duration of waves 1 and 2. We multiply the wave-specific cumulative counts with relevant underreporting factors and sum over both waves to get combined counts of cases and deaths. The estimated numbers of cumulative deaths and infections provide us with a combined IFR estimate for India as of May 15.

IFR estimates for wave 1 using seroprevalence surveys
The observed case fatality rate (CFR) in India is low. With 154,428 deaths and 10.76 million cases reported as of January 31, 2021 the estimated CFR for wave 1 is 1.435% (95% confidence interval 1.428-1.442%) [1]. The estimated number of infections from the January seroprevalence survey imply an approximate infection fatality rate of 0.09% (i.e. 154,428/172.47 M). The anecdotal underreporting factor for deaths (in the range of 2-5) implies an ad hoc estimate of IFR in the range of 0.19-0.45%.

Estimates from epidemiological models
For wave 1 our estimate for the national IFR 1 (observed cumulative deaths/estimated cumulative total infections) is 0.129% (95% CrI 0.125-0.134%) and IFR 2 (estimated total cumulative deaths/estimated total cumulative infections) is 0.461% (95% Table 1 Summary of results from various serological surveys conducted in India during 2020-21 a The first ten states with maximum cumulative COVID-19 cases (as of 31 January 2021) are included in this table b Information sourced from Wikipedia. (https:// en. wikip edia. org/ wiki/ List_ of_ states_ and_ union_ terri tories_ of_ India_ by_ popul ation) c The first national serosurvey conducted by the Indian Council of Medical Research (ICMR) began on May 11 and ended on June 4, 2020. A randomly sampled, community-based survey was conducted in 700 villages/wards, selected from the 70 districts of 21 chosen states of India, categorized into four strata based on the incidence of reported COVID-19 cases. Four hundred adults per district were enrolled from 10 clusters with one adult per household. A total of 28,000 adults were  CrI 0.455-0.468%) with an underreporting factor for cases estimated at 11.11 (95% CrI 10.71-11.47) and for deaths at 3.56 (95% CrI 3.48-3.64). These modelbased estimates in wave 1 are largely consistent with the estimates from the latest and third nationwide seroprevalence study. In wave 2, using the same model we see a stark contrast with wave 1, with case and death underreporting factor estimates escalate to 26.73 (95% CrI 24.26-28.81) and 5.77 (95% CrI 5.34-6.15) respectively, leading to IFR 1 estimate of 0.032% (95% CrI 0.029-0.035%) and IFR 2 estimate of 0.183% (95% CrI 0.18-0.186%). This pattern is consistent with wave 2 CFR being estimated at 0.845% (95% CrI 0.840-0.849%), 59% of wave 1 estimate. Figure 1 shows underreporting factors and estimated infections and deaths in waves 1 and 2 for India while Fig. 2 highlights state-level variations in IFR 1 , IFR 2 , CFR for waves 1 and 2 for 20 states in India with large case/death counts.

Combining waves 1 and 2
The composite CFR as of May 15 stands at 1.1%. The estimate for total (reported + unreported) cumulative case count for waves 1 and 2 combined is 491.73 (95% CrI 453.03-524.56) million, while the estimated number of total (reported + unreported) deaths is 1216.35 (95% CrI 1154.21-1272.70) thousand. This leads to a combined IFR 1 estimate of 0.06% and IFR 2 estimate of 0.24%. Detailed numerical estimates of underreporting factors across states for waves 1 and 2 are presented in Additional file 1: Tables T1, T2 and T3 and Additional file 1: Figures S2 and S3.

Discussion
Despite accounting for underreported deaths, the large number of asymptomatic/undetected infections (more than 90% by any calculation) indicate a lower IFR in India in comparison with other Western countries. A meta-analysis across the world places the pooled mean of IFRs at 0.68% (95% CI: 0.53-0.82%) [6], while another meta-analysis places the median at 0.27% [7] (with a range of 0-1.63%). Seroprevalence surveys and epidemiologic models qualitatively agree on the estimated IFR for India for wave 1. Up to date serosurvey and excess death/mortality data are needed to validate wave 2 and combined estimates. The estimated number of total infections as of May 15 suggests roughly 36% of Indians have an active or past infection, a number that will need to be verified with synchronous sero-surveys.
The current reduction in fatality rates in wave 2 that we notice could be primarily due to two reasons, one is that we do not have the same length of follow-up period and complete data on the decay phase of wave 2 curve. The second could be the different age composition of the infected populations in the two waves; it has been reported that the younger population got infected in larger numbers in wave 2 and they have lower risk of COVID-19 mortality. A fraction of the older population (aged 65 + years) also got vaccinated during wave 2. However, this hypothesis about reduced fatality rates in wave 2 cannot be verified without more granular, age-sex stratified nationwide time-series data on case and death counts, which is currently unavailable.

Limitations
We do not have a rigorous way to validate the extent of underreporting of deaths. An excess death calculation based on historical mortality data is infeasible at this point due to absence of all-cause-mortality data in the last three years from India. India has a very young population with only 6.4% in age group 65 + (compared to the US where this proportion is 16.5%) so a comparison of overall IFR between India and say the US is not fair, and only age-specific IFRs should be calculated and compared when more data become available. We do recognize that wave 2 information is appreciably incomplete, and the estimates will change as we have more complete information on deaths. For example, while our wave 2 analysis period ended on May 15, the highest daily number of deaths (4529 daily new deaths) were reported shortly after on May 18. Thus, our analysis presents an updated but incomplete picture of wave 2.

Fig. 1
Comparison of observed and estimated case and death counts and associated underreporting factors from waves 1, 2 and both waves combined