Methods
Following a systematic literature review [4], corresponding authors were contacted to share their data subject to ethical approvals and GDPR compliance. All data have been refactored according to guidelines we developed during a Social Contact Data Hackathon in 2017 as part of the TransMID project. Each survey is split into multiple files to capture participant, contact, survey day, household and time-use data. For each data type, there is one “common” file and one “extra” file in which more specific variables related to the survey are included. Each data set contains a dictionary to interpret the columns (see http://www.socialcontactdata.org for more information).
To extrapolate survey data to the country level and obtain social contact rates on a weekly basis, we incorporate participant weights accounting for age and the number of observations on weekdays (5/7) and weekend days (2/7). We use the United Nations’ World Population Prospects [12] as reference and constrain weights to a maximum of 3 to limit the influence of single participants. The social contact matrix \(m_{ij}\) can be estimated by:
$$\begin{aligned} m_{ij} = \frac{\sum _{t=1}^{T_i}w_{it}^d y_{ijt}}{\sum _{t=1}^{T_i}w_{it}^d}, \end{aligned}$$
(1)
where \(w_{it}^{d}\) denotes the weight for participant t of age i who was surveyed on day type \(d\in {\{\text{ weekday }, \text{ weekend }\}}\), \(y_{ijt}\) denotes the reported number of contacts made by participant t of age i with someone of age j and \(T_i\) denotes all participants of age i. By nature, contacts are reciprocal and thus \(m_{ij}N_i\) should be equal to \(m_{ji}N_j\). To resolve differences in reporting, reciprocity can be imposed by:
$$\begin{aligned} m_{ij}^\text {reciprocal} = \frac{m_{ij}N_i+m_{ji}N_j}{2N_i}, \end{aligned}$$
(2)
with \(N_i\) and \(N_j\) the population size in age class i and j, respectively [13]. This reciprocal behavior might not be valid for specific contact types, e.g. contacts at work for retail workers are most likely not contacts at work for their customers.
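As an illustration of Eqs. 1 and 2, the following minimal Python sketch computes a weighted contact matrix and then imposes reciprocity. It is not the code behind the tool: the function names, the tuple layout of `participants` and the example numbers are assumptions made for this illustration. Participant weights are taken as given (their derivation from age and day type is not shown) and are only capped at 3.

```python
def contact_matrix(participants, n_age, max_weight=3.0):
    """Weighted mean number of contacts per age class (Eq. 1).

    participants: list of (age_class, weight, contacts_by_age) tuples,
    a hypothetical layout chosen for this sketch.
    """
    numerator = [[0.0] * n_age for _ in range(n_age)]
    denominator = [0.0] * n_age
    for age, weight, contacts in participants:
        w = min(weight, max_weight)  # cap weights at 3, as in the text
        denominator[age] += w
        for j in range(n_age):
            numerator[age][j] += w * contacts[j]
    return [
        [numerator[i][j] / denominator[i] if denominator[i] > 0 else 0.0
         for j in range(n_age)]
        for i in range(n_age)
    ]


def make_reciprocal(m, population):
    """Impose reciprocity so that m[i][j] * N_i == m[j][i] * N_j (Eq. 2)."""
    n = len(m)
    return [
        [(m[i][j] * population[i] + m[j][i] * population[j])
         / (2.0 * population[i])
         for j in range(n)]
        for i in range(n)
    ]


# Made-up example with two age classes (not survey data).
participants = [
    (0, 1.0, [4, 2]),
    (0, 5.0, [2, 4]),  # weight 5 is capped at 3
    (1, 2.0, [1, 3]),
]
population = [100, 200]

m = contact_matrix(participants, n_age=2)
print(m)                               # [[2.5, 3.5], [1.0, 3.0]]
print(make_reciprocal(m, population))  # [[2.5, 2.75], [1.375, 3.0]]
```

In the reciprocal matrix of this example, \(m_{01}N_0 = 2.75 \times 100\) equals \(m_{10}N_1 = 1.375 \times 200\), as required.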
Transmission dynamics can be represented by the next generation matrix G with elements \(g_{ij}\) that indicate the average number of secondary infections in age class i through the introduction of a single infectious individual of age class j into a fully susceptible population [14]. The next generation matrix is defined by:
$$\begin{aligned} G = DMq, \end{aligned}$$
(3)
with D the mean duration of infectiousness, M the contact matrix and q a proportionality factor [9, 11]. The proportionality factor q combines several disease-specific characteristics that are related to susceptibility and infectiousness. Equation 3 can be reformulated as:
$$\begin{aligned} g_{ij} = D * m_{ij} * s_i *k_j * \hat{q}, \end{aligned}$$
(4)
where \(s_i\) denotes the susceptibility of age group i, \(k_j\) the infectiousness of age group j and \(\hat{q}\) other disease-specific factors. The leading right eigenvector of G is proportional to the expected incidence by age and R\(_{0}\) can be calculated as the dominant eigenvalue of G [5].
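A minimal sketch of Eq. 4 and of the eigen-analysis described above is given below, using only the Python standard library. Power iteration is used here as one simple way to obtain the dominant eigenvalue and the leading right eigenvector; it assumes a non-negative matrix for which the iteration converges (for example, one with all entries positive). The function names and the example matrix are assumptions made for this illustration, not part of the tool.

```python
def next_generation_matrix(m, s, k, duration=1.0, q_hat=1.0):
    """Elements g_ij = D * m_ij * s_i * k_j * q_hat (Eq. 4)."""
    n = len(m)
    return [
        [duration * m[i][j] * s[i] * k[j] * q_hat for j in range(n)]
        for i in range(n)
    ]


def dominant_eigenpair(g, iterations=1000, tol=1e-12):
    """Power iteration: returns (dominant eigenvalue, leading right
    eigenvector scaled to sum to 1)."""
    n = len(g)
    v = [1.0 / n] * n
    eigenvalue = 0.0
    for _ in range(iterations):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(w)
        if total == 0.0:
            return 0.0, v
        new_v = [x / total for x in w]
        converged = max(abs(a - b) for a, b in zip(new_v, v)) < tol
        v, eigenvalue = new_v, total
        if converged:
            break
    return eigenvalue, v


# Made-up 2x2 contact matrix with uniform susceptibility and infectiousness.
m = [[1.0, 2.0], [3.0, 2.0]]
g = next_generation_matrix(m, s=[1.0, 1.0], k=[1.0, 1.0])
r0, relative_incidence = dominant_eigenpair(g)
print(r0, relative_incidence)  # approximately 4.0 and [0.4, 0.6]
```

Because the eigenvector is scaled to sum to 1, its entries can be read directly as the relative incidence by age class.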
To evaluate intervention strategies, we focus on the relative impact of adjusted social contact patterns on R\(_{0}\) in line with the so-called social contact hypothesis [7] by cancelling the disease-specific features that are common to both conditions:
$$\begin{aligned} \frac{R_{0a}}{R_{0b}}&= \frac{\max (\text {eigen}(DM_aq))}{\max (\text {eigen} (DM_bq))} = \frac{\max (\text {eigen}(M_a*S*K))}{\max (\text {eigen}(M_b*S*K))}, \end{aligned}$$
(5)
where indices a and b refer to the different conditions, and S and K account for age-specific susceptibility and infectiousness, respectively [11]. Physical distancing can be evaluated by the elimination or reduction of location-specific subsets of the social contact data. Contacts reported at multiple locations are assigned to a single location in the following hierarchical order: home, work, school, transport, leisure and other locations. We simulate school closure by excluding all contacts reported at school. We evaluate physical distancing at work by applying a proportional reduction of the social contacts reported at work (\(p_{\text {workplace}}^{\text {distancing}}\)). To combine the effect of school closure and distancing at work, the social contact matrix M is calculated as:
$$\begin{aligned} M = M_\text {home} + (M_\text {work} * ( 1- p_\text {workplace}^\text {distancing})) + (M_\text {school} * 0) + M_\text {transport} + M_\text {leisure} + M_\text {other} \end{aligned}$$
(6)
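The location hierarchy and Eq. 6 can be sketched in Python as follows. The function names, the dictionary layout and the example matrices are assumptions made for this illustration. How contacts without any reported location are handled is not stated in the text, so the sketch returns `None` in that case and leaves the decision to the caller.

```python
LOCATION_ORDER = ["home", "work", "school", "transport", "leisure", "other"]


def assign_location(reported):
    """Assign a contact reported at several locations to the first one
    in the hierarchical order; None if no location was reported."""
    for location in LOCATION_ORDER:
        if location in reported:
            return location
    return None


def combined_matrix(by_location, p_work=0.0, close_schools=False):
    """Sum location-specific contact matrices as in Eq. 6.

    by_location: dict mapping each name in LOCATION_ORDER to a square matrix.
    p_work: proportional reduction of contacts at work (0 to 1).
    """
    factor = {location: 1.0 for location in LOCATION_ORDER}
    factor["work"] = 1.0 - p_work
    if close_schools:
        factor["school"] = 0.0
    n = len(by_location["home"])
    total = [[0.0] * n for _ in range(n)]
    for location in LOCATION_ORDER:
        matrix = by_location[location]
        for i in range(n):
            for j in range(n):
                total[i][j] += factor[location] * matrix[i][j]
    return total


# Made-up matrices for two age classes (not survey estimates).
zero = [[0.0, 0.0], [0.0, 0.0]]
by_location = {
    "home": [[1.0, 1.0], [1.0, 1.0]],
    "work": [[0.0, 0.0], [0.0, 2.0]],
    "school": [[3.0, 0.0], [0.0, 0.0]],
    "transport": zero, "leisure": zero, "other": zero,
}
print(assign_location({"school", "home"}))  # home
print(combined_matrix(by_location, p_work=0.5, close_schools=True))
# [[1.0, 1.0], [1.0, 2.0]]
```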
We developed an interactive application to access and analyze social contact data based on the R packages shiny [15] and socialmixr [16]. The user interface enables the selection of country-specific data, age categories, type of day, contact duration, intensity and gender. Using selection boxes, the user can opt to disable the assumption of reciprocity and participant weights. The user can also enable distancing strategies such as school closure or physical distancing at work, or include age-specific transmission parameters.
The user interface contains a plot of the social contact matrix and the principal results of the social contact analysis: M, relative incidences, the reference demography, participant statistics and information on the data sets. Relative R\(_{0}\) and M ratios are printed if reactive strategies are selected.
As a COVID-19 case study, we estimate the effect of school closure and physical distancing at work on disease transmission dynamics. To do this, we use 3 age classes: 0–18 years, 19–60 years and over 60 years of age. For each country, we calculate contact rates after excluding data from holiday periods. We capture transmission dynamics with 0%, 20%, 40% and 60% distancing at work, with and without school closure. As a proof of concept, we include a scenario in which children are less vulnerable than the elderly [\(s_i = k_j = (0.5,1,1.5)\)], instead of assuming uniform susceptibility and infectiousness.
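The scenario grid of this case study can be sketched in Python as shown below, computing the relative R\(_{0}\) of Eq. 5 for each combination of workplace distancing and school closure. The three contact matrices are made-up numbers for the three age classes, not estimates from any survey, and the function names are assumptions made for this illustration. Age-specific susceptibility and infectiousness enter element-wise as \(m_{ij} s_i k_j\), following Eq. 4, and the dominant eigenvalue is approximated by power iteration.

```python
def spectral_radius(g, iterations=10000, tol=1e-12):
    """Dominant eigenvalue of a non-negative matrix by power iteration."""
    n = len(g)
    v = [1.0 / n] * n
    value = 0.0
    for _ in range(iterations):
        w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
        new_value = sum(w)
        if new_value == 0.0:
            return 0.0
        v = [x / new_value for x in w]
        if abs(new_value - value) < tol:
            return new_value
        value = new_value
    return value


def relative_r0(m_a, m_b, s, k):
    """Ratio R0_a / R0_b with g_ij proportional to m_ij * s_i * k_j (Eq. 5)."""
    def weighted(m):
        n = len(m)
        return [[m[i][j] * s[i] * k[j] for j in range(n)] for i in range(n)]
    return spectral_radius(weighted(m_a)) / spectral_radius(weighted(m_b))


def scenario_matrix(m_rest, m_work, m_school, p_work, close_schools):
    """Contact matrix under workplace distancing and school closure."""
    school_factor = 0.0 if close_schools else 1.0
    n = len(m_rest)
    return [
        [m_rest[i][j] + (1.0 - p_work) * m_work[i][j]
         + school_factor * m_school[i][j]
         for j in range(n)]
        for i in range(n)
    ]


# Illustrative matrices only (age classes: 0-18, 19-60, over 60 years).
M_REST = [[3.0, 2.0, 0.5], [2.0, 4.0, 1.0], [0.5, 1.0, 2.0]]  # all other locations
M_WORK = [[0.0, 0.2, 0.0], [0.2, 4.0, 0.2], [0.0, 0.2, 0.1]]
M_SCHOOL = [[6.0, 1.0, 0.0], [1.0, 0.5, 0.0], [0.0, 0.0, 0.0]]

s = k = (0.5, 1.0, 1.5)  # children less vulnerable than the elderly
baseline = scenario_matrix(M_REST, M_WORK, M_SCHOOL, 0.0, False)

results = {}
for close_schools in (False, True):
    for p_work in (0.0, 0.2, 0.4, 0.6):
        m = scenario_matrix(M_REST, M_WORK, M_SCHOOL, p_work, close_schools)
        results[(p_work, close_schools)] = relative_r0(m, baseline, s, k)

for (p_work, close_schools), ratio in sorted(results.items()):
    print(f"work distancing {p_work:.0%}, schools closed={close_schools}: "
          f"relative R0 = {ratio:.3f}")
```

Setting `s = k = (1.0, 1.0, 1.0)` gives the uniform scenario; since only ratios are reported, the duration of infectiousness and the remaining disease-specific factors do not need to be specified.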
Results
The http://www.socialcontactdata.org initiative, as of 25 May 2020, includes data for Belgium, Finland, Germany, Italy, Luxembourg, Netherlands, Poland and the UK from POLYMOD [5], as well as data from other studies on social mixing in France [17], China [18], Hong Kong [19], Peru [20], UK [21], Russia [22], Zimbabwe [23], Vietnam [24], South Africa and Zambia [25]. All data are available on Zenodo [26,27,28,29,30,31,32,33,34,35] and can be retrieved within R using the socialmixr package.
The SOcial Contact RATES (Socrates) data tool [36, 37] enables quick and convenient generation of social contact matrices, relevant for the spread of infectious diseases. Figure 1 presents a screenshot of the user interface. Social contact patterns have broad potential for simulating infectious disease transmission, and with this initiative we hope to support data-driven modeling endeavors. The survey data from France and Zimbabwe contain multiple days per participant, hence we included only the first day for each participant to minimize the effect of reporting fatigue.
We demonstrate the effect of physical distancing at work and school closure on R\(_{0}\) in Fig. 2. If we assume uniform susceptibility and infectiousness, we predict for most countries a 10% decrease in R\(_{0}\) with workplace distancing of 60%. For Poland and Hong Kong, the reduction is slightly higher. The analysis for Peru shows little impact of workplace distancing since few contacts were reported “at work”, whereas a substantial proportion of contacts was reported at the market or street. Cultural differences in how “at work” is understood should be considered when interpreting results. The data for Zimbabwe also contain relatively few reported contacts at work, which translates into a limited impact of workplace distancing in our analysis. The estimated R\(_{0}\) reduction due to school closure is more country-specific, e.g. 10% reduction for Belgium and Vietnam, but 20% for Italy, Luxembourg and France. If we assume that the elderly are more vulnerable than children, as might be the case for COVID-19 [38], the impact of school closure decreases dramatically. The positive effect of physical distancing at work on R\(_{0}\) remains the same or increases.
The predicted relative incidences, as presented in Fig. 3, highlight by age the impact of school closure compared with increased physical distancing at work. The relative incidence in people 18–60 years of age decreases with an increasing proportion of workplace distancing, which is of interest if this age group is more vulnerable than children. The relative incidence in the age group above 60 years of age increases in all situations compared to no intervention. This does not imply that the absolute number of cases in this age group would rise.