Data source and variables
We used data from the Thai Health and Welfare Survey of 2003, conducted by the National Statistical Office. In this survey, every available member aged 15 years or older of each sampled household was interviewed, giving a total of 37,202 individuals from 19,952 households.
The health outcome studied was recent morbidity, a binary variable. The English translation of the relevant survey question was: "Have you been ill or not feeling well during the past one month?"
Monthly adult-equivalent household income was used as the measure of socioeconomic status. For Thailand, empirical studies suggest weighting each child aged under 15 years as 0.5 of an adult, and allowing for economies of scale in any household with more than one member by raising adult-equivalent household size to the power of 0.75.
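As a minimal sketch of the equivalence scale described above, the Python functions below compute adult-equivalent household size as (adults + 0.5 × children)^0.75 and divide monthly household income by it. The function names are hypothetical, and the sketch assumes that "adults" means household members aged 15 or older and "children" those under 15.

```python
def adult_equivalent_size(n_adults: int, n_children: int,
                          child_weight: float = 0.5,
                          scale_exponent: float = 0.75) -> float:
    """Adult-equivalent size: (adults + child_weight * children) ** scale_exponent.

    n_adults counts members aged 15 or older; n_children counts those under 15.
    A one-adult household keeps size 1.0.
    """
    if n_adults < 0 or n_children < 0 or n_adults + n_children == 0:
        raise ValueError("household needs at least one member")
    return (n_adults + child_weight * n_children) ** scale_exponent


def income_per_adult_equivalent(monthly_income: float, n_adults: int,
                                n_children: int) -> float:
    """Monthly household income divided by adult-equivalent household size."""
    return monthly_income / adult_equivalent_size(n_adults, n_children)


if __name__ == "__main__":
    # Hypothetical household: 2 adults, 2 children, income of 12,000 per month.
    print(adult_equivalent_size(2, 2))               # 3 ** 0.75, about 2.28
    print(income_per_adult_equivalent(12000, 2, 2))  # about 5264
```

Because of the exponent, this household's income per adult equivalent (about 5264) is higher than its simple per-capita income (3000), reflecting the assumed economies of scale.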
Three categorical health determinants were examined: eight age-sex groups (males aged 15-29 years, males aged 30-44 years, males aged 45-59 years, males aged 60 years or older, females aged 15-29 years, females aged 30-44 years, females aged 45-59 years, females aged 60 years or older); four levels of education (no education, primary, high school, higher education); and five areas of residence (Bangkok, Central, North, Northeast, South).
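Categorical determinants like these typically enter a regression as 0/1 dummy variables. The sketch below is one hypothetical way to code the eight age-sex groups and to expand any category into dummies. The omission of one reference category (to avoid collinearity with the intercept) is standard practice, but which category was omitted is our assumption, since the text does not state it.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Age bands from the text; None marks the open-ended top band (60+).
AGE_BANDS: List[Tuple[int, Optional[int]]] = [(15, 29), (30, 44), (45, 59), (60, None)]
SEXES = ("male", "female")
EDUCATION = ["no education", "primary", "high school", "higher education"]
REGIONS = ["Bangkok", "Central", "North", "Northeast", "South"]


def _band_label(lo: int, hi: Optional[int]) -> str:
    return f"{lo}+" if hi is None else f"{lo}-{hi}"


# All eight age-sex groups, males first.
AGE_SEX_GROUPS = [f"{s}s aged {_band_label(lo, hi)}" for s in SEXES for lo, hi in AGE_BANDS]


def age_sex_group(sex: str, age: int) -> str:
    """Map a respondent aged 15 or older to one of the eight age-sex groups."""
    if sex not in SEXES:
        raise ValueError(f"unknown sex: {sex!r}")
    if age < 15:
        raise ValueError("the survey covers respondents aged 15 or older")
    for lo, hi in AGE_BANDS:
        if hi is None or age <= hi:
            return f"{sex}s aged {_band_label(lo, hi)}"
    raise AssertionError("unreachable: the top band is open-ended")


def dummies(value: str, categories: Sequence[str], reference: str) -> Dict[str, int]:
    """0/1 indicators for every category except the (assumed) reference category."""
    if value not in categories or reference not in categories:
        raise ValueError("value and reference must be listed categories")
    return {c: int(value == c) for c in categories if c != reference}
```

For example, `dummies("North", REGIONS, reference="Bangkok")` yields four indicators, with only `"North"` set to 1.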
Measurement of inequalities in health as a concentration index (C) has primarily drawn on the literature on income inequality measures [3, 18, 19]. The concentration index can be written in various ways, but one of the most cited is that proposed by Kakwani et al.:

$$C = \frac{2}{n\mu}\sum_{i=1}^{n} h_i R_i - 1 \tag{1}$$

where $h_i$ is the variable of interest for the $i$th person; $\mu$ is the mean or proportion of $h$; $n$ is the number of persons; and, if the $n$ individuals are ranked according to their socioeconomic status beginning with the most disadvantaged, $R_i = (i - 0.5)/n$ is the relative rank of the $i$th person. When there is no inequality (or when inequalities in opposite directions in different parts of the income-ranked population exactly offset each other), the concentration index equals 0. If the variable of interest is concentrated among those of lower (higher) socioeconomic status, the concentration index is negative (positive).
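Equation 1 can be checked numerically. The stdlib Python sketch below ranks individuals from poorest to richest, applies Equation 1, and cross-checks it against the algebraically equivalent form C = 2 cov(h, R)/μ, which holds because the fractional ranks always average 0.5. It assumes unweighted data and breaks ties in socioeconomic status by input order; the text specifies neither, and the function names are hypothetical.

```python
from typing import List, Sequence


def fractional_ranks(n: int) -> List[float]:
    """R_i = (i - 0.5) / n for i = 1..n (poorest person first)."""
    return [(i - 0.5) / n for i in range(1, n + 1)]


def _sort_by_ses(h: Sequence[float], ses: Sequence[float]) -> List[float]:
    if len(h) != len(ses) or len(h) == 0:
        raise ValueError("h and ses must be non-empty and of equal length")
    # sorted() is stable, so ties in ses keep their input order (an assumption).
    order = sorted(range(len(h)), key=lambda i: ses[i])
    return [h[i] for i in order]


def concentration_index(h: Sequence[float], ses: Sequence[float]) -> float:
    """Equation 1: C = 2 / (n * mu) * sum(h_i * R_i) - 1."""
    h_sorted = _sort_by_ses(h, ses)
    n = len(h_sorted)
    mu = sum(h_sorted) / n
    ranks = fractional_ranks(n)
    return 2.0 / (n * mu) * sum(hi * ri for hi, ri in zip(h_sorted, ranks)) - 1.0


def concentration_index_cov(h: Sequence[float], ses: Sequence[float]) -> float:
    """Equivalent form: C = 2 * cov(h, R) / mu, using the population covariance."""
    h_sorted = _sort_by_ses(h, ses)
    n = len(h_sorted)
    mu = sum(h_sorted) / n
    ranks = fractional_ranks(n)
    r_bar = sum(ranks) / n  # always 0.5
    cov = sum((hi - mu) * (ri - r_bar) for hi, ri in zip(h_sorted, ranks)) / n
    return 2.0 * cov / mu


if __name__ == "__main__":
    income = [100, 200, 300, 400]
    print(concentration_index([1, 1, 0, 0], income))  # -0.5: illness among the poor
    print(concentration_index([0, 0, 1, 1], income))  # 0.5: illness among the rich
```

With illness concentrated among the two poorest of four people the index is -0.5, and reversing the pattern gives +0.5, matching the sign convention described above.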
Three approaches to the decomposition of a binary health outcome are compared: Ordinary Least Squares (OLS), marginal effects from probit analysis, and a Generalized Linear Model (GLM) specifying a binomial distribution and an identity link.
Ordinary Least Squares (OLS)
Wagstaff et al. demonstrate that the concentration index of a continuous health outcome can be decomposed into the contributions of individual determinants. In this case, a linear additive relationship between the health outcome $h_i$ and $k$ determinants $x_{ki}$ is appropriate:

$$h_i = \alpha + \sum_{k} \beta_k x_{ki} + \varepsilon_i \tag{2}$$

and OLS regression is applied to estimate the $\beta_k$'s. By substituting from Equation 2 into Equation 1, the overall concentration index (C) can be rewritten as a linear combination of the concentration indices of the determinants, plus an error term:

$$C = \sum_{k} \frac{\beta_k \bar{x}_k}{\mu} C_k + \frac{GC_\varepsilon}{\mu} \tag{3}$$

where $\beta_k$ are the coefficients from the regression of the health outcome on the $k$ determinants, $\bar{x}_k$ is the mean or proportion of the $k$th determinant, $\mu$ is the mean or proportion of the health outcome, $C_k$ is the concentration index for the $k$th determinant, calculated using Equation 1 with the health outcome ($h_i$) replaced by the determinant ($x_{ki}$), and $GC_\varepsilon$ is the generalized concentration index for the error term.
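To see why Equation 3 holds, note that Equation 1 equals 2 cov(h, R)/μ and covariance is linear, so substituting Equation 2 splits C into one term per determinant plus a residual term; the intercept drops out because a constant does not covary with rank. The stdlib Python sketch below fits Equation 2 by OLS through the normal equations, computes each contribution β_k x̄_k C_k/μ and the residual term GC_ε/μ, and checks that they sum to C. All names and data are hypothetical toy values, not survey data, and each x̄_k must be non-zero for C_k to be defined.

```python
from typing import List, Sequence


def fractional_ranks_by(ses: Sequence[float]) -> List[float]:
    """R_i = (pos - 0.5) / n, pos being person i's position from poorest (1) to richest (n)."""
    n = len(ses)
    order = sorted(range(n), key=lambda i: ses[i])
    ranks = [0.0] * n
    for pos, i in enumerate(order, start=1):
        ranks[i] = (pos - 0.5) / n
    return ranks


def generalized_ci(y: Sequence[float], ranks: Sequence[float]) -> float:
    """GC = (2/n) * sum(y_i * R_i) - mean(y), which equals mean(y) * C."""
    n = len(y)
    return 2.0 / n * sum(yi * ri for yi, ri in zip(y, ranks)) - sum(y) / n


def ci(y: Sequence[float], ranks: Sequence[float]) -> float:
    """Equation 1 written as GC / mean; undefined when mean(y) == 0."""
    return generalized_ci(y, ranks) / (sum(y) / len(y))


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def ols(y, xs):
    """Equation 2 by OLS (normal equations); returns [alpha, beta_1, ..., beta_k]."""
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, c)) for c in xs]
    p = len(cols)
    xtx = [[sum(cols[a][i] * cols[b][i] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(cols[a][i] * y[i] for i in range(n)) for a in range(p)]
    return solve(xtx, xty)


def decompose(h, xs, ses):
    """Return (C, per-determinant contributions, residual term GC_eps / mu)."""
    ranks = fractional_ranks_by(ses)
    alpha, *betas = ols(h, xs)
    n = len(h)
    mu = sum(h) / n
    resid = [h[i] - alpha - sum(b * x[i] for b, x in zip(betas, xs)) for i in range(n)]
    contributions = [b * (sum(x) / n) * ci(x, ranks) / mu for b, x in zip(betas, xs)]
    return ci(h, ranks), contributions, generalized_ci(resid, ranks) / mu


if __name__ == "__main__":
    ses = [1, 2, 3, 4, 5, 6, 7, 8]           # toy incomes, poorest first
    aged_60_plus = [1, 1, 0, 1, 0, 0, 1, 0]  # toy dummy determinants
    no_education = [1, 1, 1, 0, 0, 1, 0, 0]
    ill = [1, 1, 1, 0, 1, 0, 0, 0]
    c, parts, resid_term = decompose(ill, [aged_60_plus, no_education], ses)
    print(c)                                          # -0.4375
    print(abs(c - (sum(parts) + resid_term)) < 1e-9)  # True: Equation 3 holds
```

The identity is exact for any coefficients once the residual is defined from them; OLS only determines how much of C is attributed to the determinants rather than to the residual term.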
Marginal effects from probit analysis
Health sector variables are seldom continuous and are often binary (e.g., ill or not ill). Van Doorslaer modified Wagstaff's method for use in such non-linear settings. The essential modification was to estimate the $\beta_k$'s that enter Equation 3 from a probit regression instead of an OLS regression. More specifically, van Doorslaer recommends using the marginal effects derived from the probit coefficients in place of the $\beta_k$'s. The World Bank technical notes on non-linear estimation suggest generating marginal effects using the Stata command: dprobit y x. Marginal effects can also be calculated using the mfx command after fitting the non-linear model. By default, the marginal effect of each explanatory variable is evaluated at the sample means, and in large samples the marginal effect evaluated at the sample means approximates the overall mean of the individual marginal effects.
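The sketch below shows, for assumed probit coefficients, how a marginal effect at the sample means is formed: dP/dx_k = φ(x̄′β) β_k, where φ is the standard normal density. It also computes the average of the individual marginal effects for comparison. Fitting the probit itself is omitted, and the coefficient values and function names are hypothetical. As far as we are aware, dprobit and mfx report a discrete change from 0 to 1 for dummy regressors by default, whereas this sketch uses the derivative for every regressor.

```python
import math
from typing import List, Sequence


def norm_pdf(z: float) -> float:
    """Standard normal density phi(z)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _index(coef: Sequence[float], row: Sequence[float]) -> float:
    """Probit linear index x'beta; coef = [intercept, beta_1, ..., beta_k]."""
    return coef[0] + sum(b * x for b, x in zip(coef[1:], row))


def marginal_effects_at_means(coef: Sequence[float], xs: Sequence[Sequence[float]]) -> List[float]:
    """dP/dx_k = phi(xbar' beta) * beta_k, evaluated at the sample means."""
    n = len(xs[0])
    means = [sum(col) / n for col in xs]
    density = norm_pdf(_index(coef, means))
    return [density * b for b in coef[1:]]


def average_marginal_effects(coef: Sequence[float], xs: Sequence[Sequence[float]]) -> List[float]:
    """Mean over individuals of phi(x_i' beta) * beta_k, for comparison."""
    n = len(xs[0])
    rows = list(zip(*xs))  # one tuple of regressor values per person
    mean_density = sum(norm_pdf(_index(coef, row)) for row in rows) / n
    return [mean_density * b for b in coef[1:]]


if __name__ == "__main__":
    coef = [-0.4, 0.3, 0.2]  # hypothetical probit coefficients, not estimates
    xs = [[0, 1, 1, 0, 1], [1, 0, 1, 1, 0]]
    print(marginal_effects_at_means(coef, xs))
    print(average_marginal_effects(coef, xs))
```

In the decomposition, these marginal effects would take the place of the $\beta_k$'s in Equation 3; the gap between the two printed vectors indicates how well evaluation at the means approximates the average effect for a given dataset.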
Generalized Linear Models (GLM)
The GLM is an extension of the linear modelling process that allows models to be fitted to data following probability distributions other than the normal distribution, such as the binomial distribution. The GLM relaxes the assumption of homogeneous variances that is usual in linear models and extends the class of linear OLS models in two ways:
the distribution of Y for fixed x is assumed to belong to the exponential family of distributions, which includes the normal and binomial distributions;
the relationship between the mean of Y and a linear combination of x's is specified by a link function.
The link function connects the probability distribution of the outcome variable (the random part of the model) to the systematic (explanatory) part of the model. For traditional linear models in which the outcome variable follows the normal distribution, the link function used is the identity link; it specifies that the expected value of the outcome variable is a linear combination of the x's. When the outcome variable follows a binomial distribution, link functions commonly used are the logit and probit, giving rise to logistic and probit regressions respectively.
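To make the role of the link function concrete, the sketch below pairs each link g with its inverse, which maps the linear predictor η = α + Σβ_k x_k back to the expected value μ. It uses Python's `statistics.NormalDist` for the probit link; the dictionary and helper names are hypothetical.

```python
import math
from statistics import NormalDist
from typing import Callable, Dict, Tuple

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1

# link name -> (g: mu -> eta, g_inverse: eta -> mu)
LINKS: Dict[str, Tuple[Callable[[float], float], Callable[[float], float]]] = {
    "identity": (lambda mu: mu, lambda eta: eta),
    "logit": (lambda mu: math.log(mu / (1.0 - mu)), lambda eta: 1.0 / (1.0 + math.exp(-eta))),
    "probit": (_STD_NORMAL.inv_cdf, _STD_NORMAL.cdf),
}


def binomial_variance(mu: float) -> float:
    """Var(Y) = mu * (1 - mu) for a 0/1 outcome; it varies with mu, unlike in OLS."""
    return mu * (1.0 - mu)


def expected_value(link: str, eta: float) -> float:
    """Map a linear predictor eta = alpha + sum(beta_k * x_k) to E[Y] = g^{-1}(eta)."""
    return LINKS[link][1](eta)


if __name__ == "__main__":
    for name in LINKS:
        print(name, expected_value(name, 0.2))
```

For η = 0.2 the identity link gives μ = 0.2, while the logit and probit links give about 0.550 and 0.579. Only the identity link keeps E[Y] additive in the x's, but unlike logit and probit it does not confine μ to the interval (0, 1).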
Binomial distribution with identity link
A GLM with a binomially distributed dependent variable and an identity link function is a suitable choice in this non-linear context for the decomposition analysis of a binary outcome, because it accounts for the distribution of the outcome while preserving the linear relationship between the independent and dependent variables. The decomposition requires an identity link for Equation 3 to hold. The model can be estimated using the Stata command: glm y x, family(binomial) link(identity).
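A minimal sketch of how such a model can be estimated, assuming iteratively reweighted least squares (IRLS), a standard GLM fitting algorithm; this is not Stata's implementation. With an identity link the working response reduces to y itself and the IRLS weights are 1/[μ(1 − μ)], the inverse binomial variance, so each iteration is a weighted least-squares fit started from OLS. Clamping μ into [ε, 1 − ε] when computing the weights is our assumption, made to keep the weights finite because identity-link fitted values are not constrained to (0, 1). Function names are hypothetical.

```python
from typing import List, Sequence, Tuple


def _solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    aug = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[piv][col]) < 1e-12:
            raise ValueError("singular design matrix")
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            f = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * m
    for r in range(m - 1, -1, -1):
        x[r] = (aug[r][m] - sum(aug[r][c] * x[c] for c in range(r + 1, m))) / aug[r][r]
    return x


def glm_binomial_identity(y: Sequence[float], xs: Sequence[Sequence[float]],
                          max_iter: int = 100, tol: float = 1e-10,
                          eps: float = 1e-6) -> Tuple[List[float], int]:
    """Fit E[y] = alpha + sum(beta_k * x_k) with binomial variance by IRLS.

    Returns ([alpha, beta_1, ..., beta_k], number of iterations used).
    """
    n = len(y)
    cols = [[1.0] * n] + [list(map(float, c)) for c in xs]
    p = len(cols)
    w = [1.0] * n  # unit weights: the first pass is plain OLS (starting values)
    prev = None
    for it in range(1, max_iter + 1):
        # Identity link: working response z_i = y_i, weight w_i = 1 / (mu_i * (1 - mu_i)).
        xtwx = [[sum(w[i] * cols[a][i] * cols[b][i] for i in range(n)) for b in range(p)]
                for a in range(p)]
        xtwz = [sum(w[i] * cols[a][i] * y[i] for i in range(n)) for a in range(p)]
        coef = _solve(xtwx, xtwz)
        if prev is not None and max(abs(u - v) for u, v in zip(coef, prev)) < tol:
            return coef, it
        prev = coef
        mu = [sum(coef[a] * cols[a][i] for a in range(p)) for i in range(n)]
        # Assumption: clamp mu into [eps, 1 - eps] so the weights stay finite.
        clamped = [min(max(m, eps), 1.0 - eps) for m in mu]
        w = [1.0 / (m * (1.0 - m)) for m in clamped]
    raise RuntimeError("IRLS did not converge")


if __name__ == "__main__":
    aged_60_plus = [0, 0, 0, 0, 1, 1, 1, 1]  # toy data, not survey values
    ill = [1, 0, 0, 0, 1, 1, 0, 1]
    coef, iters = glm_binomial_identity(ill, [aged_60_plus])
    print(coef, iters)
```

With a single dummy regressor the model is saturated, so the estimates equal the group proportions (intercept 0.25, slope 0.5), which is a convenient sanity check. The fitted coefficients would then serve as the $\beta_k$'s in Equation 3, exactly as in the OLS case.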