Factors related to baseline CD4 cell counts in HIV/AIDS patients: comparison of Poisson, generalized Poisson and negative binomial regression models

CD4 lymphocyte count (CD4) is a major predictor of HIV progression to AIDS. Exploring the factors affecting CD4 levels may assist healthcare staff and patients in the management and monitoring of health care. This retrospective cohort study aimed to explore factors associated with CD4 cell counts at the time of diagnosis in HIV patients using Poisson, Generalized Poisson, and Negative Binomial regression models. Of the 4402 HIV patients diagnosed in Iran from 1987 to 2016, 3030 (68.8%) were male, and the mean age was 34.8 ± 10.4 years. The results indicate that the Negative Binomial model outperformed the other models in terms of the AIC, log-likelihood and RMSE criteria. In this model, factors including sex, age, clinical stage and tuberculosis (TB) co-infection were significantly associated with CD4 count (P < 0.05). Given the effect of age, sex and clinical stage of HIV on the CD4 count of patients, adopting policies and strategies to increase awareness and encourage people to seek early HIV testing and care is advantageous.


Introduction
AIDS (Acquired Immunodeficiency Syndrome) is a threat to human life worldwide [1][2][3][4][5]. Since the beginning of the HIV (Human Immunodeficiency Virus) epidemic, about 75 million people have been infected with the virus worldwide [6,7]. Although the World Health Organization (WHO) has predicted that new cases of HIV and deaths from AIDS will be reduced to 500,000 in 2020 [8], HIV/AIDS will remain one of the major challenges affecting all levels of social, family, and individual activity [1,9].
Slowing the progression of HIV to AIDS is a crucial measure in dealing with the disease. Accordingly, the number of CD4 lymphocytes (CD4) in HIV-infected individuals is a principal indicator of HIV progression and death from AIDS [10][11][12], and a lower CD4 cell count indicates that the immune system may be compromised [10]. The WHO emphasizes the important role of CD4 counting in assessing the initial status of the disease, as well as in making appropriate decisions about care for patients with advanced HIV. Therefore, in care systems where access to antiretroviral therapy (ART) may be limited, patients with CD4 cell counts below 350 cells per cubic millimeter are prioritized [11]. CD4 cell count is correlated with risk of death [13], life expectancy [14] and adherence to treatment [15]. A decrease in CD4 cells increases the risk of other infectious diseases and of clinical symptoms associated with HIV [16]. Specifically, a CD4 cell count below 200 cells per cubic millimeter is used to define advanced HIV disease and serves as the critical threshold for the risk of death [8,13,17,18].
Identifying the factors that affect CD4 cell counts at the early stage of treatment for HIV patients may play an important role in their care program and survival [19]. In addition to HIV, other factors can affect the number of CD4 cells. The influence of factors such as infection with other viruses, ethnicity, geographic location, genetics, route of transmission, nutrition, pregnancy, stress, smoking status and drug use on CD4 cell counts has been investigated [1,10][20][21][22][23][24]. Therefore, modeling and analyzing the factors that affect the number of CD4 cells is important in AIDS research. Several generalized linear models are available for modeling count data, including the Poisson, Negative Binomial (NB) and Generalized Poisson regression (GPR) models [12,19][24][25][26]. In recent years, these models have been used extensively in epidemiology and health studies [12,19][26][27][28].
This study aimed to compare the performance of Poisson, Negative Binomial and GPR models to evaluate the effect of different demographic factors on CD4 count at the time of diagnosis in HIV patients.

Data source
Data for this registry-based retrospective cohort study included information on all newly diagnosed HIV/AIDS patients in 158 Behavioral Disease Counseling Centers in 31 provinces of Iran from 1987 to 2016 [29,30]. The inclusion criteria were as follows: HIV positive, having a CD4 cell count at the time of diagnosis (measured up to 3 months after diagnosis), and not having received ART before the CD4 cell count was determined.
Two scenarios were used to evaluate the impact of missing observations in the independent variables. First, assuming a missing at random (MAR) pattern, we applied multiple imputation (MI) with five imputed data sets. Second, missing observations in each independent variable were placed in a separate category labeled unknown (as reported in Tables 1 and 2). The estimated associations with baseline CD4 count were similar in both scenarios; therefore, the results are reported based on the second scenario.
It should be noted that these data are routinely collected from all patients by a registration system. Therefore, consent was not obtained from patients, but the study protocol was approved by the Ethics Committee of Hamadan University of Medical Sciences (IR.UMSHA.REC.1398.406).
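To illustrate the first scenario, the estimates obtained from the five imputed data sets need to be combined into a single result. The sketch below assumes pooling by Rubin's rules, which the text does not state explicitly. The function name `pool_rubin` and the numbers in the usage lines are hypothetical.

```python
import math
from statistics import mean, variance

def pool_rubin(estimates, squared_ses):
    """Pool one coefficient across m imputed data sets (Rubin's rules).

    estimates   : coefficient estimate from each imputed data set
    squared_ses : squared standard error from each imputed data set
    Returns (pooled estimate, pooled standard error).
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    within = mean(squared_ses)       # average within-imputation variance
    between = variance(estimates)    # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return q_bar, math.sqrt(total)

# Hypothetical log rate ratios from m = 5 imputed data sets
est, se = pool_rubin([-0.12, -0.10, -0.11, -0.13, -0.09], [0.0004] * 5)
```

The pooled standard error grows with the disagreement between imputed data sets, so close agreement between the two scenarios suggests the missing values had little influence on the estimates.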

Statistical models
The Poisson distribution is the standard distribution for count data [31]. Let $y_i$ be a random variable that indicates the baseline CD4 cell count (cells/mm$^3$) of the $i$th individual. Assume that $y_i$ follows a Poisson distribution with mean $\theta_i$, which is related to a set of predictors. Hence, the probability of observing any specific count $y_i$ is given by the following formula [27]:
$$P(y_i) = \frac{e^{-\theta_i}\,\theta_i^{y_i}}{y_i!}, \qquad y_i = 0, 1, 2, \ldots$$
It is assumed that the mean value $\theta_i$ depends on a set of predictor variables, such that $\theta_i = \exp\left(\sum_j x_{ij}\beta_j\right)$, where $x_i$ is a covariate vector (age, sex, educational level, …) and $\beta$ is a vector of unknown regression parameters to be estimated.
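The log-linear Poisson model described above can be sketched in a few lines. This is a minimal illustration rather than the software used in the study. The function names, covariate coding and coefficient values are hypothetical. Working on the log scale avoids overflow in the factorial when counts are in the hundreds, as CD4 counts usually are.

```python
import math

def poisson_mean(x, beta):
    """Log-link mean: theta_i = exp(sum_j x_ij * beta_j)."""
    return math.exp(sum(xj * bj for xj, bj in zip(x, beta)))

def poisson_log_pmf(y, theta):
    """log P(Y = y) for a Poisson variable with mean theta."""
    return y * math.log(theta) - theta - math.lgamma(y + 1)

def poisson_log_likelihood(ys, xs, beta):
    """Sum of log-probabilities over all individuals."""
    return sum(poisson_log_pmf(y, poisson_mean(x, beta)) for y, x in zip(ys, xs))

# Hypothetical covariates: intercept, male indicator, age in decades
x_i = [1.0, 1.0, 3.5]
beta = [6.0, -0.1, -0.05]          # hypothetical coefficients
theta_i = poisson_mean(x_i, beta)  # expected CD4 count for this individual
```

Under this model each coefficient acts multiplicatively on the expected count, so `math.exp(beta[1])` would be read as the ratio of expected CD4 counts between men and women with the other covariates held fixed.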
The Poisson distribution assumes that the mean and variance are equal, which is referred to as equidispersion [31]. A variance that is large relative to the mean is called over-dispersion. Ignoring it leads to inaccurate estimates of the regression standard errors and to confidence intervals that are too narrow. For over-dispersed data, a negative binomial regression model can be used instead. By assuming $\theta_i$ to be a gamma-distributed random variable with mean $E(\theta_i) = \alpha_i$ and variance $\mathrm{var}(\theta_i) = r\alpha_i^2$, it can be shown that the marginal distribution of $y_i$ is negative binomial with mean $\alpha_i$ and variance $\alpha_i(1 + r\alpha_i)$, where $r$ denotes the dispersion parameter. If $r = 0$, there is no unobserved heterogeneity and the model reduces to the Poisson; if $r > 0$, the variance is larger than the mean and over-dispersion is present [27,31].
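The mean and variance of the marginal negative binomial distribution follow from the laws of total expectation and total variance. The short derivation below assumes that the gamma mixing distribution has mean $\alpha_i$ and variance $r\alpha_i^2$, which is the parameterization that yields the variance $\alpha_i(1 + r\alpha_i)$.

```latex
\begin{aligned}
E(y_i) &= E\big[E(y_i \mid \theta_i)\big] = E(\theta_i) = \alpha_i, \\
\mathrm{var}(y_i) &= E\big[\mathrm{var}(y_i \mid \theta_i)\big]
                   + \mathrm{var}\big[E(y_i \mid \theta_i)\big] \\
                  &= E(\theta_i) + \mathrm{var}(\theta_i)
                   = \alpha_i + r\alpha_i^2
                   = \alpha_i(1 + r\alpha_i).
\end{aligned}
```

The first term is the Poisson variance given $\theta_i$, and the second is the extra variability contributed by unobserved heterogeneity, which vanishes when $r = 0$.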
Another alternative model for over-dispersed data is the GPR. Unlike the negative binomial distribution, which is used only in the case of over-dispersion, the GPR model can be used for modeling either over-dispersed or under-dispersed count data. The probability mass function of the GPR model for the CD4 counts is given in [25,31], where the parameter $r$ measures the dispersion in the data, the mean is $\alpha_i = \exp\left(\sum_j x_{ij}\beta_j\right)$ and the variance is $\alpha_i(1 + r\alpha_i)^2$.
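The generalized Poisson probability function did not survive in the text. One common form consistent with a mean of $\alpha_i$ and a variance of $\alpha_i(1 + r\alpha_i)^2$ is the restricted generalized Poisson,
$$P(y_i) = \left(\frac{\alpha_i}{1 + r\alpha_i}\right)^{y_i} \frac{(1 + r y_i)^{y_i - 1}}{y_i!} \exp\left(-\frac{\alpha_i(1 + r y_i)}{1 + r\alpha_i}\right).$$
Whether this is exactly the parameterization used in the study is an assumption. The sketch below evaluates it on the log scale and checks the two moments numerically. The function name and the values of `alpha` and `r` are hypothetical.

```python
import math

def gp_log_pmf(y, alpha, r):
    """log P(Y = y) for a generalized Poisson variable with mean alpha.

    Assumed form (restricted generalized Poisson), used here for r >= 0:
      P(y) = (alpha / (1 + r*alpha))**y * (1 + r*y)**(y - 1) / y!
             * exp(-alpha * (1 + r*y) / (1 + r*alpha))
    Setting r = 0 recovers the Poisson distribution.
    """
    scale = 1.0 + r * alpha
    return (y * math.log(alpha / scale)
            + (y - 1) * math.log(1.0 + r * y)
            - math.lgamma(y + 1)
            - alpha * (1.0 + r * y) / scale)

# Numerical check of the moments for hypothetical alpha = 3, r = 0.1
probs = [math.exp(gp_log_pmf(y, 3.0, 0.1)) for y in range(400)]
gp_mean = sum(y * p for y, p in enumerate(probs))
gp_var = sum((y - gp_mean) ** 2 * p for y, p in enumerate(probs))
```

Negative values of `r` (under-dispersion) restrict the admissible range of counts and would need additional bounds that this sketch does not implement.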

Models comparison
The Akaike information criterion, AIC = −2 log L + 2k, where L and k represent the maximized likelihood and the number of parameters, respectively, was used to compare the models [32]. A lower AIC indicates a better-fitting model. The root mean square error (RMSE) of the residuals was also used for model evaluation. The significance level was set to 0.05. All statistical analyses were performed using SPSS 23.0 and Stata 14 software.
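Both comparison criteria are simple to compute once a model has been fitted. The sketch below uses the standard definition AIC = −2 log L + 2k and takes the RMSE over the differences between observed and fitted counts. The function names and the numbers in the usage lines are hypothetical and are not results from the study.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 * logL + 2 * k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def rmse(observed, fitted):
    """Root mean square error between observed and fitted counts."""
    n = len(observed)
    return math.sqrt(sum((o - f) ** 2 for o, f in zip(observed, fitted)) / n)

# Hypothetical comparison: the model with the smaller AIC is preferred
aic_poisson = aic(log_likelihood=-5200.0, n_params=10)
aic_nb = aic(log_likelihood=-3100.0, n_params=11)  # one extra dispersion parameter
preferred = "NB" if aic_nb < aic_poisson else "Poisson"
```

The penalty term means the NB and Generalized Poisson models must improve the log-likelihood by more than one unit to offset their additional dispersion parameter.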

Results
Descriptive characteristics of the participants by CD4 count are shown in Table 1. Between 1987 and 2016, data on 4402 patients who had a CD4 cell count within three months after diagnosis, as well as no missing information, were included in this study. The mean age of the study participants was 34.8 ± 10.4 years and 3030 (68.8%) were male. The largest groups of participants were married (48.5%) and jobless (39.5%). In addition, the most common baseline WHO clinical stage was stage I (45.4%), and the mode of transmission in 47.8% of patients was sexual contact. The log-likelihood, AIC and RMSE for the Negative Binomial (NB) model were lower than those of the other two models (Table 2). The estimates of r for the NB and Generalized Poisson models were 0.68 (S.E. 0.01) and 0.95 (S.E. 0.001), respectively, both significantly different from the null value of 0 (P < 0.001); therefore, the NB and Generalized Poisson models are favored over the Poisson. These results imply that the Negative Binomial model outperformed the Poisson and Generalized Poisson models for this data set.
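The text does not name the test used to compare the dispersion estimates with 0. As a rough check, and assuming a two-sided Wald z test, the reported estimates and standard errors give the statistics computed below. The function name `wald_test` is hypothetical.

```python
import math

def wald_test(estimate, std_error, null_value=0.0):
    """Two-sided Wald z test; returns (z, p_value)."""
    z = (estimate - null_value) / std_error
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_value

z_nb, p_nb = wald_test(0.68, 0.01)    # NB dispersion estimate reported in the text
z_gp, p_gp = wald_test(0.95, 0.001)   # Generalized Poisson estimate reported in the text
```

Both p-values fall far below 0.001, in line with the reported significance. Because r = 0 lies on the boundary of the parameter space for the NB model, a likelihood-ratio test is often preferred in practice, so the Wald p-value should be read as approximate.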
Output of NB model is presented in Table 3. Male gender was negatively associated with CD4 count. All the

Discussion
This study aimed to explore factors affecting baseline CD4 counts in HIV patients using Poisson, Negative Binomial and GPR regressions. We found that the Negative Binomial regression yielded the best fit. In addition, we found that CD4 counts decrease with increasing age group. The results of Tang et al. [33] indicate that older patients are more likely to be diagnosed late based on CD4 counts. Late diagnosis in the elderly may be attributed in part to low education and information, as well as to a low perceived risk of the disease in this group. It should be noted, however, that the observed associations between CD4 cell count and subject characteristics at the time of HIV diagnosis are unlikely to be free of confounding, for example by age: patients who have been infected for a long time and are diagnosed relatively late will have a relatively low CD4 cell count but will also tend to be older. This study also showed that male gender was negatively associated with CD4 count. This finding is consistent with that of Mair et al. (2007) in Senegalese patients with other infections [10].
In 2016, Bruneau et al. used multiple quantile regression to identify the factors affecting the number of CD4 cells in France. Their results showed that higher age, male gender, being an external migrant, co-infection with hepatitis B and C viruses, rural residency and homosexual transmission were negatively associated with CD4 cell count. However, the statistical significance of these factors varied across quantiles of CD4 [23]. Factors that affect the CD4 cell count in persons living with HIV (PLHIV) may indeed inform health care interventions or policies. However, establishing associations between CD4 cell count and risk factors based on cohort data will not directly lead to the design of new health care interventions. Identifying these groups and providing proper and adequate education so that they use prevention methods and change or abandon high-risk behavior can be effective in preventing HIV infection in these groups and, ultimately, in reducing its prevalence in society.

Conclusions
One of the main public health priorities in the field of HIV is to reduce the number of people who have a low CD4 cell count, a marker of HIV progression, at the time of diagnosis. The results of the present study confirm that certain groups, such as older people, men, people who inject drugs, people who have unsafe sex and people with TB co-infection, had lower initial CD4 counts. Therefore, awareness of these subsets at high risk of late detection can help guide policies and strategies to increase awareness and encourage people to seek HIV testing and care early.

Limitation
Because we used registry data, some key variables had not been measured; therefore, we could neither assess their effects on CD4 cell counts nor control for their confounding effects. Many changes, such as demographic changes in cohort participants and changes in treatment policy, quality of care, HIV testing policies and promotion, as well as in general public awareness of and stigma towards PLHIV, may have occurred during the long study period. All these factors may be confounded with the collected subject characteristics as well as with the CD4 cell counts observed at HIV diagnosis. Moreover, all regression models used in this study are based on the mean response, whereas other approaches such as quantile regression can provide a more comprehensive analysis of the factors associated with CD4 cell counts.