
Factors related to baseline CD4 cell counts in HIV/AIDS patients: comparison of poisson, generalized poisson and negative binomial regression models



CD4 lymphocyte count (CD4) is a major predictor of HIV progression to AIDS. Exploring the factors affecting CD4 levels may assist healthcare staff and patients in managing and monitoring care. This retrospective cohort study aimed to explore factors associated with CD4 cell counts at the time of diagnosis in HIV patients using Poisson, Generalized Poisson, and Negative Binomial regression models.


Of 4402 HIV patients diagnosed in Iran from 1987 to 2016, 3030 (68.8%) were male, and the mean age was 34.8 ± 10.4 years. The results indicate that the Negative Binomial model outperformed the other models in terms of AIC, log-likelihood and RMSE criteria. In this model, factors including sex, age, clinical stage and tuberculosis (TB) co-infection were significantly associated with CD4 count (P < 0.05).


Given the effect of age, sex, clinical stage and stage of HIV on CD4 count of the patients, adopting policies and strategies to increase awareness and encourage people to seek early HIV testing and care is advantageous.


AIDS (Acquired Immunodeficiency Syndrome) is a threat to human life worldwide [1,2,3,4,5]. Since the beginning of the HIV (Human Immunodeficiency Virus) epidemic, about 75 million people have been infected with the virus worldwide [6, 7]. Although the World Health Organization (WHO) has predicted that new cases of HIV and deaths from AIDS will fall to 500,000 in 2020 [8], the disease will remain a massive challenge that affects social, family, and individual life at every level [1, 9].

Slowing the progression of HIV to AIDS is a crucial measure in dealing with the disease. The number of CD4 lymphocytes (CD4) in HIV-infected individuals is a principal indicator of HIV progression and death from AIDS [10,11,12], and a lower CD4 cell count indicates that the immune system may be compromised [10]. WHO emphasizes the important role of CD4 counting in assessing the initial status of the disease and in making appropriate care decisions for patients with advanced HIV. Therefore, in care systems where access to Antiretroviral Therapy (ART) may be limited, patients with CD4 cell counts below 350 cells per cubic millimeter are prioritized [11]. CD4 cell count is correlated with risk of death [13], life expectancy [14] and adherence to treatment [15]. A decrease in CD4 cells increases the risk of other infectious diseases and of clinical symptoms associated with HIV [16]. Specifically, a CD4 cell count below 200 cells per cubic millimeter is used to define advanced HIV disease and as the critical threshold for the risk of death [8, 13, 17, 18].

Identifying the factors that affect CD4 cell counts at the early stage of treatment may play an important role in the care program and survival of HIV patients [19]. In addition to HIV, other factors can affect the number of CD4 cells. The influence of factors such as infection with other viruses, ethnicity, geographic location, genetics, route of transmission, nutrition, pregnancy, stress, smoking status and drug use on CD4 cell counts has been investigated [1, 10, 20,21,22,23,24]. Modeling and analyzing the factors that affect the number of CD4 cells is therefore important in AIDS research. Several generalized linear models are available for count data, including the Poisson, Negative Binomial (NB) and Generalized Poisson regression (GPR) models [12, 19, 24,25,26]. In recent years, these models have been used extensively in epidemiology and health studies [12, 19, 26,27,28].

This study aimed to compare the performance of Poisson, Negative Binomial and GPR models to evaluate the effect of different demographic factors on CD4 count at the time of diagnosis in HIV patients.

Main text

Data source

Data for this registry-based retrospective cohort study included information on all newly diagnosed HIV/AIDS patients in 158 Behavioral Disease Counseling Centers in 31 provinces of Iran from 1987 to 2016 [29, 30]. The inclusion criteria were: HIV-positive status, an available CD4 cell count at the time of diagnosis (measured up to 3 months after diagnosis), and no ART before the CD4 cell count was determined.

The initial CD4 cell count was the response variable. The potential predictor variables were age group (< 30, 30–40, 40–50, > 50 years), gender (male, female), educational status (illiterate/primary school, secondary school, high school, academic, unknown), marital status (single, married, widowed, divorced, unknown), job (employed, unemployed, unknown), transmission route (injecting drug use, unsafe sex, mother to child, blood transfusion, unknown), year of diagnosis (before 2006, 2006–2011, after 2011), TB co-infection (yes, no), and WHO clinical stage (stage I, stage II, stage III, stage IV, unknown).

Two scenarios were used to evaluate the impact of missing observations in the independent variables. First, assuming a missing at random (MAR) pattern, we applied multiple imputation (MI) with five imputed data sets. Second, missing observations in each independent variable were placed in a separate category labeled "unknown" (as reported in Tables 1 and 2). The estimated factors associated with baseline CD4 count were similar in both scenarios, so the results are reported for the second scenario.
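The second scenario (a separate "unknown" category for missing values) can be sketched in a few lines of Python. This is only an illustration of the recoding idea: the field names and the two example records are hypothetical, since the registry's actual variable names are not given in the text.

```python
# Hypothetical field names; the registry's real variable names are not stated.
CATEGORICAL_FIELDS = ["education", "marital_status", "job", "transmission", "who_stage"]


def add_unknown_category(records, fields=CATEGORICAL_FIELDS, label="unknown"):
    """Return copies of the records with missing categorical values set to `label`."""
    recoded = []
    for record in records:
        row = dict(record)  # copy, so the input records are left untouched
        for field in fields:
            value = row.get(field)
            if value is None or str(value).strip() == "":
                row[field] = label
        recoded.append(row)
    return recoded


# Made-up records, for illustration only.
patients = [
    {"cd4": 412, "education": "high school", "marital_status": None,
     "job": "employed", "transmission": "unsafe sexual", "who_stage": "stage I"},
    {"cd4": 95, "education": "", "marital_status": "married",
     "job": None, "transmission": "injecting drug users", "who_stage": None},
]
recoded_patients = add_unknown_category(patients)
```

Keeping the missing values as their own level retains every patient in the regression, at the cost of a coefficient for "unknown" that has no direct clinical interpretation.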

Table 1 General characteristics of the participants by CD4 count
Table 2 Comparison of three models in terms of Log likelihood and AIC Statistic

These data are routinely collected from all patients by a registration system. Therefore, consent was not obtained from patients, but the study protocol was approved by the Ethics Committee of Hamadan University of Medical Sciences (IR.UMSHA.REC.1398.406).


Statistical models

The Poisson distribution is the standard distribution for count data [31]. Let \(y_{i}\) be a random variable denoting the baseline CD4 cell count (cells/mm3) of the ith individual. Assume that \(y_{i}\) follows a Poisson distribution with mean \(\theta_{i}\), which is related to a set of predictors. The probability of observing any specific count \(y_{i}\) is then given by [27]:

$$f(y_{i} ;\theta_{i} ) = \frac{{\theta_{i}^{{y_{i} }} e^{ - \theta_{i} } }}{{y_{i} !}}, \quad y_{i} = 0,1,2, \ldots$$

The mean \(\theta_{i}\) is assumed to depend on a set of predictor variables through \(\theta_{i} = \exp ( {\sum\nolimits_{j} {x_{ij} \beta_{j} } } )\), where \(x_{i}\) is a covariate vector (age, sex, educational level, …) and \(\beta\) is a vector of unknown regression parameters to be estimated.
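As a minimal sketch of the Poisson model with a log link, the following Python code evaluates the mean, the log-probability of a count, and the log-likelihood of a sample. The coefficient values are illustrative assumptions and are not estimates from this study; the function names are likewise hypothetical.

```python
import math


def poisson_mean(x, beta):
    """Log link: theta_i = exp(sum_j x_ij * beta_j)."""
    return math.exp(sum(x_j * b_j for x_j, b_j in zip(x, beta)))


def poisson_logpmf(y, theta):
    """log f(y; theta) = y*log(theta) - theta - log(y!)."""
    return y * math.log(theta) - theta - math.lgamma(y + 1)


def poisson_loglik(ys, xs, beta):
    """Sum of log-probabilities over all individuals."""
    return sum(poisson_logpmf(y, poisson_mean(x, beta)) for y, x in zip(ys, xs))


# Illustrative coefficients only: an intercept and a hypothetical effect of male sex.
beta = [5.5, -0.2]
x_male = [1.0, 1.0]    # intercept term, male indicator = 1
x_female = [1.0, 0.0]  # intercept term, male indicator = 0

theta_male = poisson_mean(x_male, beta)
theta_female = poisson_mean(x_female, beta)
rate_ratio = theta_male / theta_female  # equals exp(-0.2) under the log link
```

Because of the log link, each coefficient acts multiplicatively on the expected count, so `exp(beta_j)` can be read as a ratio of expected CD4 counts between two groups.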

The Poisson distribution assumes that the mean and variance are equal, which is referred to as equidispersion [31]. A variance that is large relative to the mean is called over-dispersion. Ignoring it leads to inaccurate estimates of the regression standard errors and to confidence intervals that are too narrow. For over-dispersed data, a negative binomial regression model can be used instead. Assuming \(\theta_{i}\) is a gamma-distributed random variable with \(E(\theta_{i} ) = \alpha_{i}\) and \({\text{var}} (\theta_{i} ) = r\alpha_{i}^{2}\), it can be shown that the marginal distribution of \(y_{i}\) is negative binomial with mean \(\alpha_{i}\) and variance \(\alpha_{i} (1 + r\alpha_{i} )\), where r denotes the dispersion parameter and \(\alpha_{i} = \exp ( {\sum\nolimits_{j} {x_{ij} \beta_{j} } } )\).

If r = 0, there is no unobserved heterogeneity and the model reduces to the Poisson; if r > 0, the variance is larger than the mean and over-dispersion is present [27, 31].
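The relationship between the dispersion parameter and the variance can be checked numerically. The sketch below assumes the common mean-dispersion (NB2) parameterization, in which the mixing gamma distribution has variance \(r\alpha^{2}\) and the count has variance \(\alpha (1 + r\alpha )\); the mean of 5 is an arbitrary illustrative value chosen so that the probabilities can be summed quickly, not a CD4 estimate.

```python
import math


def nb_logpmf(y, alpha, r):
    """Negative binomial (NB2) log-pmf with mean alpha and dispersion r > 0."""
    inv_r = 1.0 / r
    return (math.lgamma(y + inv_r) - math.lgamma(inv_r) - math.lgamma(y + 1)
            + inv_r * math.log(1.0 / (1.0 + r * alpha))
            + y * math.log(r * alpha / (1.0 + r * alpha)))


def moments(logpmf, upper):
    """Total probability, mean and variance over y = 0, ..., upper - 1."""
    probs = [math.exp(logpmf(y)) for y in range(upper)]
    total = sum(probs)
    mean = sum(y * p for y, p in enumerate(probs))
    var = sum((y - mean) ** 2 * p for y, p in enumerate(probs))
    return total, mean, var


alpha, r = 5.0, 0.68  # illustrative mean; r = 0.68 is used only as an example value
nb_total, nb_mean, nb_var = moments(lambda y: nb_logpmf(y, alpha, r), upper=2000)
expected_var = alpha * (1.0 + r * alpha)  # alpha + r * alpha**2
```

With these values the variance is several times the mean, which is the over-dispersion that a Poisson model cannot represent.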

Another alternative for over-dispersed data is GPR. Unlike the negative binomial distribution, which is used only in the case of over-dispersion, the GPR model can be used for either over-dispersed or under-dispersed count data. The probability mass function of the GPR model for the CD4 counts is given by [25, 31]:

$$f(y_{i} ;\alpha_{i} ,r) = \frac{{\alpha_{i} }}{{1 + r\alpha_{i} }}\left( {\frac{{\alpha_{i} \left( {1 + ry_{i} } \right)}}{{1 + r\alpha_{i} }}} \right)^{{y_{i} - 1}} \exp \left( { - \frac{{\alpha_{i} \left( {1 + ry_{i} } \right)}}{{1 + r\alpha_{i} }}} \right)\frac{1}{{y_{i} !}}, \quad y_{i} = 0,1,2, \ldots$$

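where the parameter r measures the dispersion in the data, the mean is \(\alpha_{i} = \exp ( {\sum\nolimits_{j} {x_{ij} \beta_{j} } } )\) and the variance is \(\alpha_{i} (1 + r\alpha_{i} )^{2}\). The model reduces to the Poisson when r = 0, while r > 0 corresponds to over-dispersion and r < 0 to under-dispersion.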
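A short numerical sketch of the generalized Poisson distribution follows. It assumes the Famoye-type parameterization with mean \(\alpha\) and dispersion \(r\), for which the variance is \(\alpha (1 + r\alpha )^{2}\); the parameter values are arbitrary illustrations, not study estimates.

```python
import math


def gp_logpmf(y, alpha, r):
    """Generalized Poisson log-pmf with mean alpha and dispersion r.

    Assumes 1 + r*alpha > 0 and 1 + r*y > 0 (always true for r >= 0).
    """
    scale = 1.0 + r * alpha
    return (y * math.log(alpha / scale)
            + (y - 1) * math.log(1.0 + r * y)
            - alpha * (1.0 + r * y) / scale
            - math.lgamma(y + 1))


alpha, r = 3.0, 0.1  # illustrative values only
probs = [math.exp(gp_logpmf(y, alpha, r)) for y in range(400)]
gp_total = sum(probs)
gp_mean = sum(y * p for y, p in enumerate(probs))
gp_var = sum((y - gp_mean) ** 2 * p for y, p in enumerate(probs))
gp_expected_var = alpha * (1.0 + r * alpha) ** 2
```

Setting `r = 0` in `gp_logpmf` gives back the Poisson log-probability `y*log(alpha) - alpha - log(y!)`, which is why the Poisson model is nested in the GPR model.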

Models comparison

The Akaike information criterion, AIC = −2logL + 2k, where L and k represent the maximized likelihood and the number of parameters, respectively, was used to compare the models [32]. A lower AIC indicates a better-fitting model. The root mean square error (RMSE) of the residuals was also used for model evaluation. The significance level was set to 0.05. All statistical analyses were done using SPSS 23.0 and Stata 14.
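The two comparison criteria are simple to compute once a model has been fitted. The sketch below uses the standard definition AIC = −2logL + 2k and the usual RMSE of observed minus fitted values; the log-likelihoods, parameter counts and model labels are made up for illustration and are not the values reported in Table 2.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2*logL + 2*k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def rmse(observed, fitted):
    """Root mean square error of the residuals (observed - fitted)."""
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up (log-likelihood, number of parameters) pairs for two hypothetical models.
candidates = {"model_a": (-1250.0, 5), "model_b": (-900.0, 6)}
aic_by_model = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best_model = min(aic_by_model, key=aic_by_model.get)  # lowest AIC is preferred

example_rmse = rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

The extra parameter in `model_b` costs 2 AIC units, so it is preferred only if it raises the log-likelihood by more than 1.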


Descriptive characteristics of the participants by CD4 count are shown in Table 1. Between 1987 and 2016, 4402 patients who had a CD4 cell count within three months of diagnosis and no missing information were included in this study. The mean age of study participants was 34.8 ± 10.4 years and 3030 (68.8%) were male. The largest groups of participants were married (48.5%) and jobless (39.5%). In addition, the largest share of participants was in baseline WHO clinical stage I (45.4%), and the mode of transmission in 47.8% of patients was sexual contact.

The Negative Binomial (NB) model had better log-likelihood, AIC and RMSE values than the other two models (Table 2). The estimates of r for the NB and Generalized Poisson models were 0.68 (S.E. 0.01) and 0.95 (S.E. 0.001), respectively, both significantly different from the null value of 0 (P < 0.001); therefore, the NB and Generalized Poisson models are favored over the Poisson. These results imply that the Negative Binomial model outperformed the Poisson and Generalized Poisson models for this data set.
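As a rough check on the reported significance of r, the estimates and standard errors above can be turned into Wald z-statistics. The text does not say which test was used, so the Wald test here is an assumption; it is also only approximate when the null value lies on the boundary of the parameter space, as it does for the NB dispersion.

```python
import math


def wald_test(estimate, std_error, null_value=0.0):
    """Wald z-statistic and two-sided normal p-value for H0: parameter = null_value."""
    z = (estimate - null_value) / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p_two_sided


# Estimates and standard errors of r as reported in the text.
z_nb, p_nb = wald_test(0.68, 0.01)    # Negative Binomial
z_gp, p_gp = wald_test(0.95, 0.001)   # Generalized Poisson
```

Both z-statistics are far beyond any conventional critical value, which is consistent with the reported P < 0.001.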

The output of the NB model is presented in Table 3. Male gender was negatively associated with CD4 count. Age group at diagnosis had a significant effect: CD4 counts decreased with increasing age group (P < 0.05). Marital status, employment status, and education level were not significantly associated with CD4 counts (P > 0.05). The route of HIV transmission had a significant effect on CD4 counts. Patients infected through injecting drug use, unsafe sex and blood transfusion had significantly lower CD4 counts than those infected through mother-to-child transmission. Patients in WHO clinical stages II, III, IV and unknown had significantly lower CD4 counts than those in WHO clinical stage I. Year of HIV diagnosis had a significant effect on CD4 counts: patients diagnosed before 2006 had significantly higher CD4 counts than those diagnosed after 2006. TB co-infection had a significant negative effect on CD4 counts (P < 0.05).

Table 3 Negative Binomial regression coefficients for factors affecting the initial number of CD4 cell counts


This study aimed to explore factors affecting baseline CD4 counts in HIV patients using Poisson, Negative Binomial and GPR models. We found that the Negative Binomial regression yielded the best fit and that CD4 counts decrease with increasing age group. The results of Tang et al. [33] indicate that older patients are more likely to be diagnosed late based on CD4 counts. Late diagnosis in the elderly may be attributed to low education and information, as well as to a low perceived risk of the disease in this group. It should be noted, however, that the observed associations between CD4 cell count and subject characteristics at the time of HIV diagnosis are not free from confounding by factors such as the age of the PLHIV: patients who have been infected for a long time and are diagnosed relatively late will have a relatively low CD4 cell count but will also tend to be older.

This study showed that male gender was negatively associated with CD4 count. This finding is supported by Mair et al. (2008) in Senegalese patients with other infections [10].

In 2016, Bruneau et al. used multiple quantile regression to identify the factors affecting the number of CD4 cells in France. Their results showed that older age, male gender, foreign migrant status, co-infection with hepatitis B and C viruses, rural residency and homosexual transmission were negatively associated with CD4 cell count. However, the statistical significance of these factors varied across quantiles of CD4 [23].

Factors that affect the CD4 cell count in persons living with HIV (PLHIV) may indeed inform health care interventions or policies. However, establishing associations between CD4 cell count and risk factors based on cohort data will not directly lead to designing new health care interventions. Identifying these people and providing proper and adequate education on prevention methods and on changing or abandoning high-risk behavior can help prevent HIV infection in this group and ultimately reduce its prevalence in society.


One of the main public health priorities in the field of HIV is to reduce the number of people who have a low CD4 cell count at the time of diagnosis, a marker of HIV progression. The results of the present study confirm that certain groups, such as older people, men, people who inject drugs, people who have unsafe sex and people with TB co-infection, had lower initial CD4 counts. Awareness of such subsets at high risk of late detection can therefore help guide policies and strategies to increase awareness and encourage people to seek HIV testing and care early.


Because we used registry data, some key variables had not been measured, so we could neither assess their effects on CD4 cell counts nor control for their confounding effects. Many changes, such as demographic changes in cohort participants, changes in treatment policy, quality of care, HIV testing policies and promotions, as well as general public awareness and stigma towards PLHIV, may have occurred during the long study period. All these factors may be confounded with the collected subject characteristics as well as with the CD4 cell counts observed at HIV diagnosis. Moreover, all regression models used in this study are based on the mean response, whereas other approaches such as quantile regression can provide a more comprehensive analysis of the factors associated with CD4 cell count.

Availability of data and materials

The datasets analyzed during the current study are available from the corresponding author on reasonable request.



Abbreviations

HIV: Human immunodeficiency virus

AIDS: Acquired immunodeficiency syndrome

NB: Negative Binomial

PLHIV: Persons living with HIV

GPR: Generalized Poisson regression


  1. Abbastabar HH, Rezaianzadeh A, Rajaeefard AR, Ghaem H, Motamedifar M, Afsar KP. Determining factors of CD4 cell count in HIV patients: in a historical cohort. IJLPR SP. 2016;1:93–101.

  2. CDC. About HIV/AIDS 2019 (updated 2019).

  3. Sharp PM, Hahn BH. Origins of HIV and the AIDS pandemic. Cold Spring Harb Perspect Med. 2011;1(1):a006841.

  4. SB M. Environmental and occupational medicine 2007. Wolters Kluwer/Lippincott Williams & Wilkins.

  5. WHO. Global health risks-mortality and burden of disease attributable to selected major risks: the Lancet; 2015.

  6. UNAIDS. Global Report Fact Sheet 2012. Available from:

  7. WHO. 2014.

  9. Greener R. AIDS a macroeconomic impact. In: Forsyth S, edn. State of the art: AIDS and economics. IAEN 2002; pp. 49–55.

  10. Mair C, Hawes SE, Agne HD, Sow PS, N’Doye I, Manhart LE, et al. Factors associated with CD4 lymphocyte counts in HIV-negative Senegalese individuals. Clin Exp Immunol. 2008;151(3):432–40.

  11. Ford N, Meintjes G, Vitoria M, Greene G, Chiller T. The evolving role of CD4 cell counts in HIV care. Curr Opin HIV AIDS. 2017;12(2):123–8.

  12. Temesgen A. Application of poisson mixed combined models for identifying correlations of CD4 count progression in HIV infected TB patients during ART treatment period. Int J Stat Probability. 2017;6(5):42–52.

  13. Hogg RS, Yip B, Chan KJ, Wood E, Craib KJ, O’Shaughnessy MV, et al. Rates of disease progression by baseline CD4 cell count and viral load after initiating triple-drug therapy. JAMA. 2001;286(20):2568–77.

  14. Johnson LF, Mossong J, Dorrington RE, Schomaker M, Hoffmann CJ, Keiser O, et al. Life expectancies of South African adults starting antiretroviral treatment: collaborative analysis of cohort studies. PLoS Med. 2013;10(4):e1001418.

  15. Bock P, James A, Nikuze A, Peton N, Sabapathy K, Mills E, et al. Baseline CD4 count and adherence to antiretroviral therapy: a systematic review and meta-analysis. JAIDS. 2016;73(5):514–21.

  16. Langford SE, Ananworanich J, Cooper DA. Predictors of disease progression in HIV infection: a review. AIDS Res Ther. 2007;4(1):11.

  17. Brennan AT, Long L, Useem J, Garrison L, Fox MP. Mortality in the first 3 months on antiretroviral therapy among HIV-positive adults in low-and middle-income countries: a meta-analysis. JAIDS. 2016;73(1):1–10.

  18. Waldrop G, Doherty M, Vitoria M, Ford N. Stable patients and patients with advanced disease: consensus definitions to support sustained scale up of antiretroviral therapy. Trop Med Int Health. 2016;21(9):1124–30.

  19. Seyoum A, Zewotir T. Quasi-Poisson versus negative binomial regression models in identifying factors affecting initial CD4 cell count change due to antiretroviral therapy administered to HIV-positive adults in north–West Ethiopia (Amhara region). AIDS Res Ther. 2016;13(1):36.

  20. Jaén Á, Esteve A, Miró JM, Tural C, Montoliu A, Ferrer E, et al. Determinants of HIV progression and assessment of the optimal time to initiate highly active antiretroviral therapy: PISCIS Cohort (Spain). JAIDS. 2008;47(2):212–20.

  21. Montarroyos UR, Miranda-Filho DB, Cesar CC, Souza WV, Lacerda HR, Albuquerque MDFPM, et al. Factors related to changes in CD4+ T-cell counts over time in patients living with HIV/AIDS: a multilevel analysis. PLoS One. 2014;9(2):e84276.

  22. Akinbami AA, Gbadegesin A, Ajibola SO, Uche EI, Dosunmu AO, Adediran A, et al. Factors influencing CD4 cell count in HIV-positive pregnant women in a secondary health center in Lagos, Nigeria. HIV/AIDS (Auckland, NZ). 2015;7:115.

  23. Bruneau L, Billaud E, Raffi F, Hanf M. Factors associated with the level of CD4 cell counts at HIV diagnosis in a French cohort: a quantile regression approach. Int J STD AIDS. 2017;28(4):397–403.

  24. McCullagh P, Nelder JA. Generalized linear models. 2nd ed. London: Chapman and Hall; 1989.

  25. Agresti A. An introduction to categorical data analysis. New York: Wiley; 1996.

  26. Grover G, Vajala R, Swain PK. On the assessment of various factors effecting the improvement in CD4 count of aids patients undergoing antiretroviral therapy using generalized Poisson regression. J Appl Stat. 2015;42(6):1291–305.

  27. Famoye F, Wulu JT, Singh KP. On the generalized Poisson regression model with an application to accident data. J Data Sci. 2004;2004(2):287–95.

  28. Djalalinia S, Moghaddam SS, Peykari N, Kasaeian A, Sheidaei A, Mansouri A, Mohammadi Y, Parsaeian M, Mehdipour P, Larijani B, Farzadfar F. Mortality attributable to excess body mass index in Iran: implementation of the comparative risk assessment methodology. Int J Prev Med. 2015;4(6):107. PMCID: PMC4671178.

  29. Mirzaei M, Farhadian M, Poorolajal J, Afsar Kazerooni P, Tayeri K, Mohammadi Y. Survival rate and the determinants of progression from HIV to AIDS and from AIDS to the death in Iran: 1987 to 2016. Asian Pac J Trop Med. 2019;12(2):72.

  30. Mirzaei M, Farhadian M, Poorolajal J, Afsar Kazerooni P, Tayeri K, Mohammadi Y. Life expectancy of HIV-positive patients after diagnosis in Iran from 1986 to 2016: a retrospective cohort study at national and sub-national levels. Epidemiol Health. 2018;40:e2018053.

  31. Harris T, Yang Z, Hardin JW. Modeling underdispersed count data with generalized Poisson regression. Stata J. 2012;12(4):736–47.

  32. Akaike H. Information theory and an extension of the maximum likelihood principle. Selected papers of Hirotugu Akaike: Springer; 1998. p. 199–213.

  33. Tang H, Mao Y, Shi CX, Han J, Wang L, Xu J, et al. Baseline CD4 cell counts of newly diagnosed HIV cases in China: 2006–2012. PLoS One. 2014;9(6):e96098.



The authors wish to express their sincere gratitude to the Vice Chancellor of Research of Hamadan University of Medical Sciences for financial support (Grant No. 9806264725).


This research was funded and supported by a Grant (No. 9806264725) from Hamadan University of Medical Sciences.

Author information




MF prepared the proposal, analyzed and interpreted the data, prepared the final report, and wrote the article. YM supervised the design and execution of the study and revised the article. MM contributed to preparation of the proposal, collected the data and revised the article. NS contributed to preparation of the proposal, collected the data and revised the article. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Nasrin Shirmohammadi-Khorram.

Ethics declarations

Ethics approval and consent to participate

This study was approved by the Ethics Committee of Hamadan University of Medical Sciences (IR.UMSHA.REC.1398.406). Registry data were used; therefore, informed consent was not applicable.

Consent for publication

Not applicable.

Competing interests

There are no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Farhadian, M., Mohammadi, Y., Mirzaei, M. et al. Factors related to baseline CD4 cell counts in HIV/AIDS patients: comparison of poisson, generalized poisson and negative binomial regression models. BMC Res Notes 14, 114 (2021).
