
This lesson reviews other regression models.  At the end of this lesson, you will be able to:

  1. Determine when to use a logistic regression model
  2. Interpret the results of a logistic regression model
  3. Describe the purpose of a multilevel model
  4. Describe the purpose of a survival analysis

Terms that appear frequently throughout this lesson are defined below:

Term | Definition
Logistic regression | A regression model that measures the relationship between a categorical dependent variable and one or more independent variables
Multilevel data | Data that represent more than one level, in which lower-level units are nested within higher-level units
Multilevel model | A statistical approach to analyzing data that represent multiple levels
Survival analysis | A statistical analysis that models time-to-event data

I. Logistic Regression

Also called logit regression or logit model

A logistic regression model is used for predicting the outcome of a categorical dependent variable based on one or more predictor variables. Most commonly, the dependent variable is binary (e.g., adherent vs. non-adherent), although multinomial logistic regression allows for an outcome with three or more possible values (e.g., disease 1, disease 2, disease 3).

  • Independent variables can be categorical or continuous.
  • Unlike one-way ANOVA and Student’s t-test, logistic regression can predict the probability of the outcome variable.
  • Unlike linear regression, in which the sample data are fit to a straight line, logistic regression fits the data to an S-shaped logistic curve whose predicted values are constrained between 0 and 1, where the two outcome categories are coded 0 (e.g., adherent) and 1 (e.g., non-adherent).
  • Unlike linear regression, which predicts an outcome based on one or more independent variables, logistic regression predicts the probability of the outcome based on one or more independent variables.
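To see how the logistic curve keeps predictions between 0 and 1, the short Python sketch below converts a linear predictor (the log-odds) into a probability. The coefficients `b0` and `b1`, and the use of age as the predictor, are hypothetical values chosen for illustration, not estimates from any study in this lesson.

```python
import math


def logistic(z: float) -> float:
    """Map a linear predictor z (the log-odds) to a probability in (0, 1).

    Both branches compute the same value; splitting on the sign of z
    avoids overflow in math.exp for large positive or negative z.
    """
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def predict_probability(b0: float, b1: float, x: float) -> float:
    """Predicted probability of the outcome for one predictor value x."""
    log_odds = b0 + b1 * x  # linear on the log-odds (logit) scale
    return logistic(log_odds)


# Hypothetical coefficients (illustration only)
b0, b1 = -3.0, 0.05

for age in (20, 60, 100, 200):
    p = predict_probability(b0, b1, age)
    print(f"age = {age:3d}: log-odds = {b0 + b1 * age:+.2f}, P(outcome) = {p:.3f}")
```

A straight line b0 + b1 * age would eventually exceed 1 at large ages; passing it through the logistic function bends it into an S-shaped curve that approaches 0 and 1 without crossing them. At age 60 the log-odds are 0, so the predicted probability is exactly 0.5.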

Examples of Fitting the Logistic Curve to Sample Data
(Figure: two charts showing differing levels of fit between the logistic curve and the sample data.)

  • Results for independent variables in a logistic regression model are presented as odds ratios (see Lesson 5).
  • Several methods can be used to calculate the p value. The Wald chi-square test is popular, but it can give inaccurate results with small sample sizes; in that case, the likelihood ratio test may be preferable.
  • The goodness-of-fit of a logistic regression model can be assessed with pseudo R² statistics (e.g., McFadden's R²).

II. Multilevel Models

Also called hierarchical linear models, nested models, mixed models, random coefficient models, or random-effects models

Linear regression and logistic regression are basic single-level models. Sometimes data in a sample represent more than one level of analysis. For example, in a national study, people (i.e., level 1) may be nested within hospitals (i.e., level 2), which are nested within states (i.e., level 3), and all three levels may have characteristics that should be included in the model:

(Figure: nested data. People (level 1) are nested within hospitals (level 2), which are nested within states (level 3).)

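Attaching higher-level characteristics (e.g., hospital or state characteristics) to each individual and analyzing the data with a single-level model violates the assumption that observations are independent of one another, because individuals in the same group share those characteristics and tend to resemble each other. Multilevel models should be used for nested data structures in which characteristics of more than one level are believed to be important.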
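One way to see why nested observations are not independent is to estimate how much of the total variance lies between groups. The Python sketch below uses the classical one-way random-effects ANOVA (method-of-moments) estimator for a balanced design to compute between-group and within-group variance components and the intraclass correlation (ICC). The hospital labels, scores, and the function name are hypothetical; dedicated multilevel software would usually fit such models by maximum likelihood or restricted maximum likelihood (REML) instead.

```python
from statistics import mean


def variance_components(groups):
    """Return (between_var, within_var, icc) for a balanced one-way design.

    `groups` maps a group label (e.g., a hospital) to a list of outcome
    values; every group must have the same number of observations.
    """
    sizes = {len(v) for v in groups.values()}
    if len(sizes) != 1:
        raise ValueError("this simple estimator assumes equal group sizes")
    n = sizes.pop()      # observations per group
    k = len(groups)      # number of groups
    grand = mean(x for v in groups.values() for x in v)

    ss_between = sum(n * (mean(v) - grand) ** 2 for v in groups.values())
    ss_within = sum((x - mean(v)) ** 2 for v in groups.values() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (k * (n - 1))

    # Method-of-moments estimates; a negative between-group estimate is set to 0
    between_var = max((ms_between - ms_within) / n, 0.0)
    within_var = ms_within
    icc = between_var / (between_var + within_var)
    return between_var, within_var, icc


# Hypothetical outcome scores for patients (level 1) nested in hospitals (level 2)
hospitals = {
    "A": [5, 6, 7, 6],
    "B": [9, 10, 8, 9],
    "C": [3, 4, 2, 3],
}
between, within, icc = variance_components(hospitals)
print(f"between = {between:.3f}, within = {within:.3f}, ICC = {icc:.3f}")
```

Here about 93% of the variance lies between hospitals, so patients in the same hospital are far more alike than patients in different hospitals; treating the 12 scores as independent observations would overstate how much information the sample contains.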

III. Survival Analysis

Also called event history analysis, duration analysis, or duration modeling

Survival analysis examines time-to-event data, that is, the length of time until one or more events occur. The event can be death, occurrence of a disease, relapse, divorce, etc., and time can be measured in days, weeks, years, etc. In this type of study, subjects are followed over a specified time period with particular attention to the time at which the event of interest occurs.

  • Cox proportional hazards regression model: a semiparametric model that tests for differences in survival times between two or more groups of interest while adjusting for other covariates
  • Kaplan-Meier method: a nonparametric estimator of the survival function
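The Kaplan-Meier estimator multiplies together, at each time an event occurs, the proportion of at-risk subjects who survive past that time. The Python sketch below implements this product-limit calculation for a small hypothetical data set in which some subjects are censored (followed without the event occurring). As is conventional, subjects censored at the same time as an event are still counted as at risk at that time. The function name is illustrative.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time for each subject
    events: 1 if the event occurred at that time, 0 if the subject was censored
    Returns a list of (time, survival probability) at each event time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    leaving = Counter(times)  # every subject leaves the risk set at its own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths[t]
        if d:
            survival *= 1.0 - d / at_risk
            curve.append((t, survival))
        at_risk -= leaving[t]
    return curve


# Hypothetical follow-up (e.g., months) for eight subjects
times = [2, 3, 3, 5, 6, 7, 8, 9]
events = [1, 1, 0, 1, 0, 1, 1, 0]

for t, s in kaplan_meier(times, events):
    print(f"t = {t}: S(t) = {s:.3f}")
```

For these data the estimated survival drops to 0.875, 0.750, 0.600, 0.400, and 0.200 at times 2, 3, 5, 7, and 8; the censored subjects at times 3, 6, and 9 shrink the risk set without lowering the curve.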

Example I. Logistic Regression:
Adherence in a Pharmacy Assistance Program

Methods:

We used a retrospective cohort design to investigate six-month outcomes for participants in the University of North Carolina (UNC) Health Care Pharmacy Assistance Program (PAP) who received medications indicated for hypertension, diabetes, and/or hyperlipidemia from 2009 through 2011. The three study cohorts included 866 patients receiving antihypertensive agents, 265 patients receiving oral glucose-lowering agents, and 455 patients receiving statins. Multivariable logistic regression was used to examine the impact of the predefined covariates on medication adherence (primary outcome).

Adjusted Multivariable Logistic Regression Results Predicting Adherenceᵃ Among Newly Enrolled Participants in the UNC Health Care Pharmacy Assistance Program to Medications for a Specific Chronic Disease

Characteristic / variable predicting adherenceᵃ | Users of antihypertensive agents (N = 866), OR (95% CI) | Users of oral glucose-lowering agents (N = 265), OR (95% CI) | Users of statins (N = 455), OR (95% CI)
Age | 1.03** (1.01 – 1.04) | 1.02 (0.99 – 1.05) | 1.04* (1.00 – 1.05)
Female sex | 0.84 (0.63 – 1.13) | 0.70 (0.42 – 1.18) | 0.74 (0.49 – 1.10)
White race | 0.89 (0.65 – 1.22) | 1.83* (1.05 – 3.21) | 1.82** (1.19 – 2.79)
English as preferred language | 0.74 (0.48 – 1.12) | 1.07 (0.53 – 2.14) | 1.21 (0.64 – 2.32)
Local residenceᵇ | 1.31 (0.96 – 1.79) | 1.37 (0.76 – 2.47) | 1.23 (0.79 – 1.92)
No. of unique drugs received | 1.17** (1.13 – 1.22) | 1.05 (0.99 – 1.11) | 1.06** (1.02 – 1.11)
Use of any antihypertensive agent | n/a | 1.17 (0.64 – 2.13) | 1.36 (0.85 – 2.16)
Use of any glucose-lowering agent | 1.20 (0.81 – 1.79) | n/a | 1.37 (0.89 – 2.12)
Use of any statin | 1.80** (1.30 – 2.49) | 1.75 (0.99 – 3.10) | n/a
n/a = not applicable (this medication class defines the cohort)

CI = confidence interval
OR = odds ratio
UNC = University of North Carolina
*P < .05
**P < .01
ᵃ Adherence was defined as having an overall, aggregate proportion of days covered (PDC) equal to or greater than 0.8 for medications within a cohort.
ᵇ Local residence was defined as living in one of these seven counties: Orange, Chatham, Alamance, Caswell, Person, Durham, or Wake.

Results:

When all covariates were included, older age was a statistically significant predictor of adherence to antihypertensive agents (OR = 1.03; 95% CI, 1.01 – 1.04) and adherence to statins (OR = 1.03; 95% CI, 1.00 – 1.05). Similarly, the number of unique medications for which prescriptions were filled also had a statistically significant positive association with adherence to antihypertensive agents and with adherence to statins. White race was associated with 83% greater odds of adherence to oral glucose-lowering agents (OR = 1.83; 95% CI, 1.05 – 3.21) and 82% greater odds of adherence to statins (OR = 1.82; 95% CI, 1.19 – 2.79). For patients in the antihypertensive cohort, concomitant use of statins significantly increased the odds of adherence to antihypertensive agents. Patient sex, language preference, and local residence were not associated with adherence to medications for any of the 3 cohorts.
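The percentages in these results follow directly from the odds ratios: an OR of 1.83 means the odds are (1.83 - 1) × 100 = 83% higher. A 95% CI that excludes 1 generally corresponds to statistical significance at the .05 level. The Python sketch below applies both rules to a few rows of the table above; the helper names are illustrative. The last line, showing how an OR compounds over a 10-year age difference, assumes age was entered in years, which the table does not state.

```python
def percent_change_in_odds(odds_ratio: float) -> float:
    """Percent difference in odds implied by an odds ratio (OR 1.83 -> +83%)."""
    return (odds_ratio - 1.0) * 100.0


def ci_excludes_one(lower: float, upper: float) -> bool:
    """True if a 95% CI for an odds ratio excludes 1 (roughly, p < .05)."""
    return lower > 1.0 or upper < 1.0


# (label, OR, lower 95% limit, upper 95% limit) copied from the table above
rows = [
    ("White race, glucose-lowering cohort", 1.83, 1.05, 3.21),
    ("White race, statin cohort", 1.82, 1.19, 2.79),
    ("Female sex, antihypertensive cohort", 0.84, 0.63, 1.13),
    ("Statin use, antihypertensive cohort", 1.80, 1.30, 2.49),
]
for label, or_, lo, hi in rows:
    flag = "CI excludes 1" if ci_excludes_one(lo, hi) else "CI includes 1"
    print(f"{label}: {percent_change_in_odds(or_):+.0f}% odds ({flag})")

# Odds ratios for a continuous predictor compound multiplicatively.
# Assumption: age was entered in years, so a 10-year difference gives OR ** 10.
age_or_per_year = 1.03
print(f"Age, 10-year difference: OR = {age_or_per_year ** 10:.2f}")
```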


Roberts AW, Crisp GD, Esserman DA, Roth MT, Weinberger M, Farley JF. Patterns of Medication Adherence and Health Care Utilization Among Patients With Chronic Disease Who Were Enrolled in a Pharmacy Assistance Program. NC Med J. 2014;75(5):310-318.

Example II. Multilevel Model:

Variance in individual health status attributable to the family

Methods:

Secondary data were used from the Community Tracking Study. Participants were US residents aged 18 years and older who shared a household with family members in the study (N = 35,055). The setting was 60 US communities, which account for approximately one half of the US population. The main outcome measures were the self-reported physical subscales of the Short Form-12 (SF-12). Hierarchical linear modeling was used to estimate the individual and family components of health status. Our initial analysis used a combined three-level model to partition the variance of the SF-12 scores into 3 components: individual (level 1), family (level 2), and community (level 3). Because the community level accounted for less than 1% of the total variance in health status scores in these initial analyses, however, subsequent analyses were limited to the individual (level 1) and family (level 2) components. The second analysis was a series of multilevel regression equations that sequentially added age, family income, and then health insurance status as predictors of SF-12 scores. For this set of equations, we were interested in assessing the proportion of family-level variance accounted for as each covariate was added to the model.

Table 2: Multilevel Variance Components for SF-12 Physical Health Summary Score

Family composition | Level-1 (individual) Var | Level-1 SE | Level-1 % | Level-2 (family) Var | Level-2 SE | Level-2 %
Single-family households
Married, no kids | 88.48 | 4.43 | 77.7 | 25.43 | 3.72 | 22.3
Married with kids | 55.14 | 3.11 | 86.8 | 8.44 | 2.00 | 13.2
Single with kids | 77.59 | 11.13 | 76.9 | 23.31 | 9.38 | 23.1
Multiple-family households
Married, no kids | 103.05 | 11.50 | 83.9 | 19.85 | 8.65 | 16.1
Married with kids | 70.66 | 9.20 | 83.9 | 13.53 | 6.33 | 16.1
Single with kids | 75.93 | 14.64 | 95.5 | 3.64 | 9.16 | 4.5*

Var = variance
SE = standard error
SF-12 = short form 12
*Not significant
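The % columns in Table 2 give each level's share of the total variance: the level-2 share is the family variance divided by the sum of the individual and family variances, which is the intraclass correlation expressed as a percentage. The Python sketch below recomputes the level-2 percentages from the Var columns; the function name is illustrative.

```python
# (family composition, level-1 variance, level-2 variance, printed level-2 %)
# Values copied from Table 2 above.
table2 = [
    ("Single-family: married, no kids", 88.48, 25.43, 22.3),
    ("Single-family: married with kids", 55.14, 8.44, 13.2),
    ("Single-family: single with kids", 77.59, 23.31, 23.1),
    ("Multiple-family: married, no kids", 103.05, 19.85, 16.1),
    ("Multiple-family: married with kids", 70.66, 13.53, 16.1),
    ("Multiple-family: single with kids", 75.93, 3.64, 4.5),
]


def level2_share(var_level1: float, var_level2: float) -> float:
    """Percent of total variance at the family level (ICC x 100)."""
    return 100.0 * var_level2 / (var_level1 + var_level2)


for label, v1, v2, printed in table2:
    print(f"{label}: computed {level2_share(v1, v2):.1f}% vs printed {printed}%")
```

The recomputed values agree with the printed percentages to within 0.1 percentage point; the small differences presumably reflect rounding of the printed variances.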

Table 4: Multilevel Regression Parameters (Regression Weight and SE) for SF-12 Physical Component Summary Scores in Single Family Households

Family composition / model | Intercept | Age, years | Income* | Insurance status† | Level-2 variance, % | –2*logL‡
Married, no kids
Model 1 | 48.05 (0.23) | | | | 22.3 | 89,725
Model 2 | 58.11 (0.63) | –0.18 (0.02) | | | 16.2 | 88,942
Model 3 | 50.93 (1.10) | –0.15 (0.02) | 1.65 (0.15) | | 12.5 | 88,510
Model 4 | 48.75 (1.30) | –0.13 (0.02) | 1.47 (0.20) | 1.25 (0.42) | 12.6 | 88,452
Married with kids
Model 1 | 51.82 (0.16) | | | | 13.2 | 105,957
Model 2 | 54.62 (0.62) | –0.08 (0.02) | | | 13.5 | 105,851
Model 3 | 51.06 (0.74) | –0.10 (0.02) | 1.28 (0.16) | | 10.1 | 105,398
Model 4 | 50.65 (0.80) | –0.10 (0.02) | 1.19 (0.16) | 0.44 (0.14) | 9.9 | 105,376
Single with kids
Model 1 | 49.62 (0.54) | | | | 23.31 | 16,289
Model 2 | 53.91 (1.72) | –0.13 (0.06) | | | 22.05 | 16,257
Model 3 | 50.80 (1.73) | –0.17 (0.06) | 2.14 (0.42) | | 8.8§ | 16,141
Model 4 | 50.51 (1.88) | –0.017 (0.06) | 2.03 (0.46) | 0.39 (1.62)§ | 8.6§ | 16,137

SF-12 = short form 12
*Income quintile: 1 = lowest; 5 = highest
†Insured
‡ –2*logL (–2 times the log likelihood) is a goodness-of-fit statistic; smaller values indicate better model fit.
§Not significant

Results:

The family (i.e., level-2) variance component ranged from 4.5% to 26.1% for the physical health score. All the level-2 variance components for physical health were statistically significant except for single persons with children in multiple-family households. As seen in Table 4, age and income were significant predictors of physical health status in all family configurations. The effects were in the expected direction; older age, lower income, and lack of insurance were associated with worse physical health status. Age accounted for approximately 30% of the level-2 variance for physical health status in the “married, no kids” group, reducing the family-level variance component from 22.3% to 16.2% of the total variance. This magnitude of effect was not observed in the “married with kids” or the “single with kids” groups. Adding income to the regression equations further reduced the level-2 variance component by 23% to 60% in all family configurations. After adjustment for age and income, insurance status only slightly improved the model.
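The reductions quoted above are proportional changes in the level-2 variance percentage between successive models in Table 4. The Python sketch below recomputes them from the table's "Level-2 variance, %" column; the function name is illustrative.

```python
def proportional_reduction(before: float, after: float) -> float:
    """Percent reduction in the level-2 variance share from one model to the next."""
    return 100.0 * (before - after) / before


# Level-2 variance, % values copied from Table 4 above
print(f"Age, married no kids (Model 1 -> 2): {proportional_reduction(22.3, 16.2):.1f}%")

income_steps = {
    "married, no kids": (16.2, 12.5),
    "married with kids": (13.5, 10.1),
    "single with kids": (22.05, 8.8),
}
for group, (model2, model3) in income_steps.items():
    print(f"Income, {group} (Model 2 -> 3): {proportional_reduction(model2, model3):.1f}%")
```

The age step gives about 27%, which the text rounds to approximately 30%, and the income step gives about 23% to 60% across the three family configurations, matching the range reported.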


Ferrer RL, Palmer R, Burge S. The family contribution to health status: a population-level estimate. Ann Fam Med. 2005;3(2):102-108.

Example III. Survival Analysis:
Statin Treatment and 1-year Survival

Methods:

Prospective cohort study using data from the Swedish Register of Cardiac Intensive Care on patients admitted to the coronary care units of 58 Swedish hospitals in 1995 – 1998. Participants were patients with a first registry-recorded acute myocardial infarction (AMI) who were younger than 80 years and were discharged alive from the hospital: 5,528 who received statins at or before discharge and 14,071 who did not.

Main outcome measure: relative risk of 1-year mortality according to statin treatment.

Comparisons between different patient strata and different categories of hospitals were analyzed with χ² tests for categorical variables and t-tests for continuous variables. Bivariate analyses and multiple-covariate Cox regression analyses were used to identify variables with a significant influence on mortality. The analyses were also repeated among 30-, 60-, and 90-day survivors to exclude progressively longer periods of early mortality, since patients who died early might have been perceived as having too short a life expectancy to benefit from statin treatment.
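Cox regression estimates a hazard ratio by maximizing the partial likelihood, which at each event time compares the subject who had the event with everyone still at risk. The Python sketch below fits a Cox model with a single binary covariate (for example, 1 = treated, 0 = not treated) by Newton-Raphson. The data and function names are hypothetical, tied event times would be handled with the Breslow approach (each event uses the full risk set), and an analysis with many covariates, like the one described above, would normally use a statistical package.

```python
import math


def cox_score_info(beta, times, events, x):
    """Score U(beta) and information I(beta) of the Cox partial log-likelihood.

    Single covariate; the risk set for an event at t_i is all j with t_j >= t_i.
    """
    score = 0.0
    info = 0.0
    for ti, ei, xi in zip(times, events, x):
        if ei != 1:
            continue  # censored subjects contribute only through risk sets
        risk = [xj for tj, xj in zip(times, x) if tj >= ti]
        w = [math.exp(beta * xj) for xj in risk]
        s0 = sum(w)
        s1 = sum(wj * xj for wj, xj in zip(w, risk))
        s2 = sum(wj * xj * xj for wj, xj in zip(w, risk))
        mean_x = s1 / s0                # weighted mean covariate in the risk set
        score += xi - mean_x
        info += s2 / s0 - mean_x ** 2   # weighted variance in the risk set
    return score, info


def fit_cox(times, events, x, max_iter=50, tol=1e-10):
    """Return (beta, standard error) for a one-covariate Cox model."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = cox_score_info(beta, times, events, x)
    return beta, 1.0 / math.sqrt(info)


# Hypothetical data: follow-up time, event indicator (1 = died), covariate (1 = treated)
times = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
events = [1, 1, 1, 1, 0, 1, 1, 0, 1, 0]
treated = [0, 0, 1, 0, 1, 0, 1, 1, 0, 1]

beta, se = fit_cox(times, events, treated)
hr = math.exp(beta)
lower, upper = math.exp(beta - 1.96 * se), math.exp(beta + 1.96 * se)
print(f"hazard ratio = {hr:.2f} (95% CI {lower:.2f} - {upper:.2f})")
```

A hazard ratio below 1 indicates a lower instantaneous risk of the event in the treated group, assuming the hazards are proportional over follow-up.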

Figure. Adjusted Probability of Mortality by Statin Treatment

Results:

Among the 14,071 patients without statin treatment, the unadjusted 1-year mortality was 9.3% (n = 1,307), compared with 4.0% (n = 219) among the 5,528 patients with statin treatment. In Cox regression analysis adjusting for 43 covariates, statin treatment at discharge was associated with lower 1-year mortality (3.7% vs 5.0%; relative risk [RR], 0.75; 95% confidence interval [CI], 0.63 – 0.89; p = .001).
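The unadjusted figures can be checked by hand: the 1-year risk in each group is the number of deaths divided by the number of patients, and their ratio is the crude risk ratio. The Python sketch below computes that ratio with a log-scale (Katz) 95% confidence interval; the function name is illustrative. Because it ignores the 43 covariates, it is only a crude comparison, and the gap between it and the adjusted estimate (RR 0.75) suggests how much of the unadjusted difference was related to the covariates.

```python
import math


def risk_ratio(events_exposed, n_exposed, events_unexposed, n_unexposed):
    """Crude risk ratio with a 95% CI computed on the log scale (Katz method)."""
    risk_exposed = events_exposed / n_exposed
    risk_unexposed = events_unexposed / n_unexposed
    rr = risk_exposed / risk_unexposed
    se_log_rr = math.sqrt(
        1 / events_exposed - 1 / n_exposed + 1 / events_unexposed - 1 / n_unexposed
    )
    lower = math.exp(math.log(rr) - 1.96 * se_log_rr)
    upper = math.exp(math.log(rr) + 1.96 * se_log_rr)
    return risk_exposed, risk_unexposed, rr, (lower, upper)


# Counts reported in the results above
r_statin, r_none, rr, ci = risk_ratio(219, 5528, 1307, 14071)
print(f"1-year mortality: statin {r_statin:.1%}, no statin {r_none:.1%}")
print(f"Crude RR = {rr:.2f} (95% CI {ci[0]:.2f} - {ci[1]:.2f})")
```

This reproduces the 4.0% and 9.3% unadjusted mortality figures and gives a crude risk ratio of about 0.43, noticeably further from 1 than the adjusted RR of 0.75.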


Stenestrand U, Wallentin L, Swedish Register of Cardiac Intensive Care (RIKS-HIA). Early statin treatment following acute myocardial infarction and 1-year survival. JAMA. 2001;285(4):430-436.