What Statistics Are Right for My Project - Advanced
This section is intended to be a quick reference for a selection of advanced statistical models with a single outcome and multiple explanatory factors. Moving beyond examining each explanatory factor’s relationship to the outcome separately lets you evaluate the independent effect of each factor and reduce the unexplained variability in the outcome. All statistical models rely on mathematical assumptions, which should be evaluated before implementing and interpreting the model.
Consult a statistician when planning or analyzing an advanced statistical model. This section uses several definitions that can be found here.
Contact NEDARC to consult with a statistician.
Advanced Topics:
Linear Regression
- Continuous outcome
- Independent observations (no repeated measures or clustering)
- Objective: to evaluate the effect of continuous and/or categorical explanatory variables on the outcome
- Interpretation: For categorical variables such as sex, you estimate a difference in mean response (outcome) between groups. For continuous variables such as age, you estimate the change in mean response for every one-unit change in the variable. Because you are evaluating all variables in the same model, each effect estimate is interpreted within the context of the other variables (holding all else constant).
- You also obtain R², an estimate of the proportion of variability in the response that is accounted for by the model.
- Analysis of variance (ANOVA) and analysis of covariance (ANCOVA) models can fit into this framework.
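To make the "holding all else constant" interpretation concrete, here is a minimal Python sketch of ordinary least squares with two explanatory variables, solved through the normal equations using only the standard library. The data and the helper functions `solve` and `ols` are hypothetical and for illustration only; in practice a statistical package would be used, and it would also report standard errors and the diagnostics needed to check the model assumptions.

```python
from typing import List, Sequence, Tuple


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> Tuple[List[float], float]:
    """Fit y = b0 + b1*x1 + b2*x2 + ... by least squares; return (coefficients, R^2)."""
    X = [[1.0, *r] for r in rows]  # prepend an intercept column
    p = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    mean_y = sum(y) / len(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variability
    sst = sum((yi - mean_y) ** 2 for yi in y)                # total variability
    return beta, 1 - sse / sst


# Hypothetical data: (age in years, female = 1 / male = 0) and a continuous outcome
rows = [(34, 0), (45, 1), (52, 0), (29, 1), (61, 1), (47, 0), (38, 1), (55, 0)]
y = [120.0, 118.0, 135.0, 110.0, 131.0, 128.0, 115.0, 137.0]
beta, r2 = ols(rows, y)
print(f"intercept={beta[0]:.2f}, age={beta[1]:.3f}, female={beta[2]:.2f}, R^2={r2:.3f}")
```

In the fitted coefficients, `beta[1]` is the estimated change in mean outcome per one-year increase in age with sex held constant, and `beta[2]` is the estimated female-minus-male difference in mean outcome with age held constant.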

Logistic Regression
- Yes/No (two-level) outcome
- Independent observations (no repeated measures or clustering)
- Objective: to evaluate the effect of continuous and/or categorical explanatory variables on the probability of a “yes” outcome
- Interpretation: For categorical variables such as sex, you estimate the odds of a yes outcome for females relative to the odds of a yes outcome for males (an odds ratio; when the outcome is rare, the odds ratio approximates the relative risk). For continuous variables such as age, the odds ratio is for a one-unit increase in the explanatory variable.
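The odds ratio interpretation, and why it approximates the relative risk only for rare outcomes, can be checked with simple arithmetic on a 2x2 table. This sketch uses hypothetical counts and a hypothetical helper `odds_ratio_and_relative_risk`. For a logistic regression with sex as the only explanatory variable, the exponentiated coefficient for sex equals the table odds ratio computed here, and the coefficient itself is its natural log.

```python
import math


def odds_ratio_and_relative_risk(yes_a, no_a, yes_b, no_b):
    """Compare group A with group B using counts from a 2x2 table."""
    odds_ratio = (yes_a / no_a) / (yes_b / no_b)
    relative_risk = (yes_a / (yes_a + no_a)) / (yes_b / (yes_b + no_b))
    return odds_ratio, relative_risk


# Hypothetical common outcome: females 40 yes / 60 no, males 20 yes / 80 no
or_common, rr_common = odds_ratio_and_relative_risk(40, 60, 20, 80)
# Hypothetical rare outcome: females 4 yes / 996 no, males 2 yes / 998 no
or_rare, rr_rare = odds_ratio_and_relative_risk(4, 996, 2, 998)

print(f"common: OR={or_common:.3f}, RR={rr_common:.3f}")  # OR is well above RR
print(f"rare:   OR={or_rare:.3f}, RR={rr_rare:.3f}")      # OR is close to RR
# Logistic regression coefficient for female when sex is the only predictor
print(f"log-odds coefficient = {math.log(or_common):.3f}")
```

For a continuous variable with coefficient b, the same logic gives an odds ratio of exp(b) for a one-unit increase and exp(k * b) for a k-unit increase.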

Cumulative Logistic Regression
- Similar to logistic regression but modeling an ordered categorical outcome with more than 2 levels
- Independent observations (no repeated measures or clustering)
- Objective: to evaluate the effect of continuous and/or categorical explanatory variables on the probability of a higher vs. lower outcome at each cut point (e.g., level 2 or 3 vs. level 1, and level 3 vs. level 1 or 2).
- Interpretation: Odds ratios again, similar to logistic regression, but now reflecting the odds of a higher-level vs. lower-level outcome. Under the usual proportional odds assumption, the same odds ratio applies at every cut point.
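A cumulative logistic model is most often fit under the proportional odds assumption, where one odds ratio applies at every cut point of the ordered outcome. This sketch uses hypothetical cut points and a hypothetical coefficient for a 0/1 explanatory variable to compute the category probabilities, and confirms that the odds of a higher vs. lower outcome change by the same factor, exp(beta), at each cut point. The parameterization logit P(Y <= j) = theta_j - beta * x is one common convention; some software uses a plus sign instead, which flips the sign of the reported coefficient.

```python
import math


def logistic(z):
    """Inverse logit: converts log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def category_probs(x, cutpoints, beta):
    """P(Y = 1), ..., P(Y = J) under logit P(Y <= j | x) = cutpoints[j] - beta * x.
    cutpoints must be increasing; there are J - 1 of them."""
    cumulative = [logistic(theta - beta * x) for theta in cutpoints] + [1.0]
    return [cumulative[0]] + [cumulative[j] - cumulative[j - 1]
                              for j in range(1, len(cumulative))]


def odds_higher(x, j, cutpoints, beta):
    """Odds of an outcome above level j + 1 vs. at or below it (j is a 0-based cut point)."""
    p_at_or_below = logistic(cutpoints[j] - beta * x)
    return (1 - p_at_or_below) / p_at_or_below


# Hypothetical 3-level outcome (mild, moderate, severe) and a 0/1 exposure
cutpoints = [-0.5, 1.0]
beta = 0.8

for x in (0, 1):
    print(x, [round(p, 3) for p in category_probs(x, cutpoints, beta)])

for j in range(len(cutpoints)):
    ratio = odds_higher(1, j, cutpoints, beta) / odds_higher(0, j, cutpoints, beta)
    print(f"cut point {j + 1}: odds ratio = {ratio:.4f}")  # exp(beta) at every cut point
```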

Multinomial Logit Model
- Extension of logistic regression to an outcome with more than 2 levels, where no ordering of the levels is required
- Independent observations (no repeated measures or clustering)
- Objective: to evaluate the effect of continuous and/or categorical explanatory variables on the probability of level 2 outcome vs. level 1 outcome and level 3 outcome vs. level 1 outcome (this differs from the cumulative model because you estimate a separate effect for each level rather than a single cumulative effect)
- Interpretation: Odds ratios again, similar to logistic regression, but you must now specify which two levels of the outcome you are comparing.
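In the multinomial logit model, each non-reference level of the outcome gets its own intercept and coefficient relative to a reference level (level 1 here). This sketch uses hypothetical parameter values for a single explanatory variable x to compute the level probabilities, and shows that the odds ratios for level 2 vs. level 1 and for level 3 vs. level 1 differ, unlike the single odds ratio of the proportional odds model. The function names are hypothetical.

```python
import math


def multinomial_probs(x, intercepts, slopes):
    """Baseline-category logit with level 1 as the reference:
    log[P(Y = k) / P(Y = 1)] = intercepts[k] + slopes[k] * x.
    The reference level's intercept and slope are fixed at 0."""
    scores = [a + b * x for a, b in zip(intercepts, slopes)]
    weights = [math.exp(s) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical parameters for a 3-level unordered outcome
intercepts = [0.0, -0.2, -1.0]
slopes = [0.0, 0.5, 1.2]

p0 = multinomial_probs(0, intercepts, slopes)
p1 = multinomial_probs(1, intercepts, slopes)


def odds_ratio(k, ref=0):
    """Odds ratio for level k vs. level ref (0-based) for a one-unit increase in x."""
    return (p1[k] / p1[ref]) / (p0[k] / p0[ref])


print(f"level 2 vs 1: {odds_ratio(1):.3f}")         # exp(0.5)
print(f"level 3 vs 1: {odds_ratio(2):.3f}")         # exp(1.2)
print(f"level 3 vs 2: {odds_ratio(2, ref=1):.3f}")  # exp(1.2 - 0.5)
```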

Poisson Regression
- Count or rate outcome (0, 1, 2, ...)
- You can have varying follow-up time on subjects and model the outcome as a rate over time
- Independent observations (no repeated measures or clustering)
- Objective: to evaluate the effect of continuous and/or categorical explanatory variables on the outcome
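- Interpretation: Effect estimates are used to obtain incidence rate ratios. Fabricated example: older drivers have twice the rate of motor vehicle crashes per year as middle-aged drivers (an incidence rate ratio of 2).
- Related models include the negative binomial model (for counts that are more variable than the Poisson model allows, called overdispersion) and the zero-inflated Poisson model (for counts with excess zeros)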
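With varying follow-up time, the outcome is modeled as events per unit of person-time. For a single 0/1 explanatory variable with log follow-up time as an offset, the Poisson regression estimate of the incidence rate ratio equals the ratio of the two crude rates, and the standard error of its log is sqrt(1/a + 1/b), where a and b are the event counts. This sketch reproduces the fabricated driver example with hypothetical counts and follow-up times; the helper `incidence_rate_ratio` is hypothetical, and with several explanatory variables a statistical package is needed.

```python
import math


def incidence_rate_ratio(events_a, time_a, events_b, time_b, z=1.96):
    """Rate ratio of group A vs. group B with a Wald 95% CI on the log scale.
    Requires at least one event in each group."""
    irr = (events_a / time_a) / (events_b / time_b)
    se_log_irr = math.sqrt(1 / events_a + 1 / events_b)
    lower = math.exp(math.log(irr) - z * se_log_irr)
    upper = math.exp(math.log(irr) + z * se_log_irr)
    return irr, (lower, upper)


# Hypothetical: older drivers 30 crashes in 500 driver-years,
# middle-aged drivers 45 crashes in 1500 driver-years
irr, (lower, upper) = incidence_rate_ratio(30, 500, 45, 1500)
print(f"IRR={irr:.2f} (95% CI {lower:.2f} to {upper:.2f})")
```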

Survival Model (Cox Proportional Hazards)
- Outcome is time to a certain event (e.g., death, disease development or recovery, stabilization of patient)
- Independent observations (no repeated measures or clustering)
- Special methods are necessary because the event is not observed for all patients/units during follow-up (called censoring), for example when a patient is still event-free at the end of the study.
- Objective: to evaluate the effect of continuous and/or categorical explanatory variables on the time to event outcome
- Interpretation: Effect estimates are used to obtain hazard ratios, which compare the instantaneous risk (hazard) of the outcome between groups and are often interpreted as relative risks. The model assumes the hazard ratio is constant over time (the proportional hazards assumption). Fabricated example: the hazard ratio of death for radiation therapy only vs. radiation + new treatment is 6.2, so the risk of death is about six times higher for radiation only than for radiation + new treatment combined.
- You can also estimate median survival time and mortality rates overall and within groups. Graphing the survival function illustrates changes in survival over time.
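The Cox model itself is not shown here. As a sketch of how censoring is handled when estimating and graphing the survival function and median survival, this is the Kaplan-Meier estimator, a standard nonparametric method, applied to hypothetical follow-up times. The function names are hypothetical, and subjects censored at a given time are counted as still at risk at that time, which is the usual convention.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of S(t) at each distinct event time.
    events[i] is 1 if the event was observed at times[i], 0 if censored."""
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all subjects (events and censorings) that share this time
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / at_risk  # conditional survival past time t
            curve.append((t, surv))
        at_risk -= removed  # censored subjects leave the risk set after t
    return curve


def median_survival(curve):
    """First event time at which S(t) drops to 0.5 or below; None if never reached."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


# Hypothetical follow-up times (months); 1 = event observed, 0 = censored
times = [2, 3, 3, 5, 6, 8, 9, 12]
events = [1, 1, 0, 1, 0, 1, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t={t}: S(t)={s:.3f}")
print("median survival:", median_survival(curve))
```

Plotting S(t) as a step function of t gives the survival curve; comparing curves by group is a common first look before fitting a Cox model.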

Mixed Models (Hierarchical or Multilevel Modeling)
- Continuous outcome (similar framework can extend to categorical outcomes)
- Mixed refers to fixed and random effects. Fixed effects are factors for which all levels you wish to draw conclusions about are included in your study (e.g., sex). Random effects are factors whose levels in your study are treated as a sample from a wider population of levels to which you wish to apply your conclusions (e.g., hospitals).
- Often used for dependent observations where data are clustered by some factor (e.g., treating physicians, hospital sites, individuals with repeated measures). It is not appropriate to analyze clustered data with traditional methods that do not account for the correlation between observations.
- This is a very powerful and flexible framework for correctly evaluating not only data means but also the variance and covariance structure of the data.
- Repeated Measures ANOVA can also be used to look at measurements taken over time but is a less flexible framework, especially in the case of missing data.
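Why clustered data cannot be analyzed as if independent can be quantified with the intraclass correlation (ICC), the share of total variance due to differences between clusters; in a random-intercept mixed model it equals the between-cluster variance divided by the total variance. This sketch is not a mixed model fit. It uses the one-way ANOVA estimator of the ICC for equal-sized clusters and the design effect 1 + (m - 1) x ICC, where m is the cluster size, on hypothetical data for four hospitals with three patients each. The design effect is roughly the factor by which ignoring clustering understates the variance of an overall mean.

```python
def anova_icc(clusters):
    """One-way ANOVA estimate of the ICC; assumes all clusters have the same size.
    The estimate can be negative when between-cluster variation is small."""
    k, m = len(clusters), len(clusters[0])
    means = [sum(c) / m for c in clusters]
    grand = sum(means) / k
    msb = m * sum((mu - grand) ** 2 for mu in means) / (k - 1)  # between clusters
    msw = sum((x - mu) ** 2 for c, mu in zip(clusters, means) for x in c) / (k * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)


def design_effect(m, icc):
    """Variance inflation for an overall mean with equal cluster size m."""
    return 1 + (m - 1) * icc


# Hypothetical outcome values: 4 hospitals, 3 patients each
clusters = [[10, 11, 12], [14, 15, 16], [9, 10, 11], [13, 14, 15]]
icc = anova_icc(clusters)
deff = design_effect(3, icc)
n = sum(len(c) for c in clusters)
print(f"ICC={icc:.3f}, design effect={deff:.2f}, effective n={n / deff:.1f} of {n}")
```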

Generalized Estimating Equations (GEE)
If observations are not independent because of clustering (e.g., treating physicians, hospital sites, repeated measures), an alternative to the mixed model is to adjust for the correlation between observations using generalized estimating equations. This is possible for most of the models discussed above. It allows accurate estimation and significance testing of both individual-level and cluster-level variables.
Caution: At least 40 clusters are needed for GEE to yield reliable estimates.
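GEE keeps the usual model for the mean but replaces the model-based standard errors with a cluster-robust (sandwich) estimate. In the simplest case, an intercept-only linear model with an independence working correlation, the estimate is the overall mean and the robust variance sums the residuals within each cluster before squaring. This hypothetical sketch compares that robust standard error with the naive one that treats every observation as independent. It uses only four clusters for illustration, far fewer than the 40 recommended above, and applies no small-sample correction.

```python
import math


def mean_with_robust_se(clusters):
    """Overall mean with a naive SE and a cluster-robust (sandwich) SE.
    Equivalent to an intercept-only GEE with identity link and
    independence working correlation, without small-sample correction."""
    obs = [x for c in clusters for x in c]
    n = len(obs)
    mean = sum(obs) / n
    # Naive SE: treats all n observations as independent
    var = sum((x - mean) ** 2 for x in obs) / (n - 1)
    naive_se = math.sqrt(var / n)
    # Sandwich SE: residuals are summed within each cluster, then squared
    meat = sum(sum(x - mean for x in c) ** 2 for c in clusters)
    robust_se = math.sqrt(meat) / n
    return mean, naive_se, robust_se


# Hypothetical outcome values: 4 hospitals, 3 patients each
clusters = [[10, 11, 12], [14, 15, 16], [9, 10, 11], [13, 14, 15]]
mean, naive_se, robust_se = mean_with_robust_se(clusters)
print(f"mean={mean:.2f}, naive SE={naive_se:.3f}, robust SE={robust_se:.3f}")
```

Here the robust standard error is larger than the naive one because patients within the same hospital have similar outcomes; a naive analysis would overstate the precision of the mean.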

rev. 04-Aug-2022