A statistical model is a representation of a complex phenomenon that generated the data.

  • It has mathematical formulations that describe relationships between random variables and parameters.
  • It makes assumptions about the random variables, and sometimes parameters.
  • A general form: data = model + residuals
  • The model should explain most of the variation in the data.
  • Residuals represent lack of fit, that is, the portion of the data left unexplained by the model.
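The decomposition data = model + residuals can be illustrated with a small sketch. It assumes a straight-line model fitted by ordinary least squares, which is only one possible choice of model. The data values and the helper name `fit_line` are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = b0 + b1 * x (illustrative helper)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx            # slope estimate
    b0 = y_bar - b1 * x_bar   # intercept estimate
    return b0, b1


# Made-up data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)

# "model" part: the fitted values; "residuals" part: what is left over.
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

# Share of the variation in y explained by the model (R-squared).
y_bar = sum(y) / len(y)
ss_total = sum((yi - y_bar) ** 2 for yi in y)
ss_resid = sum(r ** 2 for r in residuals)
r_squared = 1 - ss_resid / ss_total

print(f"intercept = {b0:.3f}, slope = {b1:.3f}, R^2 = {r_squared:.4f}")
```

By construction, each observation equals its fitted value plus its residual, and a value of R^2 close to 1 indicates that the model accounts for most of the variation in these data.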

In modeling, the focus is on estimating the model parameters. The basic inference tools (e.g., point estimation, hypothesis testing, and confidence intervals) will be applied to these parameters. When discussing models, we will keep in mind the following parts:


Objective

State the objective of the model. For instance, "Estimate the probability that a characteristic is present given that the values of the explanatory variables are ... "

Model Structure

State the important variables in the model. What is the response variable Y? What is included in the set of explanatory variables?

State the equation for the model.
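As an illustration of what stating the equation can look like, take the objective of estimating the probability that a characteristic is present. One common choice for such a binary response is a logistic regression model. This choice is an assumption made for this sketch, not something fixed by the text, and the symbols π, β_j, and x_j are introduced only here:

```latex
\log\!\left(\frac{\pi}{1-\pi}\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,
\qquad \pi = P(Y = 1 \mid x_1, \ldots, x_k)
```

Here Y = 1 denotes that the characteristic is present, x_1, ..., x_k are the explanatory variables, and β_0, ..., β_k are the parameters to be estimated.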

Model Assumptions

State the assumptions for the model that you are using. Are the data independently distributed? Do linear relationships exist between the dependent and independent variables? Is the variance homogeneous? Are errors independent and normally distributed?

Parameter Estimation and Interpretation

What are the odds that the characteristic of interest is present? What evidence will you use to establish the estimate?
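A minimal sketch of one such estimate follows. It assumes a simple random sample with a binary outcome. The counts are hypothetical, and the Wald interval is just one of several possible confidence intervals for a proportion.

```python
import math


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1 - p)


# Hypothetical counts, for illustration only.
present = 30   # units with the characteristic
absent = 70    # units without it
n = present + absent

# Point estimates of the probability and of the odds.
p_hat = present / n
odds_hat = odds(p_hat)

# Approximate 95% Wald confidence interval for the probability.
z = 1.96                                   # standard normal quantile
se = math.sqrt(p_hat * (1 - p_hat) / n)    # estimated standard error
ci = (p_hat - z * se, p_hat + z * se)

print(f"p_hat = {p_hat:.3f}, odds = {odds_hat:.3f}")
print(f"95% Wald CI for p: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The point estimate answers the first question, while the standard error and the interval are the evidence for how precisely the parameter has been estimated.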

Model Fit

How is goodness of fit determined? Pearson chi-square statistic? Deviance? Likelihood ratio test? What does the analysis of the residuals show?
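The two statistics named above can be sketched for a table of counts as follows. The observed counts are hypothetical, and the expected counts assume a model in which all four categories are equally likely. The deviance formula used here applies when the observed and expected totals agree.

```python
import math


def pearson_chi_square(observed, expected):
    """Pearson statistic X^2 = sum (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def deviance(observed, expected):
    """Deviance G^2 = 2 * sum O * ln(O / E); cells with O = 0 contribute 0."""
    return 2 * sum(o * math.log(o / e)
                   for o, e in zip(observed, expected) if o > 0)


def pearson_residuals(observed, expected):
    """Cell-wise Pearson residuals (O - E) / sqrt(E)."""
    return [(o - e) / math.sqrt(e) for o, e in zip(observed, expected)]


# Hypothetical counts; expected counts under an assumed equal-probability model.
observed = [18, 22, 30, 30]
expected = [25.0, 25.0, 25.0, 25.0]

x2 = pearson_chi_square(observed, expected)
g2 = deviance(observed, expected)
resid = pearson_residuals(observed, expected)

print(f"X^2 = {x2:.3f}, G^2 = {g2:.3f}")
print("Pearson residuals:", [round(r, 2) for r in resid])
```

Large values of either statistic relative to the reference chi-square distribution suggest lack of fit, and the residuals show which cells contribute most to it. The squared Pearson residuals sum to X^2.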

Model Selection

Choosing a single "best" model when more than one reasonable model is available involves some subjective judgment. We seek a parsimonious model, one that is as simple as possible while still adequately explaining the phenomenon of interest.
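One way to make the trade-off between simplicity and fit concrete is an information criterion. The sketch below uses the Akaike information criterion (AIC), which is an assumption of this example rather than a method prescribed by the text. The model names, log-likelihoods, and parameter counts are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 * log-likelihood (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidate models: (maximized log-likelihood, number of parameters).
candidates = {
    "A": (-120.5, 3),   # simpler model
    "B": (-119.8, 6),   # more complex model with a slightly better fit
}

scores = {name: aic(ll, k) for name, (ll, k) in candidates.items()}
best = min(scores, key=scores.get)

for name, score in scores.items():
    print(f"model {name}: AIC = {score:.1f}")
print("preferred by AIC:", best)
```

In this made-up comparison the more complex model fits slightly better, but not by enough to offset its three extra parameters, so the simpler model is preferred. Such a criterion supports the judgment described above but does not replace it.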

Fitting the Model in SAS

How can we fit a particular family of models in SAS and evaluate different parts of the output?

Fitting the Model in R

How can we fit a particular family of models in R and evaluate different parts of the output?