Here is another strategy that outlines some basic steps for building a regression model.
After establishing a research hypothesis, proceed to design an appropriate experiment or experiments. Identify the variables of interest, which variable(s) will be the response, and which levels of the predictor variables you wish to cover in the study. If costs allow for it, then a pilot study may be helpful (or necessary).
Collect the data and make sure to "clean" it of any errors (e.g., data entry errors). If data on many variables are recorded, then variable selection and screening should be performed.
Consider the regression model to be used for studying the relationship and assess the adequacy of that model. Oftentimes, a linear regression model will be used. But as these notes show, there are numerous regression models and regression strategies for dealing with different data structures. How you assess the adequacy of the fitted model depends on the type of regression model being used as well as on its assumptions. For linear regression, the following need to be checked:
- Check for normality of the residuals. This is often done through a variety of visual displays, but formal statistical testing can also be performed.
- Check for constant variance of the residuals. Again, visual displays and formal testing can both be performed.
- Check the linearity condition using residual plots.
- After time-ordering your data (if appropriate), assess the independence of the observations. Independence is best assessed by looking at a time-ordered plot of the residuals, but other time series techniques exist for assessing the assumption of independence (this is discussed further in the optional content). Regardless, checking the assumptions of your model as well as the model’s overall adequacy is usually accomplished through residual diagnostic procedures.
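As one example of a numerical check that can accompany the time-ordered residual plot, the sketch below computes the Durbin-Watson statistic for the residuals of a simple linear regression. It is a minimal illustration, not the procedure prescribed by these notes: the data are made up, the helper names `fit_simple_ols` and `durbin_watson` are hypothetical, and only the Python standard library is used. Values near 2 suggest little first-order autocorrelation, while values near 0 or 4 suggest positive or negative autocorrelation, respectively.

```python
# Illustrative data (made up for this sketch), assumed to be in time order.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1]


def fit_simple_ols(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the sum of squared residuals."""
    num = sum(
        (residuals[t] - residuals[t - 1]) ** 2
        for t in range(1, len(residuals))
    )
    den = sum(e ** 2 for e in residuals)
    return num / den


b0, b1 = fit_simple_ols(x, y)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
dw = durbin_watson(residuals)
print(f"intercept={b0:.3f}, slope={b1:.3f}, Durbin-Watson={dw:.3f}")
```

The statistic only looks at first-order (lag 1) dependence, so it complements rather than replaces the residual plots described above.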
Look for any outlying or influential data points that may be affecting the overall fit of your current model (we'll discuss this further in Lesson 11). Take care in how you handle these points, as they may be legitimate measurements. While the option does exist for excluding such problematic points, this should only be done after careful consideration of whether such points were recorded in error or are truly not representative of the data you collected. If any corrective actions are taken in this step, then return to Step 3 (assessing the adequacy of the model).
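To make the idea of an influential point concrete, here is a minimal sketch that computes leverages and Cook's distances for a simple linear regression using only the standard library. The data are made up (the last point is deliberately far from the others in x), and the function name `influence_measures` is hypothetical. Any cutoffs you apply to these quantities are rules of thumb rather than strict tests.

```python
# Illustrative data (made up): the last observation has an extreme x value.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
y = [1.2, 1.9, 3.2, 3.8, 5.1, 6.0, 7.1, 12.0]


def influence_measures(x, y):
    """Return (leverages, Cook's distances) for the line y = b0 + b1 * x.

    Assumes a simple linear regression, so p = 2 estimated parameters.
    """
    n = len(x)
    p = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    mse = sum(e ** 2 for e in residuals) / (n - p)
    # Leverage for simple linear regression: 1/n + (x_i - x_bar)^2 / Sxx.
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    # Cook's distance: (e_i^2 / (p * MSE)) * h_i / (1 - h_i)^2.
    cooks = [
        (e ** 2 / (p * mse)) * (h / (1.0 - h) ** 2)
        for e, h in zip(residuals, leverage)
    ]
    return leverage, cooks


leverage, cooks = influence_measures(x, y)
for i, (h, d) in enumerate(zip(leverage, cooks), start=1):
    print(f"obs {i}: leverage={h:.3f}, Cook's D={d:.3f}")
```

A point with high leverage is unusual in its predictor value, while a large Cook's distance indicates that removing the point would noticeably change the fitted values. Either should prompt a closer look at how the point was measured, not automatic deletion.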
Assess multicollinearity, i.e., linear relationships amongst your predictor variables (we'll discuss this further in Lesson 12). Multicollinearity can produce unstable coefficient estimates with inflated standard errors, among other problems. If you proceed to omit variables or observations which may be causing multicollinearity, then return to Step 3 (assessing the adequacy of the model).
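One common numerical summary of multicollinearity is the variance inflation factor (VIF). The sketch below covers only the special case of two predictors, where both predictors share the value VIF = 1 / (1 - r^2), with r their sample correlation. With more predictors, each VIF instead requires regressing that predictor on all of the others. The data are made up, and the function names `pearson_r` and `vif_two_predictors` are hypothetical.

```python
import math

# Illustrative predictors (made up): x2 is close to twice x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]


def pearson_r(u, v):
    """Sample Pearson correlation between two equal-length sequences."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


def vif_two_predictors(u, v):
    """VIF shared by both predictors in a model with exactly two predictors."""
    r = pearson_r(u, v)
    return 1.0 / (1.0 - r ** 2)


vif = vif_two_predictors(x1, x2)
print(f"correlation={pearson_r(x1, x2):.4f}, VIF={vif:.1f}")
```

A VIF of 1 means the predictors are uncorrelated. Cutoffs such as 4, 5, or 10 are often quoted as warning levels, but these are conventions rather than formal tests.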
Use the measures discussed in this Lesson to assess the predictive ability and overall goodness-of-fit of your model. If these measures turn out to be unsatisfactory, then modifications to the model may be in order (e.g., a different functional form or down-weighting certain observations). If you must take such actions, then return to Step 3 (assessing the adequacy of the model) afterward.
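This passage does not name the specific measures, so the sketch below assumes three commonly used ones: R-squared, adjusted R-squared, and the PRESS statistic (the sum of squared leave-one-out prediction errors). It fits a simple linear regression with the standard library only. The data are made up, and the function name `fit_summary` is hypothetical.

```python
# Illustrative data (made up for this sketch).
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y = [2.3, 3.6, 6.4, 7.5, 10.6, 11.8, 14.3, 15.7]


def fit_summary(x, y):
    """Return SSE, R^2, adjusted R^2, and PRESS for the line y = b0 + b1 * x.

    Assumes a simple linear regression, so p = 2 estimated parameters.
    """
    n = len(x)
    p = 2
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(e ** 2 for e in residuals)
    sst = sum((yi - y_bar) ** 2 for yi in y)
    r2 = 1.0 - sse / sst
    # Adjusted R^2 penalizes the number of estimated parameters.
    adj_r2 = 1.0 - (sse / (n - p)) / (sst / (n - 1))
    # PRESS via leverages: each leave-one-out residual equals e_i / (1 - h_i).
    leverage = [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]
    press = sum((e / (1.0 - h)) ** 2 for e, h in zip(residuals, leverage))
    return {"sse": sse, "r2": r2, "adj_r2": adj_r2, "press": press}


summary = fit_summary(x, y)
for name, value in summary.items():
    print(f"{name}: {value:.4f}")
```

PRESS is never smaller than SSE, and a PRESS that is much larger than SSE suggests the model predicts new observations less well than it fits the data used to estimate it.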