Lesson 29: Analysis of Variance
In this lesson, we investigate two of the more common statistical analysis procedures. Specifically, we investigate:
- the ANOVA procedure, to conduct analysis of variance tests when you have balanced data, i.e., when each group has the same number of observations
- the GLM procedure, to conduct analysis of variance tests when you have continuous effects or unbalanced data, i.e., when each group does not have the same number of observations
Upon completing this lesson, you should be able to do the following:
- use the ANOVA procedure to conduct a one-way analysis of variance
- read the basic output that arises from invoking the ANOVA procedure
- use the GLM procedure to conduct an analysis of variance on unbalanced data
- use the GLM procedure to conduct an analysis of covariance
- read the basic output that arises from invoking the GLM procedure
- use the MEANS procedure and the GPLOT procedure to create an interaction plot
Chapter 7 of the textbook.
29.1 - Lesson Notes
B. One-Way Analysis of Variance
Page 200. In the last line, I wish the authors had said "the sample means of groups X, Y, and Z are 786, 518, and 548, respectively." Keep in mind that the field of statistics pretty much exists because we typically can't measure every person in the population. Instead, we take samples from the population, and try to use the sample data to draw conclusions about the larger population.
Page 202. There is a typo in the calculation of the F statistic in the last line. It should read F = 2700/75 = 36.0.
C. Computing Contrasts
Page 208. In practice, I rarely use the ANOVA procedure. Since the GLM procedure is much more powerful and flexible than the ANOVA procedure, I think it just makes more sense to use GLM in all situations instead of hopping back and forth between the two procedures.
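To make that concrete, here is a minimal sketch of a one-way analysis of variance run with the GLM procedure. It is not the textbook's example: I am assuming the sashelp.class data set that ships with SAS, and using sex as the grouping variable and height as the response purely for illustration. The two groups are not the same size, which is the unbalanced situation GLM is designed to handle.

```sas
/* Assumption: sashelp.class (shipped with SAS) stands in for the      */
/* textbook data; sex is the grouping variable, height the response.   */
PROC GLM data = sashelp.class;
   class sex;               /* grouping (classification) variable      */
   model height = sex;      /* one-way model: response = group         */
   means sex;               /* print the group means                   */
   title 'One-way ANOVA with PROC GLM';
RUN;
QUIT;                       /* GLM is interactive, so end it with QUIT */
```

The CLASS and MODEL statements are written the same way in the ANOVA procedure, so switching a balanced one-way analysis over to GLM usually means changing only the procedure name.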
Page 209. In case it's not obvious that the first CONTRAST statement gives a comparison of method X against the mean of methods Y and Z, let's (try to) make it clearer. The difference between X and the average of Y and Z can be written as:
(Y + Z)/2 - X
Then, if you multiply by 2, you get:
(Y + Z) - 2X
Rearranging the contrast so that the variables appear in alphanumeric order:
-2X + Y + Z
we get the coefficients -2, 1, and 1, as they appear in the CONTRAST statement on page 208.
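If you'd like to check the arithmetic, the steps above can be sketched in a short DATA step. I am plugging in the sample means quoted for page 200 (786, 518, and 548 for X, Y, and Z); the variable names diff and contrast are my own.

```sas
/* Plug the sample means from page 200 into the contrast.              */
/* diff and contrast are illustrative names, not from the textbook.    */
DATA _null_;
   x = 786;  y = 518;  z = 548;
   diff     = (y + z)/2 - x;    /* mean of Y and Z, minus X            */
   contrast = -2*x + y + z;     /* same comparison, multiplied by 2    */
   put diff= contrast=;         /* writes both values to the log       */
RUN;
```

The log shows diff=-253 and contrast=-506. The contrast is exactly twice the difference, which is why both forms make the same comparison.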
E. Interpreting Significant Interactions
Page 215. I really like this application of a nested DO loop... that is, using a nested DO loop as a way of creating an experimental design in your data set. If it's not obvious to you what the ritalin data set looks like, you might want to take a peek at the output from the program:
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA ritalin;
   do group = 'NORMAL', 'HYPER';
      do drug = 'PLACEBO', 'RITALIN';
         do subj = 1 to 4;
            input activity @;
            output;
         end;
      end;
   end;
   DATALINES;
50 45 55 52 67 60 58 65 70 72 68 75 51 57 48 55
;
RUN;

PROC PRINT data = ritalin NOOBS;
   title 'The ritalin data set';
RUN;
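Once the ritalin data set exists, the two-way analysis with an interaction term can be sketched as follows. This is my own GLM version, in keeping with my preference for GLM, and it assumes the DATA step above has already been run. The data are balanced (four subjects per cell), so the ANOVA procedure would accept the same CLASS and MODEL statements.

```sas
/* Assumes the ritalin data set created by the DATA step above.        */
PROC GLM data = ritalin;
   class group drug;
   model activity = group | drug;   /* group, drug, and group*drug     */
   title 'Two-way ANOVA of the ritalin data';
RUN;
QUIT;
```

The bar in group | drug is shorthand for both main effects plus their interaction, so the group*drug line of the output is the one to check first.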
Page 216. Ahhh, at the bottom of this page, we now have a beautiful application of creating a data set from the output of the MEANS procedure. Again, it might be helpful to you if you take a peek at the contents of the means data set:
PROC MEANS data = ritalin nway noprint;
   class group drug;
   var activity;
   output out = means mean = m_hr;
RUN;

PROC PRINT data = means NOOBS;
   title 'Cell means from ritalin experiment';
RUN;
As I said before when we first learned about the MEANS procedure, creating a data set, rather than printed output, from the MEANS procedure is a common thing to do.
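The means data set is what feeds the interaction plot. Here is a minimal sketch using the GPLOT procedure, assuming SAS/GRAPH is available and the means data set has been created by the PROC MEANS step above. The SYMBOL settings are my own choices and may not match the textbook's.

```sas
/* Assumes the means data set (group, drug, m_hr) created above.       */
/* The SYMBOL settings are illustrative choices.                       */
SYMBOL1 value = square interpol = join color = black;
SYMBOL2 value = circle interpol = join color = black;

PROC GPLOT data = means;
   plot m_hr*drug = group;    /* one joined line per level of group    */
   title 'Interaction plot: mean activity by drug and group';
RUN;
QUIT;
```

If the two lines are far from parallel, that suggests the effect of the drug depends on the group, which is what a significant interaction means.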
29.2 - Summary
In this lesson, we learned how to use the ANOVA and GLM procedures for comparing two or more groups.
The homework for this lesson will give you more practice with these methods.