Lesson 29: Analysis of Variance

Overview

In this lesson, we investigate two of the more common statistical analysis procedures. Specifically, we investigate:

the ANOVA procedure, to conduct analysis of variance tests when you have balanced data, i.e., when each group has the same number of observations
the GLM procedure, to conduct analysis of variance tests when you have continuous effects or unbalanced data, i.e., when each group does not have the same number of observations

Objectives

Upon completion of this lesson, you should be able to:

use the ANOVA procedure to conduct a one-way analysis of variance
read the basic output that arises from invoking the ANOVA procedure
use the GLM procedure to conduct an analysis of variance on unbalanced data
use the GLM procedure to conduct an analysis of covariance
read the basic output that arises from invoking the GLM procedure
use the MEANS procedure and the GPLOT procedure to create an interaction plot

Textbook Reference

Chapter 7 of the textbook.

29.1 - Lesson Notes

B. One-Way Analysis of Variance

Page 200. In the last line, I wish the authors had said "The sample means of groups X, Y, and Z are 786, 518, and 548, respectively." Keep in mind that the field of statistics pretty much exists because we typically can't measure every person in the population. Instead, we take samples from the population and try to use the sample data to draw conclusions about the larger population.

Page 202. There is a typo in the calculation of the F statistic in the last line. It should read F = 2700/75 = 36.0.
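
If you'd like to check the corrected arithmetic yourself, a throwaway DATA step will do it. This is just a sketch: the variable names are my own, and I am assuming that 2700 and 75 are the between-groups and within-groups mean squares, which is how the F statistic is formed in a one-way analysis of variance.

```sas
DATA _NULL_;
	ms_between = 2700;           /* numerator from the corrected line (assumed to be MS between) */
	ms_within = 75;              /* denominator from the corrected line (assumed to be MS within) */
	f = ms_between / ms_within;  /* F is the ratio of the two mean squares */
	put f= 6.1;                  /* write the ratio to the log */
RUN;
```

The ratio written to the log is 36, in agreement with the corrected line.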

C. Computing Contrasts

Page 208. In practice, I rarely use the ANOVA procedure. Since the GLM procedure is much more powerful and flexible than the ANOVA procedure, I think it just makes more sense to use GLM in all situations instead of hopping back and forth between the two procedures.

Page 209. In case it's not obvious that the first CONTRAST statement gives a comparison of method X against the mean of methods Y and Z, let's (try to) make it clearer. The difference between X and the average of Y and Z can be written as:

(Y + Z)/2 - X

Then, if you multiply by 2, you get:

(Y + Z) - 2X

Rearranging the contrast so that the variables appear in alphanumeric order:

-2X + Y + Z

we get the coefficients -2, 1, and 1, as they appear in the CONTRAST statement on page 208.
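
To see where those coefficients end up, here is a minimal sketch of a GLM call with such a CONTRAST statement. The data set name (methods), the variable names (method and score), and the data values are all made up for illustration; they are not the textbook's. The coefficients are matched to the levels of the CLASS variable in sorted order, which is why the alphabetical arrangement above matters.

```sas
/* Hypothetical data: names and values are for illustration only */
DATA methods;
	input method $ score @@;
DATALINES;
X 12 X 15 X 14 Y 9 Y 8 Y 10 Z 10 Z 9 Z 11
;
RUN;

PROC GLM data = methods;
	class method;
	model score = method;
	/* coefficients line up with the sorted levels X, Y, Z */
	contrast 'X vs. mean of Y and Z' method -2 1 1;
	title 'Contrast of method X against methods Y and Z';
RUN;
QUIT;
```

Note that the CONTRAST statement is available in the GLM procedure but not in the ANOVA procedure.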

E. Interpreting Significant Interactions

Page 215. I really like this application of a nested DO loop... that is, using a nested DO loop as a way of creating an experimental design in your data set. If it's not obvious to you what the Ritalin data set looks like, you might want to take a peek at the output from the program:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
DATA ritalin;
	do group = 'NORMAL', 'HYPER';
		do drug = 'PLACEBO', 'RITALIN';
			do subj = 1 to 4;
				input activity @;
				output;
			end;
		end;
	end;
DATALINES;
50 45 55 52 67 60 58 65 70 72 68 75 51 57 48 55
;
RUN;

PROC PRINT data = ritalin NOOBS;
	title 'The ritalin data set';
RUN;

The ritalin data set
group	drug	subj	activity
NORMAL	PLACEBO	1	50
NORMAL	PLACEBO	2	45
NORMAL	PLACEBO	3	55
NORMAL	PLACEBO	4	52
NORMAL	RITALIN	1	67
NORMAL	RITALIN	2	60
NORMAL	RITALIN	3	58
NORMAL	RITALIN	4	65
HYPER	PLACEBO	1	70
HYPER	PLACEBO	2	72
HYPER	PLACEBO	3	68
HYPER	PLACEBO	4	75
HYPER	RITALIN	1	51
HYPER	RITALIN	2	57
HYPER	RITALIN	3	48
HYPER	RITALIN	4	55
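
With the data laid out this way, a two-way analysis of variance with an interaction term might be requested as in the sketch below. It assumes the ritalin data set created above, and I use the GLM procedure as a matter of preference; since the design is balanced, the ANOVA procedure would accept the same CLASS and MODEL statements.

```sas
/* A sketch only: assumes the ritalin data set created above */
PROC GLM data = ritalin;
	class group drug;
	/* group | drug is shorthand for group, drug, and group*drug */
	model activity = group | drug;
	title 'Two-way analysis of variance of the ritalin data';
RUN;
QUIT;
```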

Page 216. Ahhh, at the bottom of this page, we now have a beautiful application for creating a data set from the output of the MEANS procedure. Again, it might be helpful to you if you take a peek at the contents of the means data set:

PROC MEANS data = ritalin nway noprint;
	class group drug;
	var activity;
	output out = means mean = m_hr;
RUN;

PROC PRINT data = means NOOBS;
	title 'Cell means from ritalin experiment';
RUN;

Cell means from ritalin experiment
group	drug	_TYPE_	_FREQ_	m_hr
HYPER	PLACEBO	3	4	71.25
HYPER	RITALIN	3	4	52.75
NORMAL	PLACEBO	3	4	50.50
NORMAL	RITALIN	3	4	62.50

As I said before when we first learned about the MEANS procedure, creating a data set, rather than printed output, from the MEANS procedure is a common thing to do.
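
Once the cell means are in a data set, an interaction plot is only a few statements away. The following is a sketch rather than the textbook's program: it assumes the means data set created above, assumes that SAS/GRAPH is available at your site, and the SYMBOL settings are just my own choices.

```sas
/* One SYMBOL statement per level of group; join the points with lines */
SYMBOL1 value = dot interpol = join color = black;
SYMBOL2 value = circle interpol = join color = black;

PROC GPLOT data = means;
	/* mean activity against drug, with a separate line for each group */
	plot m_hr * drug = group;
	title 'Interaction plot for the ritalin experiment';
RUN;
QUIT;
```

If the two lines are far from parallel, that suggests an interaction between group and drug, as the cell means above do.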

29.2 - Summary

In this lesson, we learned how to use the ANOVA and GLM procedures for comparing two or more groups.

The homework for this lesson will give you more practice with these methods.
