# Lesson 29: Analysis of Variance

## Overview

In this lesson, we investigate two of the more common statistical analysis procedures. Specifically, we investigate:

- the **ANOVA** procedure, to conduct analysis of variance tests when you have balanced data, *i.e.*, when each group has the same number of observations
- the **GLM** procedure, to conduct analysis of variance tests when you have continuous effects or unbalanced data, *i.e.*, when each group does not have the same number of observations

## Objectives

Upon completing this lesson, you should be able to do the following:

- use the ANOVA procedure to conduct a one-way analysis of variance
- read the basic output that arises from invoking the ANOVA procedure
- use the GLM procedure to conduct an analysis of variance on unbalanced data
- use the GLM procedure to conduct an analysis of covariance
- read the basic output that arises from invoking the GLM procedure
- use the MEANS procedure and the GPLOT procedure to create an interaction plot

## Textbook Reference

- Chapter 7 of the textbook.

# 29.1 - Lesson Notes

### B. One-Way Analysis of Variance

**Page 200.** In the last line, I wish the authors had said "the *sample* means of groups X, Y, and Z are 786, 518, and 548, respectively." Keep in mind that the field of statistics pretty much exists because we typically can't measure every person in the population. Instead, we take samples from the population and try to use the sample data to draw conclusions about the larger population.

**Page 202.** There is a typo in the calculation of the F statistic in the last line. It should read F = 2700/75 = 36.0.
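To see where the corrected value comes from, recall that the F statistic in a one-way analysis of variance is the ratio of two mean squares. The line below is only a sketch: it assumes that 2700 is the between-groups mean square and 75 is the within-groups (error) mean square, which the note above does not state explicitly.

$$
F = \frac{MS_{\text{between}}}{MS_{\text{within}}} = \frac{2700}{75} = 36.0
$$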

### C. Computing Contrasts

**Page 208.** In practice, I rarely use the ANOVA procedure. Since the GLM procedure is much more powerful and flexible than the ANOVA procedure, I think it just makes more sense to use GLM in all situations instead of hopping back and forth between the two procedures.
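As a rough illustration of how little changes when you switch procedures, here is a minimal sketch of a one-way analysis of variance run through GLM. The data set name *reading* and the variable names *group* and *words* are assumptions made for this example, so substitute your own. The CLASS and MODEL statements are written the same way you would write them for the ANOVA procedure.

```sas
* Sketch only: the data set name (reading) and the variable names;
* (group, words) are assumed for illustration;
PROC GLM data = reading;
  class group;           * the grouping (classification) variable;
  model words = group;   * response = effect;
  means group;           * print the sample mean of each group;
RUN;
QUIT;
```

The QUIT statement is there because GLM is an interactive procedure and otherwise keeps running after the RUN statement.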

**Page 209.** In case it's not obvious that the first CONTRAST statement gives a comparison of method X against the mean of methods Y and Z, let's (try to) make it clearer. The difference between X and the average of Y and Z can be written as:

(Y + Z)/2 - X

Then, if you multiply by 2, you get:

(Y + Z) - 2X

Rearranging the contrast so that the variables appear in alphanumeric order:

-2X + Y + Z

we get the coefficients -2, 1, and 1, as they appear in the CONTRAST statement on page 208.
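For reference, here is a sketch of how such a contrast might be specified in the GLM procedure. The data set name *reading*, the variable names *group* and *words*, and the label text are assumptions made for this example and may differ from the textbook's program.

```sas
* Sketch only: data set, variable names, and label are assumed;
PROC GLM data = reading;
  class group;
  model words = group;
  * coefficients apply to the sorted levels of group: X, Y, Z;
  contrast 'X vs. Y and Z' group -2 1 1;
RUN;
QUIT;
```

Note that the CONTRAST statement must come after the MODEL statement, and that the coefficients sum to zero, as they must for any contrast.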

### E. Interpreting Significant Interactions

**Page 215.** I *really* like this application of a nested DO loop... that is, using a nested DO loop as a way of creating an experimental design in your data set. If it's not obvious to you what the *ritalin* data set looks like, you might want to take a peek at the output from the program:

```
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

* Build a 2 x 2 design with 4 subjects per cell;
DATA ritalin;
  do group = 'NORMAL', 'HYPER';
    do drug = 'PLACEBO', 'RITALIN';
      do subj = 1 to 4;
        input activity @;   * trailing @ holds the data line;
        output;             * write one observation per subject;
      end;
    end;
  end;
DATALINES;
50 45 55 52 67 60 58 65 70 72 68 75 51 57 48 55
;
RUN;

PROC PRINT data = ritalin NOOBS;
  title 'The ritalin data set';
RUN;
```

**Page 216.** Ahhh, at the bottom of this page, we now have a beautiful application of creating a data set from the output of the MEANS procedure. Again, it might be helpful to you if you take a peek at the contents of the *means* data set:

```
PROC MEANS data = ritalin nway noprint;
class group drug;
var activity;
output out = means mean = m_hr;
RUN;
PROC PRINT data = means NOOBS;
title 'Cell means from ritalin experiment';
RUN;
```

As I said before when we first learned about the MEANS procedure, creating a data set, rather than printed output, from the MEANS procedure is a common thing to do.
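Once the cell means are in a data set, drawing an interaction plot takes only a few more lines. The following is a minimal sketch, not necessarily the textbook's program: the SYMBOL settings and the title are assumptions, and it requires SAS/GRAPH. It plots the mean activity (*m_hr*) against *drug*, with one joined line for each level of *group*. From the data above, the four cell means work out to 50.5 (NORMAL, PLACEBO), 62.5 (NORMAL, RITALIN), 71.25 (HYPER, PLACEBO), and 52.75 (HYPER, RITALIN), so the two lines should cross.

```sas
* Sketch only: symbol choices and title are assumed;
SYMBOL1 v = square color = black i = join;
SYMBOL2 v = circle color = black i = join;

PROC GPLOT data = means;
  title 'Interaction plot for the ritalin experiment';
  plot m_hr * drug = group;   * one line per level of group;
RUN;
QUIT;
```

Lines that are far from parallel, as here, are the visual signature of an interaction between the two factors.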

# 29.2 - Summary

In this lesson, we learned how to use the ANOVA and GLM procedures for comparing two or more groups.

The homework for this lesson will give you more practice with these methods.