So far, we have only used the FREQ procedure to create one-way frequency tables. However, it is often helpful to create crosstabulation tables in which the frequencies are determined for more than one variable at a time. For example, if we were interested in knowing how the percentage of voters favoring Barack Obama differs for various age groups, we'd need to create a two-way crosstabulation table between the two categorical variables *candidate* (Obama, McCain) and *agegroup* (18-29, 30-44, 45-54, 55-70, more than 70). We'll investigate such tables in this section.

## Example 12.6

The following FREQ procedure shows the simplest way to tell SAS to create a two-way table, here for the variables `sex` and `ed_level`, with no bells and whistles added:

```
PROC FREQ data=icdb.back;
title 'Crosstabulation of Education Level and Sex';
tables ed_level*sex;   /* ed_level forms the rows, sex the columns */
RUN;
```

##### Crosstabulation of Education Level and Sex

##### The FREQ Procedure

##### Table of ed_level by sex

| ed_level | Cell statistic | sex = 1 | sex = 2 | Total |
|---|---|---|---|---|
| 1 | Frequency | 4 | 7 | 11 |
| | Percent | 0.63 | 1.10 | 1.72 |
| | Row Pct | 36.36 | 63.64 | |
| | Col Pct | 7.14 | 1.20 | |
| 2 | Frequency | 7 | 22 | 29 |
| | Percent | 1.10 | 3.45 | 4.55 |
| | Row Pct | 24.14 | 75.86 | |
| | Col Pct | 12.50 | 3.78 | |
| 3 | Frequency | 12 | 220 | 232 |
| | Percent | 1.88 | 34.48 | 36.36 |
| | Row Pct | 5.17 | 94.83 | |
| | Col Pct | 21.43 | 37.80 | |
| 4 | Frequency | 20 | 229 | 249 |
| | Percent | 3.13 | 35.89 | 39.03 |
| | Row Pct | 8.03 | 91.97 | |
| | Col Pct | 35.71 | 39.35 | |
| 5 | Frequency | 13 | 104 | 117 |
| | Percent | 2.04 | 16.30 | 18.34 |
| | Row Pct | 11.11 | 88.89 | |
| | Col Pct | 23.21 | 17.87 | |
| Total | Frequency | 56 | 582 | 638 |
| | Percent | 8.78 | 91.22 | 100.00 |

As you can see, to tell SAS to create a two-way table of `ed_level` and `sex`, we merely use an asterisk (*) to join the two variables in the TABLES statement.

Launch and run the SAS program. Review the output to convince yourself that SAS created the requested two-way table. In general, the values of the variable appearing before the asterisk form the rows of the table, and the values of the variable appearing after the asterisk form the columns. In this case, since `ed_level` appears before the asterisk in the TABLES statement, its values form the rows of the table, and since `sex` appears after the asterisk, its values form the columns.
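Because the row and column roles follow the order of the variables, reversing that order swaps the layout. The following is a minimal sketch that assumes the `icdb` library has already been assigned, as in the program above. Its LIBNAME statement is not shown in this section, and the title is made up for illustration:

```
PROC FREQ data=icdb.back;
   title 'Crosstabulation of Sex and Education Level';
   /* sex now precedes the asterisk, so its values form the rows;  */
   /* ed_level follows it, so its values form the columns           */
   tables sex*ed_level;
RUN;
```

The cell frequencies and the overall cell percentages should be the same as before. The Row Pct and Col Pct values trade places, because the rows are now defined by `sex` rather than by `ed_level`.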

When SAS creates a two-way table, each cell contains four numbers by default: the cell frequency, and that frequency expressed as a percentage of the total frequency, of the row frequency, and of the column frequency. It is worth reviewing the numbers in a few cells to make sure you understand which is which. In the upper left-hand corner of the table, SAS always prints a key to the numbers appearing in each cell. Here, the key tells us the following about cell (`i`,`j`). The first number is the number of subjects with `ed_level` `i` and `sex` `j`. The second is the percentage of all subjects who have `ed_level` `i` and `sex` `j`. The third is the percentage of subjects who are `sex` `j` given that they are `ed_level` `i`. The fourth is the percentage of subjects who are `ed_level` `i` given that they are `sex` `j`. For example, for the cell in which `ed_level` = 4 and `sex` = 2, SAS tells us four things. First, 229 of the subjects in the data set are `ed_level` 4 and `sex` 2. Second, 35.89% of the subjects in the data set are `ed_level` 4 and `sex` 2. Third, 91.97% of the subjects who are `ed_level` 4 are `sex` 2. Fourth, 39.35% of the subjects who are `sex` 2 are `ed_level` 4.
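The three percentages in the `ed_level` = 4, `sex` = 2 cell can be reproduced by hand from the frequencies in the table. Each is the same cell count divided by a different denominator. Purely for illustration, write $n_{ij}$ for the count in cell (`i`,`j`), $n_{i+}$ for its row total, $n_{+j}$ for its column total, and $n$ for the grand total. This notation is introduced here and is not printed by SAS. Using the cell count 229, row total 249, column total 582, and grand total 638:

$$
\begin{aligned}
\text{Percent} &= 100 \times \frac{n_{ij}}{n} = 100 \times \frac{229}{638} \approx 35.89 \\
\text{Row Pct} &= 100 \times \frac{n_{ij}}{n_{i+}} = 100 \times \frac{229}{249} \approx 91.97 \\
\text{Col Pct} &= 100 \times \frac{n_{ij}}{n_{+j}} = 100 \times \frac{229}{582} \approx 39.35
\end{aligned}
$$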

**A little note about shortcuts.** If you have many different two-way tables to create, you can use a variety of shortcuts. For example, the TABLES statement:

`tables a*(b c);`

tells SAS to create a two-way table between variables `a` and `b` (a*b) and a two-way table between variables `a` and `c` (a*c). The TABLES statement:

`tables (a b)*(c d);`

tells SAS to create four two-way tables, namely: a*c, b*c, a*d, and b*d. The TABLES statement:

`tables (a b c)*d;`

tells SAS to create three two-way tables, namely: a*d, b*d, and c*d.
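To see a shortcut at work without the icdb data, here is a minimal sketch that uses a tiny, made-up data set. The data set `demo` and the values of its character variables `a`, `b`, `c`, and `d` are purely hypothetical. The two PROC FREQ steps request the same four tables, once with the grouping shortcut and once written out in full:

```
DATA demo;
   /* hypothetical values, just so PROC FREQ has something to count */
   input a $ b $ c $ d $;
   datalines;
x p m u
x q n v
y p n u
y q m v
;
RUN;

PROC FREQ data=demo;
   title 'Shortcut Form';
   tables (a b)*(c d);        /* requests a*c, b*c, a*d, and b*d */
RUN;

PROC FREQ data=demo;
   title 'Long Form';
   tables a*c b*c a*d b*d;    /* the same four tables, listed one by one */
RUN;
```

Comparing the two outputs should show the same four crosstabulations. The shortcut only saves typing.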

## Example 12.7

For a frequency analysis of more than two variables, we can use the FREQ procedure to create `n`-way crosstabulation tables. In that case, SAS creates a series of two-way tables, one for each level (or combination of levels) of the remaining variable(s). The following program creates a three-way table of `sex`, `job_chng`, and `ed_level`:

```
PROC FREQ data=icdb.back;
title '3-way Table of Sex, Job Change, and Ed. Level';
tables sex*job_chng*ed_level;   /* sex stratifies; job_chng forms the rows, ed_level the columns */
RUN;
```

As you can see, to tell SAS to create a three-way table of `sex`, `job_chng`, and `ed_level`, we use asterisks (*) to join the three variables in the TABLES statement. The order of the variables is important. In `n`-way tables, the last two variables in the TABLES statement become the rows and columns of the two-way tables, and the variables that precede them stratify the crosstabulation tables. So, in this case, we should expect SAS to create two two-way tables of `job_chng` and `ed_level`: one for when `sex` = 1 and one for when `sex` = 2.
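Since the stratifying variables are whichever ones come before the last two, reordering the same three variables changes which tables you get. The following is a minimal sketch, again assuming the `icdb.back` data set from this section; the title is made up. Placing `ed_level` first asks SAS for one two-way table of `sex` by `job_chng` for each value of `ed_level`. Given that `ed_level` takes the values 1 through 5 in these data, that means five tables:

```
PROC FREQ data=icdb.back;
   title '3-way Table Stratified by Ed. Level';
   /* ed_level stratifies; sex forms the rows and job_chng the columns */
   tables ed_level*sex*job_chng;
RUN;
```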

Launch and run the SAS program, and review the output to convince yourself that SAS created the three-way table as described. You should see that, indeed, SAS created one two-way table of `job_chng` and `ed_level` for when `sex` = 1:

##### 3-way Table of Sex, Job Change, and Ed. Level

##### The FREQ Procedure

##### Table 1 of job_chng by ed_level

##### Controlling for sex=1

| job_chng | Cell statistic | ed_level = 1 | ed_level = 2 | ed_level = 3 | ed_level = 4 | ed_level = 5 | Total |
|---|---|---|---|---|---|---|---|
| 0 | Frequency | 4 | 6 | 11 | 15 | 12 | 48 |
| | Percent | 7.84 | 11.76 | 21.57 | 29.41 | 23.53 | 94.12 |
| | Row Pct | 8.33 | 12.50 | 22.92 | 31.25 | 25.00 | |
| | Col Pct | 100.00 | 100.00 | 100.00 | 88.24 | 92.31 | |
| 1 | Frequency | 0 | 0 | 0 | 2 | 1 | 3 |
| | Percent | 0.00 | 0.00 | 0.00 | 3.92 | 1.96 | 5.88 |
| | Row Pct | 0.00 | 0.00 | 0.00 | 66.67 | 33.33 | |
| | Col Pct | 0.00 | 0.00 | 0.00 | 11.76 | 7.69 | |
| Total | Frequency | 4 | 6 | 11 | 17 | 13 | 51 |
| | Percent | 7.84 | 11.76 | 21.57 | 33.33 | 25.49 | 100.00 |

Frequency Missing = 107

and one two-way table of `job_chng` and `ed_level` for when `sex` = 2:

##### Table 2 of job_chng by ed_level

##### Controlling for sex=2

| job_chng | Cell statistic | ed_level = 1 | ed_level = 2 | ed_level = 3 | ed_level = 4 | ed_level = 5 | Total |
|---|---|---|---|---|---|---|---|
| 0 | Frequency | 4 | 14 | 148 | 163 | 74 | 403 |
| | Percent | 0.84 | 2.95 | 31.16 | 34.32 | 15.58 | 84.84 |
| | Row Pct | 0.99 | 3.47 | 36.72 | 40.45 | 18.36 | |
| | Col Pct | 80.00 | 82.35 | 85.06 | 83.16 | 89.16 | |
| 1 | Frequency | 1 | 3 | 26 | 33 | 9 | 72 |
| | Percent | 0.21 | 0.63 | 5.47 | 6.95 | 1.89 | 15.16 |
| | Row Pct | 1.39 | 4.17 | 36.11 | 45.83 | 12.50 | |
| | Col Pct | 20.00 | 17.65 | 14.94 | 16.84 | 10.84 | |
| Total | Frequency | 5 | 17 | 174 | 196 | 83 | 475 |
| | Percent | 1.05 | 3.58 | 36.63 | 41.26 | 17.47 | 100.00 |

Frequency Missing = 107
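Within each stratum, the percentages are computed from that stratum's own totals, not from the whole data set. For example, in the `sex` = 2 table, the cell with `job_chng` = 0 and `ed_level` = 3 contains 148 subjects. Its three percentages divide by the stratum total 475, the row total 403, and the column total 174:

$$
\begin{aligned}
\text{Percent} &= 100 \times \frac{148}{475} \approx 31.16 \\
\text{Row Pct} &= 100 \times \frac{148}{403} \approx 36.72 \\
\text{Col Pct} &= 100 \times \frac{148}{174} \approx 85.06
\end{aligned}
$$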

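It probably goes without saying that, in general, `n`-way tables can generate lots of output, since SAS prints a separate two-way table for every level (or combination of levels) of the stratifying variables.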
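When the volume of output becomes a problem, the TABLES statement accepts options after a slash that trim it. The following is a minimal sketch, assuming the same `icdb.back` data set; the titles are made up. The first step uses the NOROW, NOCOL, and NOPERCENT options to suppress the row, column, and overall percentages, so each cell shows only its frequency. The second step uses the LIST option to print the combinations as a compact list instead of as a series of two-way tables:

```
PROC FREQ data=icdb.back;
   title '3-way Table, Frequencies Only';
   /* keep only the cell frequencies in each two-way table */
   tables sex*job_chng*ed_level / norow nocol nopercent;
RUN;

PROC FREQ data=icdb.back;
   title '3-way Table in List Format';
   /* one line per combination of sex, job_chng, and ed_level that occurs in the data */
   tables sex*job_chng*ed_level / list;
RUN;
```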