12.1 - A Basic One-Way Table
By default, the FREQ procedure creates a one-way table that contains the frequency, percent, cumulative frequency, and cumulative percent of every value of every variable in the input data set. The word "every" is emphasized with good reason: the FREQ procedure doesn't care whether a variable is a character variable or a numeric variable. And, if a variable is numeric, the FREQ procedure doesn't care whether it is a discrete numeric variable with just a few possible outcomes (number of siblings, say) or a continuous numeric variable with an infinite number of possible outcomes (weight, say). That means that if you rely on the default version of the FREQ procedure, you can end up with a great deal of output. That's why we'll skip the default version and jump right to the more practical version, in which you restrict the number of tables SAS creates by using a TABLES statement.
The FREQ procedure takes the following generic form:
PROC FREQ options;
tables ... /options;
RUN;
The TABLES statement tells SAS the specific frequency table(s) that you want to create. If you don't include a TABLES statement, then SAS creates a one-way frequency table for every variable in your input data set.
As you can see, there are two types of options, namely procedure options and table options. Procedure options, such as the typical "DATA=" option, must appear in the PROC FREQ statement itself, after the keywords PROC FREQ. Table options must be specified after a forward slash (/) in the TABLES statement. In either case, you can specify as many options as you would like.
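To make the distinction concrete, here is a minimal sketch that uses one procedure option and one table option together. It assumes the SASHELP.CLASS sample data set that ships with SAS is available in your installation (it is not part of the ICDB data). ORDER=FREQ is a procedure option that lists values in descending order of frequency, and NOCUM is a table option that is discussed in Example 12.2.

```sas
/* Sketch only: SASHELP.CLASS is an assumed stand-in data set */
PROC FREQ data=sashelp.class order=freq;   /* procedure options go here */
   tables sex age / nocum;                 /* table options follow the slash */
RUN;
```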
Throughout this lesson, we'll use the ICDB background data set to illustrate the FREQ procedure. Right-click the link to save the data set to a convenient location on your computer.
Example 12.1
The following FREQ procedure illustrates the simplest practical example, namely a one-way frequency table of the variable sex, with no bells or whistles added:
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
LIBNAME icdb 'C:\simon\icdb\data';
PROC FREQ data=icdb.back;
title 'Frequency Count of SEX';
tables sex;
RUN;
Launch the SAS program and edit the LIBNAME statement so that it reflects the location in which you saved the background data set. Then, run the program and review the output. You should see something along the lines of this basic one-way frequency table, in which, as promised, SAS reports the frequency, percent, cumulative frequency, and cumulative percent of each value of the sex variable:
sex | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 56 | 8.78 | 56 | 8.78 |
2 | 582 | 91.22 | 638 | 100.00 |
This output tells us, for example, that 56 or 8.78% of the subjects in the ICDB Study are male (coded as sex = 1).
Example 12.2
Again, by default, SAS outputs frequency counts, percents, cumulative frequencies, and cumulative percents. The NOCUM table option suppresses the printing of the cumulative frequencies and cumulative percentages for one-way frequency tables. The following SAS code illustrates the NOCUM table option:
PROC FREQ data=icdb.back;
title 'Frequency Count of SEX: No Cumulative Stats';
tables sex/nocum;
RUN;
sex | Frequency | Percent |
---|---|---|
1 | 56 | 8.78 |
2 | 582 | 91.22 |
Launch and run the SAS program. Review the output to convince yourself that indeed the cumulative frequencies and cumulative percentages are not printed in the table. The table contains only the number and percentage of each sex.
In any FREQ procedure, you can specify many variables in a TABLES statement. If the list is long, you may be able to use a shortcut to specify the list of variables. If you specify a TABLES statement using a numbered range of variables, such as:
tables var1-var4;
then SAS will create a one-way frequency table for each of the four variables named var1, var2, var3, and var4. If instead in your TABLES statement, you specify a range of variables by their position in the data set, such as:
tables sex--race;
then SAS will create a one-way frequency table for every variable that appears between the sex and race variables in the data set, namely in the case of the background data set, sex, state, country, and race. Recall that if you're not sure of the position of the variables in your data set, you can use the VARNUM option of the CONTENTS procedure to determine the position of the variables in a data set. (Incidentally, that is not a typo in the second TABLES statement ... it takes two dashes to specify a range of variables by their position in the data set.)
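The two kinds of variable lists can be tried out on a small data set. The sketch below is only an illustration: the data set work.demo and its variables (id, var1 through var4, grp) are hypothetical and are not part of the ICDB data.

```sas
/* Hypothetical toy data set; all names here are illustrative only */
DATA work.demo;
   input id var1 var2 var3 var4 grp $;
   datalines;
1 0 1 1 0 a
2 1 1 0 0 b
3 0 0 1 1 a
;
RUN;

/* List the variables by position to see what a -- range will cover */
PROC CONTENTS data=work.demo varnum;
RUN;

PROC FREQ data=work.demo;
   tables var1-var4;    /* numbered range: var1, var2, var3, var4 */
   tables var3--grp;    /* positional range: var3, var4, grp */
RUN;
```

Because the variables were created in the order id, var1, var2, var3, var4, grp, the positional range var3--grp picks up var3, var4, and grp, even though grp is a character variable.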
Rather than specifying many variables in a TABLES statement, you can specify many TABLES statements in a FREQ procedure. However you tell SAS to make multiple tables, you can use the PAGE option to tell SAS to print only one table per page. Otherwise, the FREQ procedure prints multiple tables per page as space permits.
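One practical reason to use separate TABLES statements is that table options apply only to the TABLES statement in which they appear. Here is a minimal sketch, assuming the SASHELP.CLASS sample data set that ships with SAS is available (it is a stand-in, not the ICDB data):

```sas
/* Sketch only: SASHELP.CLASS is an assumed stand-in data set */
PROC FREQ data=sashelp.class page;   /* PAGE: one table per page */
   tables sex;                       /* full default statistics */
   tables age / nocum;               /* cumulative columns suppressed here only */
RUN;
```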
Example 12.3
The following SAS program illustrates the creation of two one-way frequency tables in conjunction with the PAGE option:
PROC FREQ data=icdb.back page;
title 'Frequency Count of SEX and RACE';
tables sex race;
RUN;
Launch and run the SAS program. Review the output to convince yourself that indeed SAS creates two one-way frequency tables — one for the categorical variable sex and the other for the categorical variable race. Because the PAGE option was invoked, each table should be printed on a separate page. The first page should contain the frequency table for the sex variable:
sex | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 56 | 8.78 | 56 | 8.78 |
2 | 582 | 91.22 | 638 | 100.00 |
and the second page should contain the frequency table for the race variable:
race | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 2 | 0.31 | 2 | 0.31 |
2 | 7 | 1.10 | 9 | 1.41 |
3 | 29 | 4.55 | 38 | 5.96 |
4 | 593 | 92.95 | 631 | 98.90 |
5 | 3 | 0.47 | 634 | 99.37 |
6 | 2 | 0.31 | 636 | 99.69 |
7 | 1 | 0.16 | 637 | 99.84 |
8 | 1 | 0.16 | 638 | 100.00 |
Incidentally, you might also want to notice that, not surprisingly, the order in which the variables appear in the TABLES statement determines the order in which they appear in the output.
Example 12.4
As is the case for many SAS procedures, you can use a BY statement to tell SAS to perform an operation for each level of a BY group. The following program tells SAS to create a one-way frequency table for the variable ed_level for each level of the variable sex:
PROC SORT data=icdb.back out=s_back;
by sex;
RUN;
PROC FREQ data=s_back;
title 'Frequency Count of Education Level within Each Level of Sex';
tables ed_level;
by sex;
RUN;
As is always the case, the SORT procedure merely prepares the background data set for BY-group processing. The SORT procedure tells SAS to sort the icdb.back data set by sex, and to store the results in a new data set called s_back. Then, as you can see, the FREQ procedure is invoked with a BY statement ("by sex") in addition to the TABLES statement ("tables ed_level"). Launch and run the SAS program. Review the output to convince yourself that SAS creates two one-way frequency tables of education level (ed_level), one for males (sex = 1):
ed_level | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 4 | 7.14 | 4 | 7.14 |
2 | 7 | 12.50 | 11 | 19.64 |
3 | 12 | 21.43 | 23 | 41.07 |
4 | 20 | 35.71 | 43 | 76.79 |
5 | 13 | 23.21 | 56 | 100.00 |
and one for females (sex = 2):
ed_level | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
1 | 7 | 1.20 | 7 | 1.20 |
2 | 22 | 3.78 | 29 | 4.98 |
3 | 220 | 37.80 | 249 | 42.78 |
4 | 229 | 39.35 | 478 | 82.13 |
5 | 104 | 17.87 | 582 | 100.00 |
So far, in each of the examples we have looked at, no missing values exist. When they do exist, SAS by default excludes them from your requested frequency tables and instead prints a "Frequency Missing" count below each table. You can opt to use the MISSING table option, which tells SAS to treat missing values as non-missing values and therefore to include them in the calculation of percentages and other statistics. Or you can opt to use the MISSPRINT table option, which tells SAS to display the missing values in the frequency table but not to include them in the calculation of the statistics.
Example 12.5
The following SAS program illustrates the MISSING and MISSPRINT options on the variable state in the background data set:
PROC FREQ data=icdb.back;
title 'One-way Table of State: with MISSING Option';
tables state/missing;
RUN;
PROC FREQ data=icdb.back;
title 'One-way Table of State: with MISSPRINT Option';
tables state/missprint;
RUN;
Launch and run the SAS program, and review the resulting output. The first few rows of output from the FREQ procedure with the MISSING tables option should look something like this:
state | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
. | 42 | 6.58 | 42 | 6.58 |
1 | 4 | 0.63 | 46 | 7.21 |
3 | 1 | 0.16 | 47 | 7.37 |
4 | 5 | 0.78 | 52 | 8.15 |
As you can see, the first row tells us that 42 subjects did not report the state in which they live. Because the MISSING option was used, SAS also tells us that 42 subjects comprise 6.58% of the subjects in the data set. SAS also includes the 42 subjects in the calculation of the cumulative percentage.
On the other hand, the first few rows of output from the FREQ procedure with the MISSPRINT tables option should look something like this:
state | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
. | 42 | . | . | . |
1 | 4 | 0.67 | 4 | 0.67 |
3 | 1 | 0.17 | 5 | 0.84 |
4 | 5 | 0.84 | 10 | 1.68 |
As you can see, the first row again tells us that 42 subjects did not report the state in which they live. In this case, however, because the MISSPRINT option was specified, SAS stops there. That is, SAS does not include the subjects in any of its calculations of the percent, cumulative frequency, or cumulative percent.
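The difference between the two sets of percentages comes down to the denominator. As a rough check, the following DATA step sketch recomputes the percent for state = 1 both ways, using the counts shown in the tables above (638 subjects in all, 42 with a missing state, 4 with state = 1). The variable names are made up for this illustration.

```sas
/* Sketch: recompute the percent for state = 1 under each option */
DATA _null_;
   n_total   = 638;   /* all subjects in the background data set */
   n_missing = 42;    /* subjects with a missing state */
   n_state1  = 4;     /* subjects with state = 1 */

   /* MISSING: missing values count toward the denominator */
   pct_missing = 100 * n_state1 / n_total;

   /* MISSPRINT: missing values are shown but left out of the denominator */
   pct_missprint = 100 * n_state1 / (n_total - n_missing);

   put pct_missing= 6.2 pct_missprint= 6.2;
RUN;
```

The two values written to the log should agree with the 0.63 and 0.67 reported for state = 1 in the two tables.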