Just as the UNIVARIATE, MEANS and SUMMARY procedures can be used to create new SAS data sets containing summary statistics of numeric variables, the FREQ procedure can be used to create new SAS data sets containing summary statistics of categorical variables. This can be a very useful tool when manipulating and subsequently analyzing data.
Example 12.12 Section
The following FREQ procedure tells SAS to create an output data set that contains the counts and percentages for each combination of the variables sex and race:
PROC FREQ data=icdb.back;
tables sex*race/out=sexfreq noprint;
RUN;
PROC PRINT;
title 'Output Dataset: SEXFREQ';
RUN;
Obs | sex | race | COUNT | PERCENT |
---|---|---|---|---|
1 | 1 | 2 | 3 | 0.4702 |
2 | 1 | 3 | 1 | 0.1567 |
3 | 1 | 4 | 51 | 7.9937 |
4 | 1 | 7 | 1 | 0.1567 |
5 | 2 | 1 | 2 | 0.3135 |
6 | 2 | 2 | 4 | 0.6270 |
7 | 2 | 3 | 28 | 4.3887 |
8 | 2 | 4 | 542 | 84.9530 |
9 | 2 | 5 | 3 | 0.4702 |
10 | 2 | 6 | 2 | 0.3135 |
11 | 2 | 8 | 1 | 0.1567 |
The NOPRINT tables option tells SAS to suppress printing the two-way crosstabulation of sex*race. Instead, the results are dumped to the temporary data set sexfreq that is specified in the OUT= tables option. The data set in the OUT= option becomes the current data set. Therefore, since the PRINT procedure does not have a DATA= option to tell SAS the name of the data set to print, SAS prints the current data set, namely sexfreq.
Launch and run the SAS program. Review the output to convince yourself that the sexfreq data set contains one record for each combination of sex and race that occurs in the data set. Note that the data set contains four variables, sex, race, COUNT (whose value is the number of subjects in the data set of that sex and race), and PERCENT (whose value is the percentage of subjects in the data set of that sex and race).
Example 12.13 Section
The SPARSE tables option tells SAS to print information about all possible combinations of levels of the variables in the table request, even when some combinations of levels do not occur in the data. This option affects printouts under the LIST option and output data sets only. The following SAS code illustrates the use of the SPARSE option in the creation of an output data set called sexfreq:
PROC FREQ data=icdb.back;
tables sex*race/out=sexfreq noprint sparse;
RUN;
PROC PRINT;
title 'Output Dataset: SEXFREQ with SPARSE option';
RUN;
Obs | sex | race | COUNT | PERCENT |
---|---|---|---|---|
1 | 1 | 1 | 0 | 0.0000 |
2 | 1 | 2 | 3 | 0.4702 |
3 | 1 | 3 | 1 | 0.1567 |
4 | 1 | 4 | 51 | 7.9937 |
5 | 1 | 5 | 0 | 0.0000 |
6 | 1 | 6 | 0 | 0.0000 |
7 | 1 | 7 | 1 | 0.1567 |
8 | 1 | 8 | 0 | 0.0000 |
9 | 2 | 1 | 2 | 0.3135 |
10 | 2 | 2 | 4 | 0.6270 |
11 | 2 | 3 | 28 | 4.3887 |
12 | 2 | 4 | 542 | 84.9530 |
13 | 2 | 5 | 3 | 0.4702 |
14 | 2 | 6 | 2 | 0.3135 |
15 | 2 | 7 | 0 | 0.0000 |
16 | 2 | 8 | 1 | 0.1567 |
The only difference between this code and the previous code is the inclusion of the SPARSE option. Launch and run the SAS program. Review the output to convince yourself that the sexfreq data set now contains one record for each possible combination of sex and race regardless of whether or not the combination occurs in the data set. That is, since there are 2 sexes and 8 races, the sexfreq data set contains 16 records — one for each of the combinations. For example, there is now a record in the sexfreq data set corresponding to sex = 1 and race = 1 even though there are no subjects in the data set of that sex and race combination (COUNT = 0).