10.5 - Using Group Variables

Thus far, all of the reports that we have created have been what are called list reports, in which a detail row is displayed for each observation appearing in the data set. Well, that's not quite true ... we did create one report in which a detail row was displayed only for those observations in which Type equaled Private or Resort. Still, though, none of those reports summarized the observations in any way. Now, suppose we are interested in creating a summary report, in which each row of the report is created by summarizing a group of observations in the data set. For example, we might be interested in creating the following report:

Some Pennsylvania Golf Courses

Type of Course    Total Par    Total Yardage
Private                 214           19,660
Public                   72            6,525
Resort                  144           14,141
SemiPri                 353           31,974

in which the total par and total yardage are displayed for each of the four types of golf courses.

To create such a summary report using the REPORT procedure, we need to define one or more group variables. In general, a group variable groups the detail rows in a report according to their formatted values. If a report contains one or more group variables, then SAS consolidates into one row all observations from the data set that share the same combination of values for all of the defined group variables.

Example 10.14

Among other things, the DEFINE statements in the following REPORT procedure define the Type variable as a group variable in order to create the summary report as illustrated above:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Type Par Yards;
     define Type / group 'Type of/Course' spacing = 6
                   width = 8 center;
     define Par / analysis 'Total/Par';
     define Yards / analysis format = comma6.0 'Total/Yardage'
                    width = 7 spacing = 4 center;
RUN;

Some Pennsylvania Golf Courses

Type of Course    Total Par    Total Yardage
Private                 214           19,660
Public                   72            6,525
Resort                  144           14,141
SemiPri                 353           31,974

Let's dissect this procedure. The COLUMN statement tells SAS that we want to display only three columns, namely Type, Par, and Yards, in that order. The first DEFINE statement tells SAS to use Type as a group variable, and specifies the column heading, spacing, width, and justification. The second DEFINE statement tells SAS to use Par as an analysis variable, and specifies the column heading. And, the third DEFINE statement tells SAS to use Yards as an analysis variable, and specifies the column format, heading, width, spacing, and justification.

Now, launch and run the SAS program, and review the output to convince yourself that SAS collapses the observations into one row per Type, and sums the Par and Yards variables as depicted in the summary report at the beginning of this section. You might want to recall that SAS sums the Par and Yards variables, rather than calculating their average, say, because SUM is the default statistic for analysis variables. In the next section, we'll learn how to change the default so that SAS calculates a different statistic, such as an average, instead.
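If you'd like an independent check on the totals, one option is to compute the same sums with the MEANS procedure. The following is only a sketch, and it assumes that the stat480 libref has already been assigned in your session, just as it must be for the REPORT procedure above. The maxdec = 0 option is merely a display choice made for this illustration:

```sas
/* Cross-check: total Par and Yards within each value of Type. */
/* Assumes the stat480 libref has already been assigned.       */
PROC MEANS data = stat480.penngolf sum maxdec = 0;
     class Type;        /* one output row per value of Type     */
     var Par Yards;     /* variables to be summed within Type   */
RUN;
```

The sums reported for each Type should agree with the Total Par and Total Yardage columns of the summary report.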

One more thing about this example. It is not necessary, of course, to define Par and Yards as analysis variables here, because SAS uses numeric variables as analysis variables by default. If you're not convinced, delete the ANALYSIS keyword from the DEFINE statements for the Par and Yards variables, and then re-run the SAS program to verify that you still get the same report.
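For reference, here is a sketch of what the program looks like once the usage keyword has been removed from the Par and Yards DEFINE statements. Everything else is unchanged from Example 10.14:

```sas
PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Type Par Yards;
     define Type / group 'Type of/Course' spacing = 6
                   width = 8 center;
     /* No usage keyword: numeric variables default to ANALYSIS, */
     /* and the default statistic for analysis variables is SUM. */
     define Par / 'Total/Par';
     define Yards / format = comma6.0 'Total/Yardage'
                    width = 7 spacing = 4 center;
RUN;
```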

And one final comment about summary reports in general. All of the variables in a summary report must be defined as group, analysis, across, or computed variables. This is because the REPORT procedure must be able to summarize every variable in the report in order to collapse observations into a single row. If the REPORT procedure can't create groups, it displays group variables as order variables. We'll make sure that the homework for this lesson makes this closing comment make sense.
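To see this behavior for yourself, you might try a variation like the one below. It is only an illustrative sketch, not part of the original example: the title and column headings are made up for the illustration, and Par is deliberately defined as a display variable so that the REPORT procedure has no way to summarize it:

```sas
PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Group Variable Treated as an Order Variable';
     column Type Par Yards;
     define Type / group 'Type of/Course';
     /* DISPLAY variables cannot be summarized, so the rows */
     /* cannot be collapsed into one row per Type.          */
     define Par / display 'Par';
     define Yards / analysis format = comma6.0 'Yardage';
RUN;
```

You should expect one detail row per observation, with the rows ordered by Type, rather than four summary rows. It is also worth checking the log, where SAS typically writes a note explaining that groups were not created.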