Lesson 10: The REPORT Procedure

Lesson 10: The REPORT Procedure

Overview

A few lessons back, we learned how to use the PRINT procedure to create basic list reports of SAS data sets. In this lesson, we'll learn how to use the REPORT procedure to create not only basic list reports, but also customized list reports, as well as a broad variety of summary reports. The customization that we will investigate includes assigning formats to some of the columns in the report, defining column headings, specifying the order of the contents of our reports, as well as specifying the justification, width, and spacing of the columns. We'll also learn how to group the character variables in our reports, so that we can create summary reports that contain statistical summaries of the columns, such as subtotals, grand totals, and averages.

You should know that the REPORT procedure is capable of creating highly customized reports. We'll necessarily focus only on the most common statements and the most common enhancements. For more extensive information about the REPORT procedure features, you might want to try Carpenter's Complete Guide to the SAS REPORT Procedure.

Example 10.1

Throughout this lesson, we'll work with a permanent SAS data set called stat480.penngolf that contains information on some Pennsylvania golf courses. The following SAS program merely tells SAS to print the stat480.penngolf data set:

OPTIONS PS = 58 LS = 80 NODATE NONUMBER;

LIBNAME stat480 'C:\simon\stat480wc\fa08\10report\sasndata\';

PROC PRINT data = stat480.penngolf;
   title 'Some Pennsylvania Golf Courses';
RUN;
Some Pennsylvania Golf Courses
Obs ID Name Architect Year Type Par Yards Slope USGA
1 101 Toftrees Ed Ault 1968 Resort 72 7018 134 74.3
2 102 Penn State Blue William Park, Jr. 1921 Public 72 6525 128 72.0
3 103 Centre Hills Alex Findlay 1921 Private 71 6392 128 71.2
4 104 Lewistown CC   . Private 72 6779 125 72.3
5 105 State College Elks Lowell Erdman 1973 SemiPri 71 6369 123 70.9
6 106 Park Hills CC James Harrison 1966 SemiPri 70 6004 126 69.3
7 107 Sinking Valley CC Ad Ault 1967 SemiPri 72 6755 132 73.4
8 108 Williamsport CC A.W Tillinghast 1909 Private 71 6489 131 71.9
9 109 Standing Stone CC Geoffrey Cornish 1973 SemiPri 70 6593 120 71.4
10 110 Bucknell GC   1960 SemiPri 70 6253 132 70.0
11 111 Mount Airy Lodge Hal Purdy 1972 Resort 72 7123 140 74.3

First, download the stat480.penngolf data set by clicking on the link. After you've saved the data set in a convenient location on your computer, launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the stat480.penngolf data set. Finally, run the SAS program, so that you have access to the data set throughout the lesson. Then, review the output from the PRINT procedure so that you can familiarize yourself with the data set.

Objectives

Upon completion of this lesson, you should be able to:

Upon completing this lesson, you should be able to do the following:

  • produce basic list and summary reports using the REPORT procedure
  • use DEFINE statements to alter the purpose of a variable in reports produced by the REPORT procedure
  • change the format, width and spacing of columns in reports produced by the REPORT procedure
  • specify the justification of the columns in reports produced by the REPORT procedure
  • change the column headers of variables appearing in a DEFINE statement
  • distinguish between the various usage options (display, order, group, across, analysis, and computed) of variables appearing in a DEFINE statement
  • create summary reports using the REPORT procedure
  • request that various statistics be included in your reports produced by the REPORT procedure

10.1 - Basic List Reports

10.1 - Basic List Reports

In this section, we'll learn how to create three different basic list reports. In the first example, we'll create a basic list report of an entire SAS data set. In the second example, we'll create a basic list report of part of a SAS data set by selecting just a subset of the variables contained in the data set. In the last example, we'll create a basic list report of part of a SAS data set by selecting just those observations in the data set that meet a certain condition.

Example 10.2

The following SAS program creates a basic list report of the entire permanent SAS data set called stat480.penngolf:

LIBNAME stat480 'C:\simon\stat480wc\fa08\10report\sasndata\';

PROC REPORT data = stat480.penngolf NOWINDOWS;
   title 'Some Pennsylvania Golf Courses';
RUN;
Some Pennsylvania Golf Courses
ID Name Architect Year Type Par Yards Slope USGA
101 Toftrees Ed Ault 1968 Resort 72 7018 134 74.3
102 Penn State Blue William Park, Jr. 1921 Public 72 6525 128 72.0
103 Centre Hills Alex Findlay 1921 Private 71 6392 128 71.2
104 Lewistown CC   . Private 72 6779 125 72.3
105 State College Elks Lowell Erdman 1973 SemiPri 71 6369 123 70.9
106 Park Hills CC James Harrison 1966 SemiPri 70 6004 126 69.3
107 Sinking Valley CC Ad Ault 1967 SemiPri 72 6755 132 73.4
108 Williamsport CC A.W Tillinghast 1909 Private 71 6489 131 71.9
109 Standing Stone CC Geoffrey Cornish 1973 SemiPri 70 6593 120 71.4
110 Bucknell GC   1960 SemiPri 70 6253 132 70.0
111 Mount Airy Lodge Hal Purdy 1972 Resort 72 7123 140 74.3

The REPORT procedure contained in this code represents the most basic form of the procedure, in which the data set is specified and the NOWINDOWS option is used. Of course, you have to tell SAS the data set whose contents you want displayed. The NOWINDOWS option — also known as the NOWD option — tells SAS to display the report in the output window. If you don't specify the NOWINDOWS option, then SAS uses the WINDOWS option by default, which tells SAS to display the report in a special REPORT window (note, this has changed in the latest version of SAS and NOWINDOWS is now the default).

If you've already followed the directions in Example 10.1, and have thus run the program there, then you can simply launch this program, delete the LIBNAME statement, run the SAS program, and review the output from the REPORT procedure.

If you haven't already followed the directions in Example 10.1, you'll need to download the stat480.penngolf data set by clicking on the link. After you've saved the data set in a convenient location on your computer, launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the stat480.penngolf data set. Then, run the SAS program, so that you have access to the data set throughout the lesson. Then, review the output from the REPORT procedure.

Upon reviewing the output, you should notice that all eleven of the observations and all nine of the variables contained in the stat480.penngolf data set are printed. As, you can see, by default, the variables appear in the order in which they occur in the data set. Oh, you might also note that the report continues across two pages, because SAS can't fit it in one page as the PRINT procedure can. This has to do with the default width SAS allocates to each column. This is a column attribute that we'll learn how to modify in the next section.

Example 10.3

Rather than telling SAS to display the entire stat480.penngolf data set, the following SAS program uses the COLUMN statement to tell SAS to display just five of the columns — Name, Year, Type, Par and Yards — in the order specified:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
RUN;
Some Pennsylvania Golf Courses
Name Year Type Par Yards
Toftrees 1968 Resort 72 7018
Penn State Blue 1921 Public 72 6525
Centre Hills 1921 Private 71 6392
Lewistown CC . Private 72 6779
State College Elks 1973 SemiPri 71 6369
Park Hills CC 1966 SemiPri 70 6004
Sinking Valley CC 1967 SemiPri 72 6755
Williamsport CC 1909 Private 71 6489
Standing Stone CC 1973 SemiPri 70 6593
Bucknell GC 1960 SemiPri 70 6253
Mount Airy Lodge 1972 Resort 72 7123

Launch and run the SAS program, and review the output to convince yourself that just the requested subset of the data set has been displayed in the report. Oh, you might also want to take note of the effect of the HEADLINE option. As you can see, the HEADLINE option tells SAS to underline all of the column headings, as well as the spaces between the columns. It's a nice way to "prettify" your output!

Example 10.4

Again, rather than telling SAS to display the entire stat480.penngolf data set, the following SAS program uses:

  • the COLUMN statement to tell SAS to display just five of the columns — Name, Year, Type, Par and Yards — in the order specified, and
  • the WHERE statement to tell SAS to display only those golf courses whose Type equals Private or Resort.

Here's the program:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADSKIP;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 where Type in ('Private', 'Resort');
RUN;
Some Pennsylvania Golf Courses
Name Year Type Par Yards
Toftrees 1968 Resort 72 7018
Centre Hills 1921 Private 71 6392
Lewistown CC . Private 72 6779
Williamsport CC 1909 Private 71 6489
Mount Airy Lodge 1972 Resort 72 7123

Launch and run the SAS program, and review the output to convince yourself that just the requested subset of the data set has been displayed in the report. This time, you might want to take note of the effect of the HEADSKIP option. As you can see, the HEADSKIP option tells SAS to write a blank line beneath all of the column headings, as well as the spaces between the columns. If you use the HEADSKIP option in conjunction with the HEADLINE option, SAS writes a blank line beneath the underline. You might want to add the HEADLINE option to this code and re-run it just so you can see the effect for yourself.


10.2 - Column Attributes

10.2 - Column Attributes

In the output that you've seen so far, you might have noticed that the REPORT procedure displays:

  • each data value the way that it is stored in the data set,
  • variable names as column headings in the report,
  • a default width for the report columns,
  • left-justified character values,
  • right-justified numeric values, and
  • observations in the order in which they are stored in the data set.

In this section, we'll learn how to use the DEFINE statement to tell SAS to display a variable using a certain format, specify the width of the columns, and set the number of blank spaces that should appear to the left of the columns in a report. To tell SAS to display a variable var1 using the comma7.2 format, say, we must use the FORMAT= attribute of the DEFINE statement as follows:

DEFINE var1 / FORMAT = comma7.2;

The format specified can either be a SAS format or a user-defined format. To tell SAS to set the column width for a variable var2 at 6 spaces, say, we must use the WIDTH= attribute of the DEFINE statement as follows:

DEFINE var2 / WIDTH = 6;

The default column width is set to be just large enough to handle the specified format. To tell SAS to leave 4 blank characters between the column containing var3 and the column immediately to its left, we must use the SPACING= attribute of the DEFINE statement as follows:

DEFINE var3 / SPACING = 4;

By default, SAS leaves 2 blank characters to the left of each column. Let's take a look at some examples!

Example 10.5

The following SAS program uses the FORMAT= attribute to tell SAS to display, when creating a report using the stat480.penngolf data set, the Yards variable using the SAS format comma5.0:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0;
RUN;
Some Pennsylvania Golf Courses
Name Year Type Par Yards
Toftrees 1968 Resort 72 7,018
Penn State Blue 1921 Public 72 6,525
Centre Hills 1921 Private 71 6,392
Lewistown CC . Private 72 6,779
State College Elks 1973 SemiPri 71 6,369
Park Hills CC 1966 SemiPri 70 6,004
Sinking Valley CC 1967 SemiPri 72 6,755
Williamsport CC 1909 Private 71 6,489
Standing Stone CC 1973 SemiPri 70 6,593
Bucknell GC 1960 SemiPri 70 6,253
Mount Airy Lodge 1972 Resort 72 7,123

Launch and run the SAS program, and review the output to convince yourself that SAS does indeed display the Yards variable using the comma5.0 format. Incidentally, if you do not specify a format for a variable within a REPORT procedure, SAS displays the variable using the format that is stored in the data set. If no format is stored in the data set, then SAS uses the default format for that variable type.

Oh, one more thing. You can, by the way, also use FORMAT statements within a REPORT procedure to specify a variable's format. As we'll soon see, the DEFINE statement allows you, however, to specify more than one column attribute at a time. You can also use the DEFINE statement's FORMAT= attribute to specify the format of report columns that are not variables actually contained in your data set (such as the statistics or computed variables that we'll investigate later in this lesson).

Example 10.6

If a variable in the input data set doesn't have a format associated with it, then the default column width in a report is set at the variable's length for character variables and 9 for numeric variables. The following SAS program illustrates what can go wrong with the reports you generate when you allow SAS to use these defaults:

DATA penngolf;
    set stat480.penngolf;
	length CourseType $ 8;
	CourseType =  Type;
	drop Type;
	format Slope 3.;
RUN;

PROC CONTENTS data = penngolf;
RUN;

PROC REPORT data = penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year CourseType Slope Par Yards;
	 define Yards / format = comma5.0;
RUN;

The DATA step creates a temporary SAS data set called penngolf using the permanent stat480.penngolf data set. The LENGTH statement tells SAS to create a new character variable called CourseType that is 8 characters long. An assignment statement is then used to assign the values of the Type variable to the new CourseType variable. (Seems like a silly DATA step so far, eh? You'll see why we're doing this in a minute.) The DROP statement tells SAS to then drop the Type variable from the data set as we no longer need it. The FORMAT statement assigns the numeric 3. format to the Slope variable.

Now, if you launch and run the SAS program, you can see first that the output from the CONTENTS procedure:

10-6 SAS output 1

confirms that the length of the character variable CourseType is 8, the format of the numeric variable Slope is 3., and the remaining five numeric variables are not associated with a format. Here's what the report that the REPORT procedure generates looks like:

10-6 output 2

Ooopppps!! We've got a little problem with the CourseType and Slope column headings. What went wrong here? To answer that question, we have to review how SAS sets the default column widths. Slope is a numeric variable with a numeric format of 3. By default, SAS sets the column width of a numeric variable to be just large enough to handle the specified format. Thus, SAS sets the column width for the Slope variable to just 3 spaces wide, which is clearly not enough space for the column heading. Now, CourseType is a character variable with a length of 8 characters. By default, SAS sets the column width of a character variable to be the length of the character variable. Thus, SAS sets the column width for the CourseType variable to just 8 spaces wide, which is again not enough space for the column heading. (Now, you can see why I wanted to illustrate this example with a long variable name like CourseType rather than the shorter name Type.) Fortunately, we can solve our problem by using the DEFINE statement's WIDTH= attribute.

Example 10.7

The following SAS program modifies the REPORT procedure of the previous example, so that the width of the CourseType and Slope columns are set to 10 and 5, respectively:

PROC REPORT data = penngolf NOWINDOWS HEADLINE;
	title 'Some Pennsylvania Golf Courses';
	column Name Year CourseType Slope Par Yards;
	define Yards / format = comma5.0;
	define CourseType / width = 10;
	define Slope / width = 5;
RUN;

10-7 output

Launch and run the SAS program, and review the output to convince yourself that the widths of the CourseType and Slope columns are now set to be large enough to accommodate the column headings. Incidentally, the WIDTH= attribute can handle any value from 1 to the value of the LINESIZE= system option.

If you look at the output from this example, you might notice that some of the columns are scrunched together more than others. For example, the CourseType column is rather close to the Year column to its left, and the Yards columns is rather close to the Par column to its left. We can use the DEFINE attribute's SPACING= attribute to change how much white space sits before each column.

Example 10.8

The following SAS program uses the DEFINE statement's SPACING= attribute to tell SAS to place 5 blank spaces before the Yards column and 6 blank spaces before the CourseType column:

PROC REPORT data = penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year CourseType Slope Par Yards;
	 define Yards / format = comma5.0 spacing = 5;
	 define CourseType / width = 10 spacing = 6;
	 define Slope / width = 5;
RUN;
10-8 output

You might first note that you can specify more than one attribute per DEFINE statement. Case in point, the DEFINE statement for Yards modifies the format and spacing for the Yards column, and the DEFINE statement for CourseType modifies the width and spacing for the CourseType column. Then, launch and run the SAS program, and review the output to convince yourself that the columns in the report are now more evenly distributed.


10.3 - Column Headings and Justification

10.3 - Column Headings and Justification

The DEFINE statement is not only useful for associating variables with formats, specifying column widths, and setting column spacings. The DEFINE statement can also be used to set the justification, and specify the headings, of your columns.

To define a column heading, you merely need to enclose the desired text of your heading in quotation marks in the DEFINE statement.

Example 10.9

The following program uses the DEFINE statement to set the headings for the Yards and Type variables, respectively, as Total Yardage and Type of Course:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0 'Total Yardage' 
                    width = 7 spacing = 4;
	 define Type / 'Type of Course' spacing = 6;
RUN;
Some Pennsylvania Golf Courses
Name Year Type of Course Par Total Yardage
Toftrees 1968 Resort 72 7,018
Penn State Blue 1921 Public 72 6,525
Centre Hills 1921 Private 71 6,392
Lewistown CC . Private 72 6,779
State College Elks 1973 SemiPri 71 6,369
Park Hills CC 1966 SemiPri 70 6,004
Sinking Valley CC 1967 SemiPri 72 6,755
Williamsport CC 1909 Private 71 6,489
Standing Stone CC 1973 SemiPri 70 6,593
Bucknell GC 1960 SemiPri 70 6,253
Mount Airy Lodge 1972 Resort 72 7,123

In many cases, you'll need to set the width of the columns so that they accommodate the headings, as is done here for the Yards variable. Launch and run the SAS program, and review the output to convince yourself that, in the resulting report, the headings of the Yards and Type variables are indeed as defined.

If you don't like the way SAS splits long headings across multiple rows, you can take control of it yourself by using a split character in the column label. When the REPORT procedure encounters the split character in a column heading, it breaks the heading and continues the heading on the next line. The split character itself does not appear in the heading. To use a split character, you can do either of the following:

  • Use the default forward slash (/) as the split character.
  • Define a split character by using the SPLIT= option in the PROC REPORT statement, and then use the split character in the column labels.

Let's take a look at another example.

Example 10.10

The following program uses the default forward slash as a split character for the defined column labels of the Yards and Type variables:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4;
	 define Type / 'Type of/Course' spacing = 6 width = 8;
RUN;
Some Pennsylvania Golf Courses
Name Year Type of Course Par Total Yardage
Toftrees 1968 Resort 72 7,018
Penn State Blue 1921 Public 72 6,525
Centre Hills 1921 Private 71 6,392
Lewistown CC . Private 72 6,779
State College Elks 1973 SemiPri 71 6,369
Park Hills CC 1966 SemiPri 70 6,004
Sinking Valley CC 1967 SemiPri 72 6,755
Williamsport CC 1909 Private 71 6,489
Standing Stone CC 1973 SemiPri 70 6,593
Bucknell GC 1960 SemiPri 70 6,253
Mount Airy Lodge 1972 Resort 72 7,123

You might want to note that we modified the width of the Type column so that it can now accommodate the length of the column label as it is now defined to be split. Launch and run the SAS program, and review the output to convince yourself that the column labels are split as requested. How does it look? Hmmm, perhaps it might look a bit better now if we centered the Yards and Type columns. Let's go ahead and do that.

Ooops, just one more thing before we do ... you might want to first try out the other method of using a split character. To change the split character to an asterisk (*), say, let's have you attach the SPLIT = option:

split = '*'

just after HEADLINE and before the semicolon in the PROC REPORT statement in Example 10.10. Then, modify the labels so that they use the asterisk:

'Total*Yardage'

and:

'Type of*Course'

And, re-run the SAS program, and review the output to convince yourself that the modified code worked.

Okay, now back to centering columns. Recall that, by default, the REPORT procedure left-justifies character variables and right-justifies numeric variables in listing output. For each variable that you define, you can instead specify the justification option CENTER, LEFT, or RIGHT in the DEFINE statement.

Example 10.11

The following program uses the DEFINE statement's CENTER option to center the Yards and Type columns in the requested report:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
	 define Type / 'Type of/Course' spacing = 6 
                    width = 8 center;
RUN;
Some Pennsylvania Golf Courses
Name Year Type of Course Par Total Yardage
Toftrees 1968 Resort 72 7,018
Penn State Blue 1921 Public 72 6,525
Centre Hills 1921 Private 71 6,392
Lewistown CC . Private 72 6,779
State College Elks 1973 SemiPri 71 6,369
Park Hills CC 1966 SemiPri 70 6,004
Sinking Valley CC 1967 SemiPri 72 6,755
Williamsport CC 1909 Private 71 6,489
Standing Stone CC 1973 SemiPri 70 6,593
Bucknell GC 1960 SemiPri 70 6,253
Mount Airy Lodge 1972 Resort 72 7,123

Launch and run the SAS program, and review the output to convince yourself that the CENTER option centers the formatted values within the specified column width, as well as the headings that appear over the values. That is in fact how each option works. The RIGHT option right-justifies the formatted values within the specified column width, as well as the headings that appear over the values, and the LEFT option left-justifies the formatted values within the specified column width, as well as the headings that appear over the values.


10.4 - Using Order Variables

10.4 - Using Order Variables

Thus far, we have used the DEFINE statement to change the appearance of our reports, doing such things as specifiying the format of a variable, changing the width of a column, specifying column labels, and changing the justification of a column. The truth is that the DEFINE statement is even more powerful when it is used to tell SAS how you want SAS to use each variable that appears in your report. There are six possible usages — DISPLAY, ANALYSIS, ORDER, GROUP, ACROSS, and COMPUTED — which we'll investigate in the course of the next five sections of this lesson. By default, the REPORT procedure uses:

  • character variables as display variables
  • numeric variables as analysis variables, which are used to calculate the SUM statistic

Because we haven't explicitly defined any variable usages, the reports we have created thus far have contained only display and analysis variables. For example, here's the most recent report we created:

Some Pennsylvania Golf Courses
Name Year Type of Course Par Total Yardage
Toftrees 1968 Resort 72 7,018
Penn State Blue 1921 Public 72 6,525
Centre Hills 1921 Private 71 6,392
Lewistown CC . Private 72 6,779
State College Elks 1973 SemiPri 71 6,369
Park Hills CC 1966 SemiPri 70 6,004
Sinking Valley CC 1967 SemiPri 72 6,755
Williamsport CC 1909 Private 71 6,489
Standing Stone CC 1973 SemiPri 70 6,593
Bucknell GC 1960 SemiPri 70 6,253
Mount Airy Lodge 1972 Resort 72 7,123

Because the second, fourth and fifth columns, Year, Par, and Yards, are numeric variables, SAS treats them as analysis variables by default. Analysis variables are used to calculate a statistic ... by default a SUM.

Now, because the first and third columns, Name and Type, are character variables, SAS treats them as display variables by default. What impact does that have on the appearance of the report? It means three things: 1) A report that contains one or more display variables necessarily contains a detail row for each observation in the data set, 2) Each detail row contains a value for each display variable, and 3) The order of the rows in the report matches the order of the observations in the data set.

We can alter the order in which the rows appear in our report by defining an order variable (or two or three ...). An order variable orders the detail the rows in a report according to the variable's formatted values. Let's take a look at an example.

Example 10.12

Among other things, the DEFINE statements in the following REPORT procedure defines the Yards and Par variables as being ORDER variables:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Par Name Year Type Yards;
	 define Yards / order format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
	 define Type / 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / order;
RUN;
Some Pennsylvania Golf Courses
Par Name Year Type Yards
70 Park Hills CC 1966 SemiPri 6,004
  Bucknell GC 1960 SemiPri 6,253
  Standing Stone CC 1973 SemiPri 6,593
71 State College Elks 1973 SemiPri 6,369
  Centre Hills 1921 Private 6,392
  Williamsport CC 1909 Private 6,489
72 Penn State Blue 1921 Public 6,525
  Sinking Valley CC 1967 SemiPri 6,755
  Lewistown CC . Private 6,779
  Toftrees 1968 Resort 7,018
  Mount Airy Lodge 1972 Resort 7,123

There are just two differences between this REPORT procedure and the one from the previous example:

  1. The word ORDER has been added to the DEFINE statements for the Yards and Par variables
  2. The order of the columns appearing in the COLUMN statement has been changed so that the Par variable is the first to appear in the report.

Launch and run the SAS program, and review the output. You should note that the report rows are now first in ascending order of the Par variable, and then within in each Par value the rows are in ascending order of the Yards variable. You should also note that SAS displays only the first occurrence of each value of an order variable in a set of rows that have the same value for all order variables. For example, the first three rows of output pertain to golf courses whose Par is 70, the next three rows of output pertain to golf courses whose Par is 71, and so on. One more thing ... we never told SAS to display the rows in ascending order. That's just what SAS does by default. We can change the order by using the DEFINE statement's DESCENDING option.

Example 10.13

The following program tells SAS, when creating the requested report, to display the rows in descending order of the Par variable and in ascending order of the Yards variable:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Par Name Year Type Yards;
	 define Yards / order format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
	 define Type / 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / order descending;
RUN;
Some Pennsylvania Golf Courses
Par Name Year Type Yards
72 Penn State Blue 1921 Public 6,525
  Sinking Valley CC 1967 SemiPri 6,755
  Lewistown CC . Private 6,779
  Toftrees 1968 Resort 7,018
  Mount Airy Lodge 1972 Resort 7,123
71 State College Elks 1973 SemiPri 6,369
  Centre Hills 1921 Private 6,392
  Williamsport CC 1909 Private 6,489
70 Park Hills CC 1966 SemiPri 6,004
  Bucknell GC 1960 SemiPri 6,253
  Standing Stone CC 1973 SemiPri 6,593

Launch and run the SAS program, and review the output to convince yourself that the rows are indeed in the order as described.


10.5 - Using Group Variables

10.5 - Using Group Variables

Thus far, all of the reports that we have created have been what are called list reports, in which detail rows are displayed for each observation appearing in the data set. Well, that's not quite true ... we did create one report in which a detail row was displayed only for those observations in which Type equaled Private or Resort. Still, though, no summarization of observations took place when creating the reports that we have created thus far. Now, suppose we are interested in creating a summary report, in which the rows that appear in the report are created by summarizing the observations in the data set. For example, we might be interested in creating the following report:

Some Pennsylvania Golf Courses
Type of Course Total Par Total Yardage
Private 214 19,660
Public 72 6525
Resort 144 14,141
SemiPri 353 31,974

in which the total par and total yardage are displayed for each of the four types of golf courses.

To create such a summary report using the REPORT procedure, we need to define one or more group variables. In general, a group variable groups the detail rows in a report according to their formatted values. If a report contains one or more group variables, then SAS consolidates into one row all observations from the data set that have a unique combination of values for all of the defined group variables.

Example 10.14

Among other things, the DEFINE statements in the following REPORT procedure defines the Type variable as a group variable in order to create the summary report as illustrated above:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Type Par Yards;
	 define Type / group 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / analysis 'Total/Par';
	 define Yards / analysis format = comma6.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
RUN;
Some Pennsylvania Golf Courses
Type of Course Total Par Total Yardage
Private 214 19,660
Public 72 6525
Resort 144 14,141
SemiPri 353 31,974

Let's dissect this procedure. The COLUMN statement tells SAS that we only want to display three columns, namely Type, Par, and Yards, in that order. The first DEFINE statement tells SAS to use Type as a group variable, as well as specifies the column heading, width, spacing and justification. The second DEFINE statement tells SAS to use Par as an analysis variable, as well as specifies the column heading. And, the third DEFINE statement tells SAS to use Yards as an analysis variable, as well as specifies the column format, heading, width, spacing and justification.

Now, launch and run the SAS program, and review the output to convince yourself that SAS collapses the observations, and sums the Par and Yards variables as depicted in the summary report in the beginning of this section. You might want to recall that SAS sums the Par and Yards variables, rather than calculates their average, say, because that's how SAS handles analysis variables by default. In the next section, we'll learn how to change the default so that SAS calculates an average instead, say.

One more thing about this example. It is not necessary, of course, to define Par and Yards as analysis variables here, as SAS would use them as analysis variables by default anyway, since they are both numeric variables. If you're not convinced, delete the analysis word in the DEFINE statements for the Par and Yards variables, and then re-run the SAS program to verify that you still get the same report.

And one final comment about summary reports in general. All of the variables in a summary report must be defined as group, analysis, across, or computed variables. This is because the REPORT procedure must be able to summarize all variables across an observation in order to collapse observations. If the REPORT procedure can't create groups, it displays group variables as order variables. We'll make sure that the homework for this lesson makes this closing comment make sense.


10.6 - Specifying Statistics

10.6 - Specifying Statistics

As mentioned earlier, the default statistic for analysis variables is SUM. There may be some instances, however, in which you want to display other statistics in your reports. To do so, you merely specify your desired statistic as an attribute in the DEFINE statement. Here's a list of the statistics you can request:

Statistic Description
CSS corrected sum of squares
USS uncorrected sum of squares
CV coefficient of variation
MAX maximum value
MEAN average value
MIN minimum value
N number of observations with nonmissing values
NMISS number of observations with missing values
RANGE range of values
STD standard deviation
STDERR standard error of the mean
SUM sum of the values
SUMWGT sum of the weight variable values
PCTN percentage of cell or row frequency to total frequency
PCTSUM percentage of cell or row sum to total sum
VAR variance of the values
T student's t-statistic for testing population mean is 0
PRT probability of a greater absolute value of student's t

Let's take a look at an example in which the mean statistic is requested.

Example 10.15

The following REPORT procedure creates a report in which the average par and average yardage is reported for each of the four types of golf courses:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Average Size of Some PA Golf Courses by Type';
     column Type Par Yards;
	 define Type / group 'Type of/Course' spacing = 6 
                    width = 8;
	 define Par /  mean format= 4.1 
                   'Average/Par' width = 7 center;
	 define Yards /  mean format = comma6.0 'Average/Yardage' 
                    width = 7 spacing = 4 center;
RUN;
 
Some Pennsylvania Golf Courses
Type of Course Average Par Average Yardage
Private 71.3 6,553
Public 72.0 6,525
Resort 72.0 7,071
SemiPri 70.6 6,395

The COLUMN statement tells SAS that we only want to display three columns, namely Type, Par, and Yards, in that order. The first DEFINE statement tells SAS to use Type as a group variable, as well as specifies the column heading, spacing, and width. The mean that is present in the second DEFINE statement tells SAS to calculate the average Par for each Type of golf course. The second DEFINE statement also tells SAS how to format the result, as well as how to label, justify and set the width of the Par column. The mean that is present in the third DEFINE statement tells SAS to calculate the average Yards for each Type of golf course. The third DEFINE statement also tells SAS how to format the result, as well as how to label, justify and set the width and spacing of the Yards column.

Now, launch and run the SAS program, and review the output to convince yourself that SAS collapses the observations, and in so doing, calculates the averages of the Par and Yards variables for each Type of golf course.

Example 10.16

The following example illustrates the type of one-line report you get when the columns of your report contain only (numeric) analysis variables:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Size of Some PA Golf Courses';
     column Par Yards;
	 define Par /  mean format= 4.1 
                   'Average/Par' width = 7 center;
	 define Yards / format = comma7.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
RUN;
Size of Some Pennsylvania Golf Courses
Average Par Total Yardage
71.2 72,300

First, note that the COLUMN statement contains just two variables, Par and Yards, and that both are numeric variables. The first DEFINE statement tells SAS to calculate the average Par, as well as how to format the result and label, justify and set the width of the column. The second DEFINE statement tells SAS to calculate the total yards, as well as how to format the result and label, justify and set the width and spacing of the column.

Now, launch and run the SAS program, and review the output to convince yourself that SAS collapses all of the observations, and in so doing, calculates the average Par and the total Yards of all of the golf courses. It is in this way that we end up with just a one-line report.


10.7 - Using Across Variables

10.7 - Using Across Variables

Thus far, we've investigated display, analysis, order, and group variables. You can also define variables as across variables, which are functionally similar to group variables. However, when you define a variable as being an across variable, SAS displays the groups that it creates horizontally rather than vertically.

Example 10.17

The following program is nearly identical to the program in Example 10.14 which concerned the illustration of the use of a group variable. The only difference between the two programs is that the group keyword in the DEFINE statement for the Type variable has been replaced here with the across keyword:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Type Par Yards;
	 define Type / across 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / analysis 'Total/Par';
	 define Yards / analysis format = comma6.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
RUN;
Some Pennsylvania Golf Courses
Private Public Resort SemiPri Total Par Total Yardage
3 1 2 5 783 72,300

You might want to recall that when the group keyword was used in the DEFINE statement for the Type variable, the resulting report looked like this:

Some Pennsylvania Golf Courses
Type of Course Total Par Total Yardage
Private 214 19,660
Public 72 6,525
Resort 144 14,141
SemiPri 353 31,974

Now, launch and run the SAS program, and review the output to see the effect of replacing the group keyword with the across keyword. You should first note that, as claimed, the values of the Type variable are displayed horizontally rather than vertically. Then, you should observe that for the across variable, the table cells contain a frequency count for each Type of golf course. That is, the report indicates that the data set contains three private courses, one public course, two resort courses, and five semi-private courses. For the two analysis variables, Par and Yards, SAS displays the sum of all of their values. Quite a different report, eh?


10.8 - Using Computed Variables

10.8 - Using Computed Variables

We have just one more type of variable usage, namely that of computed variables, to discuss. Computed variables are numeric or character variables that you define for the report. They are not in your input data set, nor does the REPORT procedure add the them to the input data set. That is, they exist only for the purpose of displaying them in your report.

Here's how we'll go about adding a computed variable to our reports:

  1. Specify the computed variable name in the COLUMN statement.
  2. Define the variable's usage as COMPUTED in the DEFINE statement.
  3. Compute the value of the variable in a compute block that is associated with the variable.

Let's take a look at an example!

Example 10.18

The following REPORT procedure uses the Slope and USGA scores to compute a Bogey rating for each golf course that appears in the stat480.penngolf data set:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Slope USGA Bogey;
	 define Bogey / computed 'Bogey/Rating' format = 7.3;
	 define USGA / format = 4.1 spacing = 5;
	 compute Bogey;
	   Bogey = 0.186*Slope.sum + USGA.sum;
	 endcomp;
RUN;
Some Pennsylvania Golf Courses
Name Slope USGA Bogey Rating
Toftrees 134 74.3 99.224
Penn State Blue 128 72.0 95.808
Centre Hills 128 71.2 95.008
Lewistown CC 125 72.3 95.550
State College Elks 123 70.9 93.778
Park Hills CC 126 69.3 92.736
Sinking Valley CC 132 73.4 97.952
Williamsport CC 131 71.9 96.266
Standing Stone CC 120 71.4 93.720
Bucknell GC 132 70.0 94.552
Mount Airy Lodge 140 74.3 100.340

Let's review the procedure. The COLUMN statement tells SAS that we'd like four columns to appear in our report, namely the currently existing Name, Slope, and USGA variables and the yet to be computed Bogey variable. The first DEFINE statement tells SAS that Bogey is a computed variable, and specifies how to format the result and label the column. The second DEFINE statement merely tells SAS how to format and space the USGA column. Then, comes the compute block.

As you can see, the compute block begins with a COMPUTE statement containing the name of the variable to be computed, namely Bogey. Then, we use a basic assignment statement to calculate the value of Bogey. Well, okay, it doesn't look quite the same as an assignment statement that you'd expect to see in a DATA statement. Here, we have to use a compound name that identifies both the original variable and the statistic that the REPORT procedure now calculates from it. In general, the compound name has the form variable-name.statistic, and hence why we refer to Slope.sum and USGA.sum. Finally, we close the compute block with an ENDCOMP statement.

Now, let's have you go ahead and launch and run the SAS program, and review the output so that you can convince yourself that Bogey was calculated as described.

Incidentally, the position of a computed variable in the COLUMN statement is critically important. The REPORT procedure assigns values to the columns in a row of a report from left to right. Therefore, you can't base the calculation of a computed variable that appears to its right in the report. To convince yourself of this, move the Bogey variable to the first position in the COLUMN statement, and re-run the SAS program. Doesn't work now, does it?


10.9 - Summary

10.9 - Summary

In this lesson, we've learned how to use the REPORT procedure to create detailed list reports, as well as a variety of summary reports.

The homework for this lesson will give you more practice with the REPORT procedure so that you become even more familiar with how it works and can use it in your own SAS programming.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility