Lesson 10: The REPORT Procedure

Lesson 10: The REPORT Procedure

Overview

A few lessons back, we learned how to use the PRINT procedure to create basic list reports of SAS data sets. In this lesson, we'll learn how to use the REPORT procedure to create not only basic list reports but also customized list reports, as well as a broad variety of summary reports. The customization that we will investigate includes assigning formats to some of the columns in the report, defining column headings, specifying the order of the contents of our reports, as well as specifying the justification, width, and spacing of the columns. We'll also learn how to group the character variables in our reports, so that we can create summary reports that contain statistical summaries of the columns, such as subtotals, grand totals, and averages.

You should know that the REPORT procedure is capable of creating highly customized reports. We'll necessarily focus only on the most common statements and the most common enhancements. For more extensive information about the REPORT procedure features, you might want to try Carpenter's Complete Guide to the SAS REPORT Procedure.

Example 10.1

Throughout this lesson, we'll work with a permanent SAS data set called stat480.penngolf that contains information on some Pennsylvania golf courses. The following SAS program merely tells SAS to print the stat480.penngolf data set:

OPTIONS PS = 58 LS = 80 NODATE NONUMBER;
            LIBNAME stat480 'C:\Yourdrivename\stat480wc\sasndata\';
            PROC PRINT data = stat480.penngolf;
               title 'Some Pennsylvania Golf Courses';
            RUN;

Some Pennsylvania Golf Courses

Obs

ID

Name

Architect

Year

Type

Par

Yards

Slope

USGA

1

101

Toftrees

Ed Ault

1968

Resort

72

7018

134

74.3

2

102

Penn State Blue

William Park, Jr.

1921

Public

72

6525

128

72.0

3

103

Centre Hills

Alex Findlay

1921

Private

71

6392

128

71.2

4

104

Lewistown CC

 

.

Private

72

6779

125

72.3

5

105

State College Elks

Lowell Erdman

1973

SemiPri

71

6369

123

70.9

6

106

Park Hills CC

James Harrison

1966

SemiPri

70

6004

126

69.3

7

107

Sinking Valley CC

Ad Ault

1967

SemiPri

72

6755

132

73.4

8

108

Williamsport CC

A.W Tillinghast

1909

Private

71

6489

131

71.9

9

109

Standing Stone CC

Geoffrey Cornish

1973

SemiPri

70

6593

120

71.4

10

110

Bucknell GC

 

1960

SemiPri

70

6253

132

70.0

11

111

Mount Airy Lodge

Hal Purdy

1972

Resort

72

7123

140

74.3

First, download the stat480.penngolf data set by clicking on the link. After you've saved the data set in a convenient location on your computer, launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the stat480.penngolf data set. Finally, run  the SAS program, so that you have access to the data set throughout the lesson. Then, review the output from the PRINT procedure so that you can familiarize yourself with the data set.

Objectives

Upon completion of this lesson, you should be able to:

  • produce basic list and summary reports using the REPORT procedure
  • alter the purpose of a variable in reports produced by the REPORT procedure using DEFINE statements
  • change the format, width, and spacing of columns in reports produced by the REPORT procedure
  • specify the justification of the columns in reports produced by the REPORT procedure
  • change the column headers of variables appearing in a DEFINE statement
  • distinguish between the various usage options (display, order, group, across, analysis, and computed) of variables appearing in a DEFINE statement
  • create summary reports using the REPORT procedure
  • request that various statistics be included in your reports produced by the REPORT procedure

10.1 - Basic List Reports

10.1 - Basic List Reports

In this section, we'll learn how to create three different basic list reports. In the first example, we'll create a basic list report of an entire SAS data set. In the second example, we'll create a basic list report of part of a SAS data set by selecting just a subset of the variables contained in the data set. In the last example, we'll create a basic list report of part of a SAS data set by selecting just those observations in the data set that meet a certain condition.

Example 10.2

The following SAS program creates a basic list report of the entire permanent SAS data set called stat480.penngolf:

LIBNAME stat480 'C:\Yourdrivename\stat480wc\sasndata\';
PROC REPORT data = stat480.penngolf NOWINDOWS;
   title 'Some Pennsylvania Golf Courses';
RUN;

Some Pennsylvania Golf Courses

ID

Name

Architect

Year

Type

Par

Yards

Slope

USGA

101

Toftrees

Ed Ault

1968

Resort

72

7018

134

74.3

102

Penn State Blue

William Park, Jr.

1921

Public

72

6525

128

72.0

103

Centre Hills

Alex Findlay

1921

Private

71

6392

128

71.2

104

Lewistown CC

 

.

Private

72

6779

125

72.3

105

State College Elks

Lowell Erdman

1973

SemiPri

71

6369

123

70.9

106

Park Hills CC

James Harrison

1966

SemiPri

70

6004

126

69.3

107

Sinking Valley CC

Ad Ault

1967

SemiPri

72

6755

132

73.4

108

Williamsport CC

A.W Tillinghast

1909

Private

71

6489

131

71.9

109

Standing Stone CC

Geoffrey Cornish

1973

SemiPri

70

6593

120

71.4

110

Bucknell GC

 

1960

SemiPri

70

6253

132

70.0

111

Mount Airy Lodge

Hal Purdy

1972

Resort

72

7123

140

74.3

The REPORT procedure contained in this code represents the most basic form of the procedure, in which the data set is specified and the NOWINDOWS option is used. Of course, you have to tell SAS the data set whose contents you want displayed. The NOWINDOWS option — also known as the NOWD option — tells SAS to display the report in the output window. If you don't specify the NOWINDOWS option, then SAS uses the WINDOWS option by default, which tells SAS to display the report in a special REPORT window (note, this has changed in the latest version of SAS and NOWINDOWS is now the default).

If you've already followed the directions in Example 10.1, and have thus run the program there, then you can simply launch this program, delete the LIBNAME statement, run  the SAS program, and review the output from the REPORT procedure.

If you haven't already followed the directions in Example 10.1, you'll need to download the stat480.penngolf data set by clicking on the link. After you've saved the data set in a convenient location on your computer, launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the stat480.penngolf data set. Then, run  the SAS program, so that you have access to the data set throughout the lesson. Then, review the output from the REPORT procedure.

Upon reviewing the output, you should notice that all eleven of the observations and all nine of the variables contained in the stat480.penngolf data set are printed. As, you can see, by default, the variables appear in the order in which they occur in the data set. Oh, you might also note that the report continues across two pages because SAS can't fit it into one page as the PRINT procedure can. This has to do with the default width SAS allocates to each column. This is a column attribute that we'll learn how to modify in the next section.

Example 10.3

Rather than telling SAS to display the entire stat480.penngolf data set, the following SAS program uses the COLUMN statement to tell SAS to display just five of the columns — Name, Year, Type, Par, and Yards — in the order specified:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
RUN;

Some Pennsylvania Golf Courses

Name

Year

Type

Par

Yards

Toftrees

1968

Resort

72

7018

Penn State Blue

1921

Public

72

6525

Centre Hills

1921

Private

71

6392

Lewistown CC

.

Private

72

6779

State College Elks

1973

SemiPri

71

6369

Park Hills CC

1966

SemiPri

70

6004

Sinking Valley CC

1967

SemiPri

72

6755

Williamsport CC

1909

Private

71

6489

Standing Stone CC

1973

SemiPri

70

6593

Bucknell GC

1960

SemiPri

70

6253

Mount Airy Lodge

1972

Resort

72

7123

Launch and run  the SAS program, and review the output to convince yourself that just the requested subset of the data set has been displayed in the report. Oh, you might also want to take note of the effect of the HEADLINE option. As you can see, the HEADLINE option tells SAS to underline all of the column headings, as well as the spaces between the columns. It's a nice way to "prettify" your output!

Example 10.4

Again, rather than telling SAS to display the entire stat480.penngolf data set, the following SAS program uses:

  • the COLUMN statement to tell SAS to display just five of the columns — Name, Year, Type, Par, and Yards — in the order specified, and
  • the WHERE statement to tell SAS to display only those golf courses whose Type equals Private or Resort.

Here's the program:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADSKIP;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 where Type in ('Private', 'Resort');
RUN;

Some Pennsylvania Golf Courses

Name

Year

Type

Par

Yards

Toftrees

1968

Resort

72

7018

Centre Hills

1921

Private

71

6392

Lewistown CC

.

Private

72

6779

Williamsport CC

1909

Private

71

6489

Mount Airy Lodge

1972

Resort

72

7123

Launch and run  the SAS program, and review the output to convince yourself that just the requested subset of the data set has been displayed in the report. This time, you might want to take note of the effect of the HEADSKIP option. As you can see, the HEADSKIP option tells SAS to write a blank line beneath all of the column headings, as well as the spaces between the columns. If you use the HEADSKIP option in conjunction with the HEADLINE option, SAS writes a blank line beneath the underline. You might want to add the HEADLINE option to this code and re-run  it just so you can see the effect for yourself.


10.2 - Column Attributes

10.2 - Column Attributes

In the output that you've seen so far, you might have noticed that the REPORT procedure displays:

  • each data value the way that it is stored in the data set,
  • variable names as column headings in the report,
  • a default width for the report columns,
  • left-justified character values,
  • right-justified numeric values, and
  • observations in the order in which they are stored in the data set.

In this section, we'll learn how to use the DEFINE statement to tell SAS to display a variable using a certain format, specify the width of the columns, and set the number of blank spaces that should appear to the left of the columns in a report. To tell SAS to display a variable var1 using the comma7.2 format, say, we must use the FORMAT= attribute of the DEFINE statement as follows:

DEFINE var1 / FORMAT = comma7.2;

The format specified can either be a SAS format or a user-defined format. To tell SAS to set the column width for a variable var2 at 6 spaces, say, we must use the WIDTH= attribute of the DEFINE statement as follows:

DEFINE var2 / WIDTH = 6;

The default column width is set to be just large enough to handle the specified format. To tell SAS to leave 4 blank characters between the column containing var3 and the column immediately to its left, we must use the SPACING= attribute of the DEFINE statement as follows:

DEFINE var3 / SPACING = 4;

By default, SAS leaves 2 blank characters to the left of each column. Let's take a look at some examples!

Example 10.5

The following SAS program uses the FORMAT= attribute to tell SAS to display when creating a report using the stat480.penngolf data set, the Yards variable using the SAS format comma5.0:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0;
RUN;

Some Pennsylvania Golf Courses

Name

Year

Type

Par

Yards

Toftrees

1968

Resort

72

7,018

Penn State Blue

1921

Public

72

6,525

Centre Hills

1921

Private

71

6,392

Lewistown CC

.

Private

72

6,779

State College Elks

1973

SemiPri

71

6,369

Park Hills CC

1966

SemiPri

70

6,004

Sinking Valley CC

1967

SemiPri

72

6,755

Williamsport CC

1909

Private

71

6,489

Standing Stone CC

1973

SemiPri

70

6,593

Bucknell GC

1960

SemiPri

70

6,253

Mount Airy Lodge

1972

Resort

72

7,123

Launch and run  the SAS program, and review the output to convince yourself that SAS does indeed display the Yards variable using the comma5.0 format. Incidentally, if you do not specify a format for a variable within a REPORT procedure, SAS displays the variable using the format that is stored in the data set. If no format is stored in the data set, then SAS uses the default format for that variable type.

Oh, one more thing. You can, by the way, also use FORMAT statements within a REPORT procedure to specify a variable's format. As we'll soon see, the DEFINE statement allows you, however, to specify more than one column attribute at a time. You can also use the DEFINE statement's FORMAT= attribute to specify the format of report columns that are not variables actually contained in your data set (such as the statistics or computed variables that we'll investigate later in this lesson).

Example 10.6

If a variable in the input data set doesn't have a format associated with it, then the default column width in a report is set at the variable's length for character variables and 9 for numeric variables. The following SAS program illustrates what can go wrong with the reports you generate when you allow SAS to use these defaults:

DATA penngolf;
    set stat480.penngolf;
	length CourseType $ 8;
	CourseType =  Type;
	drop Type;
	format Slope 3.;
RUN;
PROC CONTENTS data = penngolf;
RUN;
PROC REPORT data = penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year CourseType Slope Par Yards;
	 define Yards / format = comma5.0;
RUN;

The DATA step creates a temporary SAS data set called penngolf using the permanent stat480.penngolf data set. The LENGTH statement tells SAS to create a new character variable called CourseType that is 8 characters long. An assignment statement is then used to assign the values of the Type variable to the new CourseType variable. (Seems like a silly DATA step so far, eh? You'll see why we're doing this in a minute.) The DROP statement tells SAS to then drop the Type variable from the data set as we no longer need it. The FORMAT statement assigns the numeric 3. format to the Slope variable.

Now, if you launch and run  the SAS program, you can see first that the output from the CONTENTS procedure:

 

Alphabetic List of Variables and Attributes

#

Variable

Type

Len

Format

3

Architect

Char

18

 

9

CourseType

Char

8

 

1

ID

Num

8

 

2

Name

Char

18

 

5

Par

Num

8

 

7

Slope

Num

8

3.

8

USGA

Num

8

 

6

Yards

Num

8

 

4

Year

Num

8

 

 

confirms that the length of the character variable CourseType is 8, the format of the numeric variable Slope is 3., and the remaining five numeric variables are not associated with a format. Here's what the report that the REPORT procedure generates looks like:

 

Some Pennsylvania Golf Courses

Name

Year

CourseType

Slope

Par

Yards

Toftrees

1968

Resort

134

72

7,018

Penn State Blue

1921

Public

128

72

6,525

Centre Hills

1921

Private

128

71

6,392

Lewistown CC

.

Private

125

72

6,779

State College Elks

1973

SemiPri

123

71

6,369

Park Hills CC

1966

SemiPri

126

70

6,004

Sinking Valley CC

1967

SemiPri

132

72

6,755

Williamsport CC

1909

Private

131

71

6,489

Standing Stone GC

1973

SemiPri

120

70

6,593

Bucknell GC

1960

SemiPri

132

70

6,253

Mount Airy Lodge

1972

Resort

140

72

7,123

 

We've got a little problem with the CourseType and Slope column headings. What went wrong here? To answer that question, we have to review how SAS sets the default column widths. Slope is a numeric variable with a numeric format of 3. By default, SAS sets the column width of a numeric variable to be just large enough to handle the specified format. Thus, SAS sets the column width for the Slope variable to just 3 spaces wide, which is clearly not enough space for the column heading. Now, CourseType is a character variable with a length of 8 characters. By default, SAS sets the column width of a character variable to be the length of the character variable. Thus, SAS sets the column width for the CourseType variable to just 8 spaces wide, which is again not enough space for the column heading. (Now, you can see why I wanted to illustrate this example with a long variable name like CourseType rather than the shorter name Type.) Fortunately, we can solve our problem by using the DEFINE statement's WIDTH= attribute.

Example 10.7

The following SAS program modifies the REPORT procedure of the previous example so that the width of the CourseType and Slope columns are set to 10 and 5, respectively:

PROC REPORT data = penngolf NOWINDOWS HEADLINE;
	title 'Some Pennsylvania Golf Courses';
	column Name Year CourseType Slope Par Yards;
	define Yards / format = comma5.0;
	define CourseType / width = 10;
	define Slope / width = 5;
RUN;

Some Pennsylvania Golf Courses

Name

Year

CourseType

Slope

Par

Yards

Toftrees

1968

Resort

134

72

7,018

Penn State Blue

1921

Public

128

72

6,525

Centre Hills

1921

Private

128

71

6,392

Lewistown CC

.

Private

125

72

6,779

State College Elks

1973

SemiPri

123

71

6,369

Park Hills CC

1966

SemiPri

126

70

6,004

Sinking Valley CC

1967

SemiPri

132

72

6,755

Williamsport CC

1909

Private

131

71

6,489

Standing Stone GC

1973

SemiPri

120

70

6,593

Bucknell GC

1960

SemiPri

132

70

6,253

Mount Airy Lodge

1972

Resort

140

72

7,123

Launch and run  the SAS program, and review the output to convince yourself that the widths of the CourseType and Slope columns are now set to be large enough to accommodate the column headings. Incidentally, the WIDTH= attribute can handle any value from 1 to the value of the LINESIZE= system option.

If you look at the output from this example, you might notice that some of the columns are scrunched together more than others. For example, the CourseType column is rather close to the Year column to its left, and the Yards column is rather close to the Par column to its left. We can use the DEFINE attribute's SPACING= attribute to change how much white space sits before each column.

Example 10.8

The following SAS program uses the DEFINE statement's SPACING= attribute to tell SAS to place 5 blank spaces before the Yards column and 6 blank spaces before the CourseType column:

PROC REPORT data = penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year CourseType Slope Par Yards;
	 define Yards / format = comma5.0 spacing = 5;
	 define CourseType / width = 10 spacing = 6;
	 define Slope / width = 5;
RUN;

Some Pennsylvania Golf Courses

Name

Year

CourseType

Slope

Par

Yards

Toftrees

1968

Resort

134

72

7,018

Penn State Blue

1921

Public

128

72

6,525

Centre Hills

1921

Private

128

71

6,392

Lewistown CC

.

Private

125

72

6,779

State College Elks

1973

SemiPri

123

71

6,369

Park Hills CC

1966

SemiPri

126

70

6,004

Sinking Valley CC

1967

SemiPri

132

72

6,755

Williamsport CC

1909

Private

131

71

6,489

Standing Stone GC

1973

SemiPri

120

70

6,593

Bucknell GC

1960

SemiPri

132

70

6,253

Mount Airy Lodge

1972

Resort

140

72

7,123

You might first note that you can specify more than one attribute per DEFINE statement. Case in point, the DEFINE statement for Yards modifies the format and spacing for the Yards column, and the DEFINE statement for CourseType modifies the width and spacing for the CourseType column. Then, launch and run  the SAS program, and review the output to convince yourself that the columns in the report are now more evenly distributed.


10.3 - Column Headings and Justification

10.3 - Column Headings and Justification

The DEFINE statement is not only useful for associating variables with formats, specifying column widths, and setting column spacings. The DEFINE statement can also be used to set the justification and specify the headings, of your columns.

To define a column heading, you merely need to enclose the desired text of your heading in quotation marks in the DEFINE statement.

Example 10.9

The following program uses the DEFINE statement to set the headings for the Yards and Type variables, respectively, as Total Yardage and Type of Course:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0 'Total Yardage' 
                    width = 7 spacing = 4;
	 define Type / 'Type of Course' spacing = 6;
RUN;

Some Pennsylvania Golf Courses

Name

Year

Type of Course

Par

Total Yardage

Toftrees

1968

Resort

72

7,018

Penn State Blue

1921

Public

72

6,525

Centre Hills

1921

Private

71

6,392

Lewistown CC

.

Private

72

6,779

State College Elks

1973

SemiPri

71

6,369

Park Hills CC

1966

SemiPri

70

6,004

Sinking Valley CC

1967

SemiPri

72

6,755

Williamsport CC

1909

Private

71

6,489

Standing Stone CC

1973

SemiPri

70

6,593

Bucknell GC

1960

SemiPri

70

6,253

Mount Airy Lodge

1972

Resort

72

7,123

In many cases, you'll need to set the width of the columns so that they accommodate the headings, as is done here for the Yards variable. Launch and run  the SAS program, and review the output to convince yourself that, in the resulting report, the headings of the Yards and Type variables are indeed as defined.

If you don't like the way SAS splits long headings across multiple rows, you can take control of it yourself by using a split character in the column label. When the REPORT procedure encounters the split character in a column heading, it breaks the heading and continues the heading on the next line. The split character itself does not appear in the heading. To use a split character, you can do either of the following:

  • Use the default forward slash (/) as the split character.
  • Define a split character by using the SPLIT= option in the PROC REPORT statement, and then use the split character in the column labels.

Let's take a look at another example.

Example 10.10

The following program uses the default forward slash as a split character for the defined column labels of the Yards and Type variables:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4;
	 define Type / 'Type of/Course' spacing = 6 width = 8;
RUN;

Some Pennsylvania Golf Courses

Name

Year

Type of
Course

Par

Total
Yardage

Toftrees

1968

Resort

72

7,018

Penn State Blue

1921

Public

72

6,525

Centre Hills

1921

Private

71

6,392

Lewistown CC

.

Private

72

6,779

State College Elks

1973

SemiPri

71

6,369

Park Hills CC

1966

SemiPri

70

6,004

Sinking Valley CC

1967

SemiPri

72

6,755

Williamsport CC

1909

Private

71

6,489

Standing Stone CC

1973

SemiPri

70

6,593

Bucknell GC

1960

SemiPri

70

6,253

Mount Airy Lodge

1972

Resort

72

7,123

You might want to note that we modified the width of the Type column so that it can now accommodate the length of the column label as it is now defined to be split. Launch and run  the SAS program, and review the output to convince yourself that the column labels are split as requested. How does it look? Hmmm, perhaps it might look a bit better now if we centered the Yards and Type columns. Let's go ahead and do that.

Just one more thing before we do ... you might want to first try out the other method of using a split character. To change the split character to an asterisk (*), say, let's have you attach the SPLIT = option:

split = '*'

just after HEADLINE and before the semicolon in the PROC REPORT statement in Example 10.10. Then, modify the labels so that they use the asterisk:

'Total*Yardage'

and:

'Type of*Course'

And, re-run  the SAS program, and review the output to convince yourself that the modified code worked.

Okay, now back to centering columns. Recall that, by default, the REPORT procedure left-justifies character variables and right-justifies numeric variables in listing output. For each variable that you define, you can instead specify the justification option CENTER, LEFT, or RIGHT in the DEFINE statement.

Example 10.11

The following program uses the DEFINE statement's CENTER option to center the Yards and Type columns in the requested report:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Year Type Par Yards;
	 define Yards / format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
	 define Type / 'Type of/Course' spacing = 6 
                    width = 8 center;
RUN;

Some Pennsylvania Golf Courses

Name

Year

Type of
Course

Par

Total
Yardage

Toftrees

1968

Resort

72

7,018

Penn State Blue

1921

Public

72

6,525

Centre Hills

1921

Private

71

6,392

Lewistown CC

.

Private

72

6,779

State College Elks

1973

SemiPri

71

6,369

Park Hills CC

1966

SemiPri

70

6,004

Sinking Valley CC

1967

SemiPri

72

6,755

Williamsport CC

1909

Private

71

6,489

Standing Stone CC

1973

SemiPri

70

6,593

Bucknell GC

1960

SemiPri

70

6,253

Mount Airy Lodge

1972

Resort

72

7,123

Launch and run  the SAS program, and review the output to convince yourself that the CENTER option centers the formatted values within the specified column width, as well as the headings that appear over the values. That is in fact how each option works. The RIGHT option right-justifies the formatted values within the specified column width, as well as the headings that appear over the values, and the LEFT option left-justifies the formatted values within the specified column width, as well as the headings that appear over the values.


10.4 - Using Order Variables

10.4 - Using Order Variables

Thus far, we have used the DEFINE statement to change the appearance of our reports, doing such things as specifying the format of a variable, changing the width of a column, specifying column labels, and changing the justification of a column. The truth is that the DEFINE statement is even more powerful when it is used to tell SAS how you want SAS to use each variable that appears in your report. There are six possible usages — DISPLAY, ANALYSIS, ORDER, GROUP, ACROSS, and COMPUTED — which we'll investigate in the course of the next five sections of this lesson. By default, the REPORT procedure uses:

  • character variables as display variables
  • numeric variables as analysis variables, which are used to calculate the SUM statistic

Because we haven't explicitly defined any variable usages, the reports we have created thus far have contained only display and analysis variables. For example, here's the most recent report we created:

Some Pennsylvania Golf Courses

Name

Year

Type of Course

Par

Total Yardage

Toftrees

1968

Resort

72

7,018

Penn State Blue

1921

Public

72

6,525

Centre Hills

1921

Private

71

6,392

Lewistown CC

.

Private

72

6,779

State College Elks

1973

SemiPri

71

6,369

Park Hills CC

1966

SemiPri

70

6,004

Sinking Valley CC

1967

SemiPri

72

6,755

Williamsport CC

1909

Private

71

6,489

Standing Stone CC

1973

SemiPri

70

6,593

Bucknell GC

1960

SemiPri

70

6,253

Mount Airy Lodge

1972

Resort

72

7,123

Because the second, fourth, and fifth columns, Year, Par, and Yards, are numeric variables, SAS treats them as analysis variables by default. Analysis variables are used to calculate a statistic ... by default a SUM.

Now, because the first and third columns, Name and Type, are character variables, SAS treats them as display variables by default. What impact does that have on the appearance of the report? It means three things: 1) A report that contains one or more display variables necessarily contains a detail row for each observation in the data set, 2) Each detail row contains a value for each display variable, and 3) The order of the rows in the report matches the order of the observations in the data set.

We can alter the order in which the rows appear in our report by defining an order variable (or two or three ...). An order variable orders the detail of the rows in a report according to the variable's formatted values. Let's take a look at an example.

Example 10.12

Among other things, the DEFINE statements in the following REPORT procedure define the Yards and Par variables as being ORDER variables:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
         title 'Some Pennsylvania Golf Courses';
         column Par Name Year Type Yards;
    	 define Yards / order format = comma5.0 'Total/Yardage' 
                        width = 7 spacing = 4 center;
    	 define Type / 'Type of/Course' spacing = 6 
                        width = 8 center;
    	 define Par / order;
    RUN;

Some Pennsylvania Golf Courses

Par

Name

Year

Type

Yards

70

Park Hills CC

1966

SemiPri

6,004

 

Bucknell GC

1960

SemiPri

6,253

 

Standing Stone CC

1973

SemiPri

6,593

71

State College Elks

1973

SemiPri

6,369

 

Centre Hills

1921

Private

6,392

 

Williamsport CC

1909

Private

6,489

72

Penn State Blue

1921

Public

6,525

 

Sinking Valley CC

1967

SemiPri

6,755

 

Lewistown CC

.

Private

6,779

 

Toftrees

1968

Resort

7,018

 

Mount Airy Lodge

1972

Resort

7,123

There are just two differences between this REPORT procedure and the one from the previous example:

  1. The word ORDER has been added to the DEFINE statements for the Yards and Par variables
  2. The order of the columns appearing in the COLUMN statement has been changed so that the Par variable is the first to appear in the report.

Launch and run  the SAS program, and review the output. You should note that the report rows are now first in ascending order of the Par variable, and then within in each Par value the rows are in ascending order of the Yards variable. You should also note that SAS displays only the first occurrence of each value of an order variable in a set of rows that have the same value for all order variables. For example, the first three rows of output pertain to golf courses whose Par is 70, the next three rows of output pertain to golf courses whose Par is 71, and so on. One more thing ... we never told SAS to display the rows in ascending order. That's just what SAS does by default. We can change the order by using the DEFINE statement's DESCENDING option.

Example 10.13

The following program tells SAS, when creating the requested report, to display the rows in descending order of the Par variable and in ascending order of the Yards variable:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Par Name Year Type Yards;
	 define Yards / order format = comma5.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
	 define Type / 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / order descending;
RUN;

Some Pennsylvania Golf Courses

Par

Name

Year

Type

Yards

72

Penn State Blue

1921

Public

6,525

 

Sinking Valley CC

1967

SemiPri

6,755

 

Lewistown CC

.

Private

6,779

 

Toftrees

1968

Resort

7,018

 

Mount Airy Lodge

1972

Resort

7,123

71

State College Elks

1973

SemiPri

6,369

 

Centre Hills

1921

Private

6,392

 

Williamsport CC

1909

Private

6,489

70

Park Hills CC

1966

SemiPri

6,004

 

Bucknell GC

1960

SemiPri

6,253

 

Standing Stone CC

1973

SemiPri

6,593

Launch and run  the SAS program, and review the output to convince yourself that the rows are indeed in the order as described.


10.5 - Using Group Variables

10.5 - Using Group Variables

Thus far, all of the reports that we have created have been what are called list reports, in which detail rows are displayed for each observation appearing in the data set. Well, that's not quite true ... we did create one report in which a detail row was displayed only for those observations in which Type equaled Private or Resort. Still, though, no summation of observations took place when creating the reports that we have created thus far. Now, suppose we are interested in creating a summary report, in which the rows that appear in the report are created by summarizing the observations in the data set. For example, we might be interested in creating the following report:

Some Pennsylvania Golf Courses

Type of Course

Total Par

Total Yardage

Private

214

19,660

Public

72

6525

Resort

144

14,141

SemiPri

353

31,974

in which the total par and total yardage are displayed for each of the four types of golf courses.

To create such a summary report using the REPORT procedure, we need to define one or more group variables. In general, a group variable groups the detail rows in a report according to their formatted values. If a report contains one or more group variables, then SAS consolidates into one row all observations from the data set that have a unique combination of values for all of the defined group variables.

Example 10.14

Among other things, the DEFINE statements in the following REPORT procedure define the Type variable as a group variable in order to create the summary report as illustrated above:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Type Par Yards;
	 define Type / group 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / analysis 'Total/Par';
	 define Yards / analysis format = comma6.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
RUN;

Some Pennsylvania Golf Courses

Type of Course

Total Par

Total Yardage

Private

214

19,660

Public

72

6525

Resort

144

14,141

SemiPri

353

31,974

Let's dissect this procedure. The COLUMN statement tells SAS that we only want to display three columns, namely Type, Par, and Yards, in that order. The first DEFINE statement tells SAS to use Type as a group variable, as well as specifies the column heading, width, spacing, and justification. The second DEFINE statement tells SAS to use Par as an analysis variable, as well as specifies the column heading. And, the third DEFINE statement tells SAS to use Yards as an analysis variable, as well as specifies the column format, heading, width, spacing, and justification.

Now, launch and run  the SAS program, and review the output to convince yourself that SAS collapses the observations, and sums the Par and Yards variables as depicted in the summary report at the beginning of this section. You might want to recall that SAS sums the Par and Yards variables, rather than calculates their average, say, because that's how SAS handles analysis variables by default. In the next section, we'll learn how to change the default so that SAS calculates an average instead, say.

One more thing about this example. It is not necessary, of course, to define Par and Yards as analysis variables here, as SAS would use them as analysis variables by default anyway since they are both numeric variables. If you're not convinced, delete the analysis word in the DEFINE statements for the Par and Yards variables, and then re-run  the SAS program to verify that you still get the same report.

And one final comment about summary reports in general. All of the variables in a summary report must be defined as group, analysis, across, or computed variables. This is because the REPORT procedure must be able to summarize all variables across an observation in order to collapse observations. If the REPORT procedure can't create groups, it displays group variables as order variables. We'll make sure that the homework for this lesson makes this closing comment make sense.


10.6 - Specifying Statistics

10.6 - Specifying Statistics

As mentioned earlier, the default statistic for analysis variables is SUM. There may be some instances, however, in which you want to display other statistics in your reports. To do so, you merely specify your desired statistic as an attribute in the DEFINE statement. Here's a list of the statistics you can request:

Statistic

Description

CSS

corrected sum of squares

USS

uncorrected sum of squares

CV

coefficient of variation

MAX

maximum value

MEAN

average value

MIN

minimum value

N

number of observations with nonmissing values

NMISS

number of observations with missing values

RANGE

range of values

STD

standard deviation

STDERR

standard error of the mean

SUM

sum of the values

SUMWGT

sum of the weight variable values

PCTN

percentage of cell or row frequency to total frequency

PCTSUM

percentage of cell or row sum to total sum

VAR

variance of the values

T

student's t-statistic for testing population mean is 0

PRT

probability of a greater absolute value of student's t

Let's take a look at an example in which the mean statistic is requested.

Example 10.15

The following REPORT procedure creates a report in which the average par and average yardage are reported for each of the four types of golf courses:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Average Size of Some PA Golf Courses by Type';
     column Type Par Yards;
	 define Type / group 'Type of/Course' spacing = 6 
                    width = 8;
	 define Par /  mean format= 4.1 
                   'Average/Par' width = 7 center;
	 define Yards /  mean format = comma6.0 'Average/Yardage' 
                    width = 7 spacing = 4 center;
RUN;

Some Pennsylvania Golf Courses

Type of Course

Average Par

Average Yardage

Private

71.3

6,553

Public

72.0

6,525

Resort

72.0

7,071

SemiPri

70.6

6,395

The COLUMN statement tells SAS that we only want to display three columns, namely Type, Par, and Yards, in that order. The first DEFINE statement tells SAS to use Type as a group variable, as well as specifies the column heading, spacing, and width. The mean that is present in the second DEFINE statement tells SAS to calculate the average Par for each Type of golf course. The second DEFINE statement also tells SAS how to format the result, as well as how to label, justify, and set the width of the Par column. The mean that is present in the third DEFINE statement tells SAS to calculate the average Yards for each Type of golf course. The third DEFINE statement also tells SAS how to format the result, as well as how to label, justify, and set the width and spacing of the Yards column.

Now, launch and run  the SAS program, and review the output to convince yourself that SAS collapses the observations, and in so doing, calculates the averages of the Par and Yards variables for each Type of golf course.

Example 10.16

The following example illustrates the type of one-line report you get when the columns of your report contain only (numeric) analysis variables:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Size of Some PA Golf Courses';
     column Par Yards;
	 define Par /  mean format= 4.1 
                   'Average/Par' width = 7 center;
	 define Yards / format = comma7.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
RUN;

Size of Some Pennsylvania Golf Courses

Average Par

Total Yardage

71.2

72,300

First, note that the COLUMN statement contains just two variables, Par and Yards, and that both are numeric variables. The first DEFINE statement tells SAS to calculate the average Par, as well as how to format the result and label, justify, and set the width of the column. The second DEFINE statement tells SAS to calculate the total yards, as well as how to format the result and label, justify, and set the width and spacing of the column.

Now, launch and run  the SAS program, and review the output to convince yourself that SAS collapses all of the observations, and in so doing, calculates the average Par and the total Yards of all of the golf courses. It is in this way that we end up with just a one-line report.


10.7 - Using Across Variables

10.7 - Using Across Variables

Thus far, we've investigated display, analysis, order, and group variables. You can also define variables as across variables, which are functionally similar to group variables. However, when you define a variable as being an across variable, SAS displays the groups that it creates horizontally rather than vertically.

Example 10.17

The following program is nearly identical to the program in Example 10.14 which concerned the illustration of the use of a group variable. The only difference between the two programs is that the group keyword in the DEFINE statement for the Type variable has been replaced here with the across keyword:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Type Par Yards;
	 define Type / across 'Type of/Course' spacing = 6 
                    width = 8 center;
	 define Par / analysis 'Total/Par';
	 define Yards / analysis format = comma6.0 'Total/Yardage' 
                    width = 7 spacing = 4 center;
RUN;

Some Pennsylvania Golf Courses

Type of Course

Total

Private

Public

Resort

SemiPri

Par

Yardage

3

1

2

5

783

72,300

You might want to recall that when the group keyword was used in the DEFINE statement for the Type variable, the resulting report looked like this:

Some Pennsylvania Golf Courses

Type of Course

Total Par

Total Yardage

Private

214

19,660

Public

72

6,525

Resort

144

14,141

SemiPri

353

31,974

Now, launch and run  the SAS program, and review the output to see the effect of replacing the group keyword with the across keyword. You should first note that, as claimed, the values of the Type variable are displayed horizontally rather than vertically. Then, you should observe that for the across variable, the table cells contain a frequency count for each Type of golf course. That is, the report indicates that the data set contains three private courses, one public course, two resort courses, and five semi-private courses. For the two analysis variables, Par and Yards, SAS displays the sum of all of their values. Quite a different report, eh?


10.8 - Using Computed Variables

10.8 - Using Computed Variables

We have just one more type of variable usage, namely that of computed variables, to discuss. Computed variables are numeric or character variables that you define for the report. They are not in your input data set, nor does the REPORT procedure add them to the input data set. That is, they exist only for the purpose of displaying them in your report.

Here's how we'll go about adding a computed variable to our reports:

  1. Specify the computed variable name in the COLUMN statement.
  2. Define the variable's usage as COMPUTED in the DEFINE statement.
  3. Compute the value of the variable in a compute block that is associated with the variable.

Let's take a look at an example!

Example 10.18

The following REPORT procedure uses the Slope and USGA scores to compute a Bogey rating for each golf course that appears in the stat480.penngolf data set:

PROC REPORT data = stat480.penngolf NOWINDOWS HEADLINE;
     title 'Some Pennsylvania Golf Courses';
     column Name Slope USGA Bogey;
	 define Bogey / computed 'Bogey/Rating' format = 7.3;
	 define USGA / format = 4.1 spacing = 5;
	 compute Bogey;
	   Bogey = 0.186*Slope.sum + USGA.sum;
	 endcomp;
RUN;

Some Pennsylvania Golf Courses

Name

Slope

USGA

Bogey Rating

Toftrees

134

74.3

99.224

Penn State Blue

128

72.0

95.808

Centre Hills

128

71.2

95.008

Lewistown CC

125

72.3

95.550

State College Elks

123

70.9

93.778

Park Hills CC

126

69.3

92.736

Sinking Valley CC

132

73.4

97.952

Williamsport CC

131

71.9

96.266

Standing Stone CC

120

71.4

93.720

Bucknell GC

132

70.0

94.552

Mount Airy Lodge

140

74.3

100.340

Let's review the procedure. The COLUMN statement tells SAS that we'd like four columns to appear in our report, namely the currently existing Name, Slope, and USGA variables and the yet-to-be-computed Bogey variable. The first DEFINE statement tells SAS that Bogey is a computed variable, and specifies how to format the result and label the column. The second DEFINE statement merely tells SAS how to format and space the USGA column. Then, comes the compute block.

As you can see, the compute block begins with a COMPUTE statement containing the name of the variable to be computed, namely Bogey. Then, we use a basic assignment statement to calculate the value of Bogey. Well, okay, it doesn't look quite the same as an assignment statement that you'd expect to see in a DATA statement. Here, we have to use a compound name that identifies both the original variable and the statistic that the REPORT procedure now calculates from it. In general, the compound name has the form variable-name.statistic.  This is why we refer to Slope.sum and USGA.sum. Finally, we close the compute block with an ENDCOMP statement.

Now, let's have you go ahead and launch and run  the SAS program, and review the output so that you can convince yourself that Bogey was calculated as described.

Incidentally, the position of a computed variable in the COLUMN statement is critically important. The REPORT procedure assigns values to the columns in a row of a report from left to right. Therefore, you can't base the calculation of a computed variable that appears to its right in the report. To convince yourself of this, move the Bogey variable to the first position in the COLUMN statement, and re-run  the SAS program. Doesn't work now, does it?


10.9 - Summary

10.9 - Summary

In this lesson, we've learned how to use the REPORT procedure to create detailed list reports, as well as a variety of summary reports.

The homework for this lesson will give you more practice with the REPORT procedure so that you become even more familiar with how it works and can use it in your own SAS programming.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility