6.5 - Column Totals

6.5 - Column Totals

There may be situations in which you want SAS to calculate and present column totals for some of the numeric variables appearing in your reports. In that case, you'll want to take advantage of the SUM statement. We'll investigate the use of the statement here.

Example 6.11

The following SAS code uses the PRINT procedure's SUM statement to generate a report of the total number of visits (no_vis) for patients undergoing physical therapy (type_vis = 190):

PROC PRINT data = basic;
   id name;
   var clinic no_vis;
   where type_vis = 190;
   sum no_vis;
RUN;
name clinic no_vis
Thomas Jones ALTO 10
Benedictine Arnold ALTO 1
Felicia Ho MNMC 7
Jane Smiley MNMC 5
======
    23

The ID statement tells SAS to suppress the observation number and to place the variable name in the first column of the output. The WHERE statement tells SAS to print only the observations pertaining to a physical therapy appointment (type_vis = 190). The SUM statement tells SAS to provide the total number of visits (no_vis) the patients undergoing physical therapy had.

Launch and run the SAS program, and review the output to convince yourself that the report is generated as described.

Incidentally, when you specify a variable in a SUM statement, the variable is also implicitly assumed to be present in the VAR statement. This prevents you from having to type the variable names twice, once in the VAR statement and once in the SUM statement. You might want to delete no_vis in the VAR statement, and re-run the SAS program to convince yourself that no_vis is still printed in the report, because its name appears in the SUM statement.

Example 6.12

There may be situations in which you want not just column totals, but also column subtotals. Using the PRINT procedure's BY statement, you can tell SAS to print observations in groups based on the values of the different BY variables. When a SUM statement is specified in the presence of a BY statement, SAS produces subtotals each time the value of a BY variable changes.

The following SAS program illustrates use of the BY statement in conjunction with the SUM statement to print the data in our basic data set in three groups based on the value of clinic, as well as display the total expense for each of the three groups separately:

PROC SORT data = basic out = srtd_basic;
  by clinic;
RUN;

PROC PRINT data = srtd_basic;
   by clinic;
   var subj name no_vis type_vis expense;
   sum expense;
RUN;
----------------------------------------- clinic=ALTO ----------------------------------------
Obs subj name no_vis type_vis expense
1 1168 Thomas Jones 10 190 3904.89
2 1201 Benedictine Arnold 1 190 1450.23
------ ------
clinic 5355.12
----------------------------------------- clinic=LEWN ----------------------------------------
Obs subj name no_vis type_vis expense
3 1024 Alice Smith 7 101 1001.98
4 1167 Maryann White 2 101 2999.34
------ ------
clinic 4001.32
----------------------------------------- clinic=MNMC ----------------------------------------
Obs subj name no_vis type_vis expense
5 1302 Felicia Ho 7 190 1209.94
6 1471 John Smith 6 187 1763.09
7 1980 Jane Smiley 5 190 3567.00
------ ------
clinic 6540.03
======
15896.47

As you'll see is always the case, whenever a BY statement is used in any DATA step or procedure, the data set must first be sorted inorder based on the variables specified in the BY statement. If not, your program will halt the execution, and SAS will print a message in the log indicating that the data set is not properly sorted. The SORT procedure prepares the data for the PRINT procedure's BY statement by sorting the basic data set by clinic and storing the sorted data in a new data set called srtd_basic. The PRINT procedure then tells SAS to print srtd_basic in three groups (by clinic) — one for ALTO, one for LEWN, and one for MNMC. The PRINT procedure also tells SAS to sum the expenses (sum expense) for each clinic separately, as well as provide a grand total of all expenses.

Launch and run the SAS program, and review the resulting output. Your output should display the observations in three groups (ALTO, LEWN, and MNMC) and the variable expense should be added up across the subjects and then across the three groups.

Example 6.13

If you take a look at the output from the previous example, you should see that the columns don't line up across the three clinics. The UNIFORM option tells SAS to make sure the columns of data line up from one group to the next. Without the UNIFORM statement, the PRINT procedure works to fit as many variables and observations on the page as possible. As a result, printed columns can be shifted from one group to the next. In the PRINT procedure in the previous example, no UNIFORM option was specified. Therefore, since a different number of characters are needed for name for the three groups, the columns are not aligned.

The PRINT procedure in the following SAS program illustrates use of the UNIFORM option to remedy this problem:

PROC SORT data = basic out = srtd_basic;
  by clinic;
RUN;

PROC PRINT data = srtd_basic UNIFORM;
   by clinic;
   var subj name no_vis type_vis expense;
   sum expense;
RUN;
----------------------------------------- clinic=ALTO ----------------------------------------
Obs subj name no_vis type_vis expense
1 1168 Thomas Jones 10 190 3904.89
2 1201 Benedictine Arnold 1 190 1450.23
------ ------
clinic 5355.12
----------------------------------------- clinic=LEWN ----------------------------------------
Obs subj name no_vis type_vis expense
3 1024 Alice Smith 7 101 1001.98
4 1167 Maryann White 2 101 2999.34
------ ------
clinic 4001.32
----------------------------------------- clinic=MNMC ----------------------------------------
Obs subj name no_vis type_vis expense
5 1302 Felicia Ho 7 190 1209.94
6 1471 John Smith 6 187 1763.09
7 1980 Jane Smiley 5 190 3567.00
------ ------
clinic 6540.03
======
15896.47

The only difference between this PRINT procedure and the previous one is that the UNIFORM option has been included here. Launch and run the SAS program, and review the resulting output to convince yourself that the columns across the three groups are now properly aligned.

Example 6.14

In the output from the previous two examples, you might have noticed that redundant information is displayed for each group. For example, the BY variable clinic is identified across the top of the data for each group, as well as for the subtotal for each group. To show the BY variable heading only once, you can use an ID statement and a BY statement in conjunction with the SUM statement. When an ID statement specifies the same variable as the BY statement:

  1. The observation number is suppressed
  2. The ID variable is printed as the first column of the report
  3. Each value of the ID variable is printed only at the start of each BY group and on the line that contains that group's subtotal.

The PRINT procedure in the following SAS program illustrates the use of the ID statement to control the headings displayed in the report for each clinic:

PROC SORT data = basic out = srtd_basic;
  by clinic;
RUN;

PROC PRINT data = srtd_basic UNIFORM;
   by clinic;
   var subj name no_vis type_vis expense;
   sum expense;
   id clinic;
RUN;
clinic subj name no_vis type_vis expense
ALTO 1168 Thomas Jones 10 190 3904.89
  1201 Benedictine Arnold 1 190 1450.23
------         ------
ALTO         5355.12
LEWN 1024 Alice Smith 7 101 1001.98
  1167 Maryann White 2 101 2999.34
------         ------
LEWN         4001.32
MNMC 1302 Felicia Ho 7 190 1209.94
  1471 John Smith 6 187 1763.09
  1980 Jane Smiley 5 190 3567.00
------         ------
MNMC         6540.03
======
15896.47

Launch and run the SAS program, and review the resulting output to convince yourself that the headings for the three clinics appear as described.

Example 6.15

Rather than having SAS display the output for each group on the same page, you can take advantage of the PAGEBY statement to tell SAS to print each group on a separate page. The following SAS program creates the same output as the previous example, except here we request that the clinics be printed on separate pages:

PROC SORT data = basic out = srtd_basic;
  by clinic;
RUN;

PROC PRINT data = srtd_basic UNIFORM;
   by clinic;
   var subj name no_vis type_vis expense;
   sum expense;
   id clinic;
   pageby clinic;
RUN;

The PAGEBY statement tells SAS to print the data for each clinic on a separate page. Note that the variable that is specified in the PAGEBY statement must also be specified in the PRINT procedure's BY statement.

Launch and run the SAS program. You should see that each clinic is represented on a separate page — the first page:

clinic subj name no_vis type_vis expense
ALTO 1168 Thomas Jones 10 190 3904.89
  1201 Benedictine Arnold 1 190 1450.23
------         ------
ALTO         5355.12

the second page:

clinic subj name no_vis type_vis expense
LEWN 1024 Alice Smith 7 101 1001.98
  1167 Maryann White 2 101 2999.34
------         ------
LEWN         4001.32

and the third page:

clinic subj name no_vis type_vis expense
MNMC 1302 Felicia Ho 7 190 1209.94
  1471 John Smith 6 187 1763.09
  1980 Jane Smiley 5 190 3567.00
------         ------
MNMC         6540.03
======
15896.47

Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility