6.5 - Column Totals
6.5 - Column TotalsThere may be situations in which you want SAS to calculate and present column totals for some of the numeric variables appearing in your reports. In that case, you'll want to take advantage of the SUM statement. We'll investigate the use of the statement here.
Example 6.11
The following SAS code uses the PRINT procedure's SUM statement to generate a report of the total number of visits (no_vis) for patients undergoing physical therapy (type_vis = 190):
PROC PRINT data = basic;
id name;
var clinic no_vis;
where type_vis = 190;
sum no_vis;
RUN;
name | clinic | no_vis | ||
---|---|---|---|---|
Thomas Jones | ALTO | 10 | ||
Benedictine Arnold | ALTO | 1 | ||
Felicia Ho | MNMC | 7 | ||
Jane Smiley | MNMC | 5 | ||
====== | ||||
23 |
The ID statement tells SAS to suppress the observation number and to place the variable name in the first column of the output. The WHERE statement tells SAS to print only the observations pertaining to a physical therapy appointment (type_vis = 190). The SUM statement tells SAS to provide the total number of visits (no_vis) the patients undergoing physical therapy had.
Launch and run the SAS program, and review the output to convince yourself that the report is generated as described.
Incidentally, when you specify a variable in a SUM statement, the variable is also implicitly assumed to be present in the VAR statement. This prevents you from having to type the variable names twice, once in the VAR statement and once in the SUM statement. You might want to delete no_vis in the VAR statement and re-run the SAS program to convince yourself that no_vis is still printed in the report because its name appears in the SUM statement.
Example 6.12
There may be situations in which you want not just column totals, but also column subtotals. Using the PRINT procedure's BY statement, you can tell SAS to print observations in groups based on the values of the different BY variables. When a SUM statement is specified in the presence of a BY statement, SAS produces subtotals each time the value of a BY variable changes.
The following SAS program illustrates the use of the BY statement in conjunction with the SUM statement to print the data in our basic data set in three groups based on the value of clinic, as well as display the total expense for each of the three groups separately:
PROC SORT data = basic out = srtd_basic;
by clinic;
RUN;
PROC PRINT data = srtd_basic;
by clinic;
var subj name no_vis type_vis expense;
sum expense;
RUN;
---------------------------------- clinic=ALTO ----------------------------------
Obs | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
1 | 1168 | Thomas Jones | 10 | 190 | 3904.89 |
2 | 1201 | Benedictine Arnold | 1 | 190 | 1450.23 |
------ | ------ | ||||
clinic | 5355.12 |
---------------------------------- clinic=LEWN ----------------------------------
Obs | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
3 | 1024 | Alice Smith | 7 | 101 | 1001.98 |
4 | 1167 | Maryann White | 2 | 101 | 2999.34 |
------ | ------ | ||||
clinic | 4001.32 |
---------------------------------- clinic=MNMC ----------------------------------
Obs | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
5 | 1302 | Felicia Ho | 7 | 190 | 1209.94 |
6 | 1471 | John Smith | 6 | 187 | 1763.09 |
7 | 1980 | Jane Smiley | 5 | 190 | 3567.00 |
------ | ------ | ||||
clinic | 6540.03 | ||||
====== | |||||
15896.47 |
As you'll see is always the case, whenever a BY statement is used in any DATA step or procedure, the data set must first be sorted in order based on the variables specified in the BY statement. If not, your program will halt the execution, and SAS will print a message in the log indicating that the data set is not properly sorted. The SORT procedure prepares the data for the PRINT procedure's BY statement by sorting the basic data set by clinic and storing the sorted data in a new data set called srtd_basic. The PRINT procedure then tells SAS to print srtd_basic in three groups (by clinic) — one for ALTO, one for LEWN, and one for MNMC. The PRINT procedure also tells SAS to sum the expenses (sum expense) for each clinic separately, as well as provide a grand total of all expenses.
Launch and run the SAS program, and review the resulting output. Your output should display the observations in three groups (ALTO, LEWN, and MNMC) and the variable expense should be added up across the subjects and then across the three groups.
Example 6.13
If you take a look at the output from the previous example, you should see that the columns don't line up across the three clinics. The UNIFORM option tells SAS to make sure the columns of data line up from one group to the next. Without the UNIFORM statement, the PRINT procedure works to fit as many variables and observations on the page as possible. As a result, printed columns can be shifted from one group to the next. In the PRINT procedure in the previous example, no UNIFORM option was specified. Therefore, since a different number of characters are needed for name for the three groups, the columns are not aligned.
The PRINT procedure in the following SAS program illustrates use of the UNIFORM option to remedy this problem:
PROC SORT data = basic out = srtd_basic;
by clinic;
RUN;
PROC PRINT data = srtd_basic UNIFORM;
by clinic;
var subj name no_vis type_vis expense;
sum expense;
RUN;
------------------------------ clinic=ALTO ------------------------------
Obs | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
1 | 1168 | Thomas Jones | 10 | 190 | 3904.89 |
2 | 1201 | Benedictine Arnold | 1 | 190 | 1450.23 |
------ | ------ | ||||
clinic | 5355.12 |
------------------------------ clinic=LEWN ------------------------------
Obs | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
3 | 1024 | Alice Smith | 7 | 101 | 1001.98 |
4 | 1167 | Maryann White | 2 | 101 | 2999.34 |
------ | ------ | ||||
clinic | 4001.32 |
------------------------------ clinic=MNMC ------------------------------
Obs | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
5 | 1302 | Felicia Ho | 7 | 190 | 1209.94 |
6 | 1471 | John Smith | 6 | 187 | 1763.09 |
7 | 1980 | Jane Smiley | 5 | 190 | 3567.00 |
------ | ------ | ||||
clinic | 6540.03 | ||||
====== | |||||
15896.47 |
The only difference between this PRINT procedure and the previous one is that the UNIFORM option has been included here. Launch and run the SAS program, and review the resulting output to convince yourself that the columns across the three groups are now properly aligned.
Example 6.14
In the output from the previous two examples, you might have noticed that redundant information is displayed for each group. For example, the BY variable clinic is identified across the top of the data for each group, as well as for the subtotal for each group. To show the BY variable heading only once, you can use an ID statement and a BY statement in conjunction with the SUM statement. When an ID statement specifies the same variable as the BY statement:
- The observation number is suppressed
- The ID variable is printed as the first column of the report
- Each value of the ID variable is printed only at the start of each BY group and on the line that contains that group's subtotal.
The PRINT procedure in the following SAS program illustrates the use of the ID statement to control the headings displayed in the report for each clinic:
PROC SORT data = basic out = srtd_basic;
by clinic;
RUN;
PROC PRINT data = srtd_basic UNIFORM;
by clinic;
var subj name no_vis type_vis expense;
sum expense;
id clinic;
RUN;
clinic | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
ALTO | 1168 | Thomas Jones | 10 | 190 | 3904.89 |
1201 | Benedictine Arnold | 1 | 190 | 1450.23 | |
------ | ------ | ||||
ALTO | 5355.12 | ||||
LEWN | 1024 | Alice Smith | 7 | 101 | 1001.98 |
1167 | Maryann White | 2 | 101 | 2999.34 | |
------ | ------ | ||||
LEWN | 4001.32 | ||||
MNMC | 1302 | Felicia Ho | 7 | 190 | 1209.94 |
1471 | John Smith | 6 | 187 | 1763.09 | |
1980 | Jane Smiley | 5 | 190 | 3567.00 | |
------ | ------ | ||||
MNMC | 6540.03 | ||||
====== | |||||
15896.47 |
Launch and run the SAS program, and review the resulting output to convince yourself that the headings for the three clinics appear as described.
Example 6.15
Rather than having SAS display the output for each group on the same page, you can take advantage of the PAGEBY statement to tell SAS to print each group on a separate page. The following SAS program creates the same output as the previous example, except here we request that the clinics be printed on separate pages:
PROC SORT data = basic out = srtd_basic;
by clinic;
RUN;
PROC PRINT data = srtd_basic UNIFORM;
by clinic;
var subj name no_vis type_vis expense;
sum expense;
id clinic;
pageby clinic;
RUN;
The PAGEBY statement tells SAS to print the data for each clinic on a separate page. Note that the variable that is specified in the PAGEBY statement must also be specified in the PRINT procedure's BY statement.
Launch and run the SAS program. You should see that each clinic is represented on a separate page — the first page:
clinic | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
ALTO | 1168 | Thomas Jones | 10 | 190 | 3904.89 |
1201 | Benedictine Arnold | 1 | 190 | 1450.23 | |
------ | ------ | ||||
ALTO | 5355.12 |
the second page:
clinic | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
LEWN | 1024 | Alice Smith | 7 | 101 | 1001.98 |
1167 | Maryann White | 2 | 101 | 2999.34 | |
------ | ------ | ||||
LEWN | 4001.32 |
and the third page:
clinic | subj | name | no_vis | type_vis | expense |
---|---|---|---|---|---|
MNMC | 1302 | Felicia Ho | 7 | 190 | 1209.94 |
1471 | John Smith | 6 | 187 | 1763.09 | |
1980 | Jane Smiley | 5 | 190 | 3567.00 | |
------ | ------ | ||||
MNMC | 6540.03 | ||||
====== | |||||
15896.47 |