13.4 - Detecting the End of a SAS Data Set

Now that we've spent some time identifying the first and last observations among a group of observations, now we'll turn our attention to identifying the last observation in a data set. You might find yourself in situations in which you need to determine when the last observation in an input data set has been read, so that you can then tell SAS to take some action.

In this section, we'll investigate how to use the SET statement's END= option to tell SAS to create a temporary numeric value whose value is used to detect the last observation. We'll start first by taking a look at a program in which we would benefit from knowing the last observation.

Example 13.15 Section

The following program uses an accumulator variable called TotalSales to determine the overall total Sales amount in the sales data set:

LIBNAME stat481 'C:\yourdrivename\Stat481WC\01sasdata\sasndata';
DATA storesales;
	set stat481.sales;
	TotalSales + Sales;
	format Sales TotalSales Dollar13.2;
RUN;
 
PROC PRINT data = storesales NOOBS;
	title;
RUN;

Store

Dept

Quarter

Sales

TotalSales

101

10

1

$110,001.50

$110,001.50

101

10

2

$113,101.20

$223,102.70

101

10

3

$111,932.15

$335,034.85

101

10

4

$99,901.10

$434,935.95

101

20

1

$110,002.36

$544,938.31

101

20

2

$99,922.39

$644,860.70

101

20

3

$98,832.98

$743,693.68

101

20

4

$110,101.70

$853,795.38

121

20

1

$121,947.10

$975,742.48

121

20

2

$119,964.69

$1,095,707.17

121

20

3

$122,136.28

$1,217,843.45

121

20

4

$120,111.11

$1,337,954.56

121

10

1

$127,192.92

$1,465,147.48

121

10

2

$125,280.13

$1,590,427.61

121

10

3

$128,203.56

$1,718,631.17

121

10

4

$123,632.29

$1,842,263.46

109

10

1

$120,422.77

$1,962,686.23

109

10

2

$123,984.32

$2,086,670.55

109

10

3

$121,801.29

$2,208,471.84

109

10

4

$122,125.66

$2,330,597.50

109

30

1

$98,310.13

$2,428,907.63

109

30

2

$97,331.25

$2,526,238.88

109

30

3

$96,386.28

$2,622,625.16

109

30

4

$98,511.90

$2,721,137.06

109

20

1

$115,239.09

$2,836,376.15

109

20

2

$113,001.98

$2,949,378.13

109

20

3

$114,234.32

$3,063,612.45

109

20

4

$114,122.65

$3,177,735.10

If you haven't already done so, download the sales data set, and save it to a convenient location on your computer. Launch the SAS program and edit the LIBNAME statement to reflect the location in which you saved the data set. Then, run  the program, and review the output to convince yourself that the value of the TotalSales variable for each observation is the sum of the Sales values for the observation and all of the observations that precede it.

In this case, we ended up with a data set called storesales whose TotalSales variable contains a running total of the Sales values. Now, suppose instead we wanted a data set called storesales which contains only the last observation with the overall total of all Sales. The SET statement's END= option can help us select the last observation.

Example 13.16 Section

The following program uses the SET statement's END= option and a subsetting IF statement to tell SAS to write only the last observation in the input data set (stat481.sales) to the output data set (storesales):

LIBNAME stat481 'C:\yourdrivename\Stat481WC\01sasdata\sasndata';
DATA storesales;
	set stat481.sales end=last;
	TotalSales + Sales;
	format Sales TotalSales Dollar13.2;
	drop Store Dept Sales Quarter;
	if last;
RUN;

PROC PRINT data = storesales NOOBS;
	title;
RUN;

TotalSales

$3,177,735.10

The END=last option tells SAS to create a temporary numeric variable called last, which is initialized to 0 and set to 1 only when the SET statement reads the last observation in the input data set. Although we used the variable name last here, we could have used any valid SAS variable name. The variable is temporary in that it is placed in the program data vector but not written to the output data set.

The subsetting IF statement:

if last;

is the statement that tells SAS to include in the output data set only the last observation in the input data set. The DROP statement tells SAS to drop the then meaningless Store, Dept, Quarter, and Sales variables from the output data set.

If you haven't already done so, download the sales data set, and save it to a convenient location on your computer. Launch the SAS program and edit the LIBNAME statement to reflect the location in which you saved the data set. Then, run  the program, and review the output to convince yourself that the storesales data set contains just one observation and one variable containing the overall total sales (TotalSales) of all of the stores.