18.3 - Iteratively Processing Data
18.3 - Iteratively Processing DataSo far all of the examples that we've looked at have involved using DO loops to generate one or more observations from one iteration of the DATA step. Now, let's look at a example that involves reading a data set, and then using a DO loop to compute the value of a new variable.
Example 18.6
Every Monday morning, a credit union in Pennsylvania announces the interest rates for certificates of deposit (CDs) that it will honor for CDs opened during the business week. Suppose you want to determine how much each CD will earn at maturity with an initial investment of $5,000. The following program reads in the interest rates advertised one week in early 2009, and then uses a DO loop to calculate the value of each CD when it matures:
DATA cdinvest (drop = i);
input Type $ 1-7 AnnualRate Months;
Investment = 5000;
do i = 1 to Months;
Investment + (AnnualRate/12)*Investment;
end;
format Investment dollar8.2;
DATALINES;
03Month 0.01980 3
06Month 0.02230 6
09Month 0.02230 9
12Month 0.02470 12
18Month 0.02470 18
24Month 0.02570 24
36Month 0.02720 36
48Month 0.02960 48
60Month 0.03445 60;
RUN;
PROC PRINT data = cdinvest NOOBS;
title 'Comparison of Different CD Rates';
RUN;
Type | AnnualRate | Months | Investment |
---|---|---|---|
03Month | 0.01980 | 3 | $5024.79 |
06Month | 0.02230 | 6 | $5056.01 |
09Month | 0.02230 | 9 | $5084.25 |
12Month | 0.02470 | 12 | $5124.91 |
18Month | 0.02470 | 18 | $5188.53 |
24Month | 0.02570 | 24 | $5263.43 |
36Month | 0.02720 | 36 | $5424.61 |
48Month | 0.02960 | 48 | $5627.65 |
60Month | 0.03445 | 60 | $5938.41 |
Let's work our way through the code to see how SAS processes the first observation, say. As the INPUT statement suggests, each record in the instream data contains three pieces of information: the type of CD (Type), the annual interest rate (AnnualRate), and the time to maturity in months (Months). A new variable called Investment and the index variable i are created within the DATA step. Therefore, at the end of the compile phase, the program data vector looks like this:
_N_ | _ERROR_ | Type | AnnualRate | Months | Investment | i |
---|---|---|---|---|---|---|
1 | 0 | . | . | . | . |
In the first iteration of the DATA step, the first observation is read from the instream data, the Investment variable is initialized to 5000, and the index variable i is set to 1. At the start of the DO loop, therefore, the program data vector looks like this:
_N_ | _ERROR_ | Type | AnnualRate | Months | Investment | i |
---|---|---|---|---|---|---|
1 | 0 | 03Month | 0.0198 | 3 | 5000 | 1 |
The assignment statement tells SAS to take the current value of Investment, 5000, and add to it the amount of interest earned in one month. Because our input data set contains annual rates, we need to divide the annual rates by 12 to get monthly interest rates. The annual rate for the 3-month certificate is 0.0198, so that makes the monthly rate 0.0198 divided by 12, or 0.00165. Multiply that monthly rate, 0.00165, by the current value of Investment, 5000, and you get 8.25. So, after one month in a 3-month certificate, your 5000 dollars will have turned into 5008.25. Here's what the program data vector looks like with the updated Investment value:
_N_ | _ERROR_ | Type | AnnualRate | Months | Investment | i |
---|---|---|---|---|---|---|
1 | 0 | 03Month | 0.0198 | 3 | 5008.25 | 1 |
Being at the end of the DO loop SAS returns to the top of the DO loop to determine if it needs to be executed again. Notice that the Months variable is used as the stop value in the DO loop. As a result, the DO loop executes the number of times that are specified by the current value of Months, which is 3. The index variable is increased to 2. Because it is not greater than 3, SAS processes the DO loop again. SAS multiplies the current value of Investment, 5008.25, by the monthly rate, 0.00165, to determine that the interest earned in the second month is 8.2636125. Therefore, after two months in a 3-month certificate, your 5000 dollars will have turned into 5008.25 + 8.2636125, or 5016.5136 dollars. Here's what the program data vector looks like at the end of the second iteration of the DO loop:
_N_ | _ERROR_ | Type | AnnualRate | Months | Investment | i |
---|---|---|---|---|---|---|
1 | 0 | 03Month | 0.0198 | 3 | 5016.5136 | 2 |
Being at the end of the DO loop SAS returns to the top of the DO loop to determine if it needs to be executed again. The index variable is increased to 3. Because it is not greater than 3, SAS processes the DO loop again. SAS multiplies the current value of Investment, 5016.5136, by the monthly rate, 0.00165, to determine that the interest earned in the third month is 8.2772474. Therefore, after three months in a 3-month certificate, your 5000 dollars will have turned into 5016.5136 + 8.2772474, or 5024.7908 dollars. Here's what the program data vector looks like at the end of the third iteration of the DO loop:
_N_ | _ERROR_ | Type | AnnualRate | Months | Investment | i |
---|---|---|---|---|---|---|
1 | 0 | 03Month | 0.0198 | 3 | 5024.7908 | 3 |
Being at the end of the DO loop SAS returns to the top of the DO loop to determine if it needs to be executed again. The index variable is increased to 4. Because it is greater than 3, SAS steps out of the DO loop and moves on to the next statement. Here's what the program data vector looks like now:
_N_ | _ERROR_ | Type | AnnualRate | Months | Investment | i |
---|---|---|---|---|---|---|
1 | 0 | 03Month | 0.0198 | 3 | 5024.7908 | 4 |
The FORMAT statement is not an executable statement. It is used in the compile phase to create the program data vector. Therefore, SAS has reached the end of the DATA step, and therefore writes the program data vector to create the first observation in the cdinvest data set:
Type | AnnualRate | Months | Investment |
---|---|---|---|
03Month | 0.0198 | 3 | 5024.7908 |
Because of the DROP= data set option, SAS does not write the value of the index variable i to the output data set. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that SAS created the first observation as claimed. Of course, the other observations are created just as the first one was created as described above.