As is often the case when learning a new SAS tool, it helps to look at the inner workings of the compile and execution phases of a DATA step that involves an array. That's what we'll do in this section. Specifically, we'll revisit Example 19.2, but this time our focus will be on how SAS processes the DATA step.
Example 19.6
The following program is identical to the program in Example 19.2. That is, the program uses a one-dimensional array called fahr to convert the average Celsius temperatures in the avgcelsius data set to average Fahrenheit temperatures stored in a new data set called avgfahrenheit:
DATA avgfahrenheit;
    set avgcelsius;
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;

PROC PRINT data = avgfahrenheit;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var jan feb mar apr may jun
        jul aug sep oct nov dec;
RUN;
City | jan | feb | mar | apr | may | jun | jul | aug | sep | oct | nov | dec |
---|---|---|---|---|---|---|---|---|---|---|---|---|
State College, PA | 28.4 | 28.4 | 35.6 | 46.4 | 57.2 | 66.2 | 69.8 | 68.0 | 60.8 | 50.0 | 39.2 | 30.2 |
Miami, FL | 68.0 | 68.0 | 71.6 | 73.4 | 78.8 | 80.6 | 82.4 | 82.4 | 80.6 | 78.8 | 73.4 | 68.0 |
St. Louis, MO | 30.2 | 33.8 | 42.8 | 55.4 | 64.4 | 73.4 | 78.8 | 77.0 | 69.8 | 59.0 | 44.6 | 33.8 |
New Orleans, LA | 51.8 | 55.4 | 60.8 | 68.0 | 73.4 | 80.6 | 80.6 | 80.6 | 78.8 | 69.8 | 60.8 | 53.6 |
Madison, WI | 17.6 | 23.0 | 32.0 | 44.6 | 57.2 | 66.2 | 71.6 | 68.0 | 60.8 | 50.0 | 35.6 | 23.0 |
Houston, TX | 50.0 | 53.6 | 60.8 | 68.0 | 73.4 | 80.6 | 82.4 | 82.4 | 78.8 | 69.8 | 60.8 | 53.6 |
Phoenix, AZ | 53.6 | 57.2 | 60.8 | 69.8 | 78.8 | 87.8 | 91.4 | 89.6 | 86.0 | 73.4 | 60.8 | 53.6 |
Seattle, WA | 41.0 | 42.8 | 44.6 | 50.0 | 55.4 | 60.8 | 64.4 | 64.4 | 60.8 | 53.6 | 46.4 | 42.8 |
San Francisco, CA | 50.0 | 53.6 | 53.6 | 55.4 | 57.2 | 59.0 | 59.0 | 60.8 | 62.6 | 60.8 | 57.2 | 51.8 |
San Diego, CA | 55.4 | 57.2 | 59.0 | 60.8 | 62.6 | 66.2 | 69.8 | 71.6 | 69.8 | 66.2 | 60.8 | 57.2 |
As always, at the end of the compile phase, SAS will have created a program data vector containing the automatic variables (_N_ and _ERROR_), the variables from the input data set avgcelsius (that is, City, jan, feb, ..., dec), and any newly created variables in the DATA step (here, the DO loop's index variable i). At the end of the compile phase, this is what (an abbreviated version of) the program data vector looks like:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 |  | . | . | . | ... | . | . | . |
Note that the array name and array references are not included in the program data vector, as they exist only for the duration of the DATA step. During the first iteration of the DATA step, the first observation in the avgcelsius data set is read into the program data vector:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | -2 | -2 | 2 | ... | 10 | 4 | . |
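One way to look at the program data vector yourself is to write it to the log before and after the SET statement executes. The following is only a sketch: it assumes the avgcelsius data set from Example 19.2 is available in your session, and it uses a DATA _NULL_ step (not part of the original program) so that no output data set is created. The OBS= option limits the trace to the first observation.

```sas
DATA _null_;
    /* First iteration only: variables still hold their initial missing values */
    if _N_ = 1 then put 'Before SET: ' _all_;
    set avgcelsius (obs = 1);
    /* The first observation has now been read into the program data vector */
    put 'After SET:  ' _all_;
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;
```

The log should list each variable in name=value form. Note that City, the twelve month variables, i, _ERROR_, and _N_ appear, but no variable named fahr does.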
Because the ARRAY statement is a compile-time-only statement, it is ignored during execution. The DO loop is executed next. During the first iteration of the DO loop, the index variable i is set to 1. As a result, the array reference fahr(i) becomes fahr(1). Because fahr(1) refers to the first array element, jan, the value of jan in the program data vector is converted from Celsius to Fahrenheit, that is, 1.8*(-2) + 32 = 28.4:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | 28.4 | -2 | 2 | ... | 10 | 4 | 1 |
During the second iteration of the DO loop, the index variable i is set to 2. As a result, the array reference fahr(i) becomes fahr(2). Because fahr(2) refers to the second array element, feb, the value of feb in the program data vector is converted from Celsius to Fahrenheit:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | 28.4 | 28.4 | 2 | ... | 10 | 4 | 2 |
During the third iteration of the DO loop, the index variable i is set to 3. As a result, the array reference fahr(i) becomes fahr(3). Because fahr(3) refers to the third array element, mar, the value of mar in the program data vector is converted from Celsius to Fahrenheit:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | 28.4 | 28.4 | 35.6 | ... | 10 | 4 | 3 |
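The iteration-by-iteration changes shown in these tables can be traced in the log with a PUTLOG statement inside the DO loop. This is a minimal sketch rather than part of the original example: it again assumes avgcelsius is available, restricts processing to the first observation with OBS=, and uses DATA _NULL_ so that nothing is written to a data set.

```sas
DATA _null_;
    set avgcelsius (obs = 1);
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
        /* Named output: one log line per iteration, in name=value form */
        putlog i= jan= feb= mar= nov= dec=;
    end;
    /* Value of the index variable once the loop has finished */
    putlog 'After the loop: ' i=;
RUN;
```

Each log line corresponds to one row of the program data vector as it stands at the end of that iteration, so you can watch the months change from Celsius to Fahrenheit one at a time.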
SAS continues to process through the DO loop. During the eleventh iteration of the DO loop, the index variable i is set to 11. As a result, the array reference fahr(i) becomes fahr(11). Because fahr(11) refers to the eleventh array element, nov, the value of nov in the program data vector is converted from Celsius to Fahrenheit:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | 28.4 | 28.4 | 35.6 | ... | 39.2 | 4 | 11 |
And, during the twelfth iteration of the DO loop, the index variable i is set to 12. As a result, the array reference fahr(i) becomes fahr(12). Because fahr(12) refers to the twelfth array element, dec, the value of dec in the program data vector is converted from Celsius to Fahrenheit:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | 28.4 | 28.4 | 35.6 | ... | 39.2 | 30.2 | 12 |
SAS then increases the value of the index variable i to 13:
_N_ | _ERROR_ | City | jan | feb | mar | ... | nov | dec | i |
---|---|---|---|---|---|---|---|---|---|
1 | 0 | State College, PA | 28.4 | 28.4 | 35.6 | ... | 39.2 | 30.2 | 13 |
Because 13 exceeds the stop value of 12, SAS steps out of the DO loop. Having arrived at the end of the DATA step, SAS writes the contents of the program data vector, except for the automatic variables _N_ and _ERROR_, as the first observation in the output data set avgfahrenheit. Because the program does not drop it, the index variable i is written too, with the value 13. SAS then returns to the top of the DATA step and begins the process all over again for the second observation in the avgcelsius data set. The process proceeds as described until SAS runs out of observations to process in the avgcelsius data set.
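To confirm what ends up in the output data set, you can print the index variable alongside a couple of the months, and, if you would rather not keep it, drop it with the DROP= data set option. The sketch below assumes the program above has already been run; the data set name avgfahrenheit2 is a hypothetical name chosen for illustration.

```sas
/* The index variable i is stored in avgfahrenheit for every observation */
PROC PRINT data = avgfahrenheit;
    title 'avgfahrenheit, including the index variable i';
    id City;
    var jan dec i;
RUN;

/* Same conversion, but i is excluded from the output data set */
DATA avgfahrenheit2 (drop = i);
    set avgcelsius;
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;

/* Lists the variables stored in avgfahrenheit2 */
PROC CONTENTS data = avgfahrenheit2;
RUN;

title;  /* clear the title so it does not carry over to later output */
```

The PROC CONTENTS listing should show City and the twelve month variables only. The array name fahr never appears in either data set, because it is not a variable.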