19.2 - Processing of Arrays

As is often the case when learning a new SAS tool, it helps to look at what happens during the compile and execution phases of a DATA step that involves an array. That's what we'll do in this section. Specifically, we'll revisit Example 19.2, but this time our focus will be on how SAS processes the DATA step.

Example 19.6

The following program is identical to the program in Example 19.2. That is, the program uses a one-dimensional array called fahr to convert the average Celsius temperatures in the avgcelsius data set to average Fahrenheit temperatures stored in a new data set called avgfahrenheit:

DATA avgfahrenheit;
    set avgcelsius;
    /* group the twelve monthly variables under the array name fahr */
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    /* convert each array element in place from Celsius to Fahrenheit */
    do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;

PROC PRINT data = avgfahrenheit;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var jan feb mar apr may jun
        jul aug sep oct nov dec;
RUN;

Average Monthly Temperatures in Fahrenheit

City                jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec
State College, PA  28.4  28.4  35.6  46.4  57.2  66.2  69.8  68.0  60.8  50.0  39.2  30.2
Miami, FL          68.0  68.0  71.6  73.4  78.8  80.6  82.4  82.4  80.6  78.8  73.4  68.0
St. Louis, MO      30.2  33.8  42.8  55.4  64.4  73.4  78.8  77.0  69.8  59.0  44.6  33.8
New Orleans, LA    51.8  55.4  60.8  68.0  73.4  80.6  80.6  80.6  78.8  69.8  60.8  53.6
Madison, WI        17.6  23.0  32.0  44.6  57.2  66.2  71.6  68.0  60.8  50.0  35.6  23.0
Houston, TX        50.0  53.6  60.8  68.0  73.4  80.6  82.4  82.4  78.8  69.8  60.8  53.6
Phoenix, AZ        53.6  57.2  60.8  69.8  78.8  87.8  91.4  89.6  86.0  73.4  60.8  53.6
Seattle, WA        41.0  42.8  44.6  50.0  55.4  60.8  64.4  64.4  60.8  53.6  46.4  42.8
San Francisco, CA  50.0  53.6  53.6  55.4  57.2  59.0  59.0  60.8  62.6  60.8  57.2  51.8
San Diego, CA      55.4  57.2  59.0  60.8  62.6  66.2  69.8  71.6  69.8  66.2  60.8  57.2

As always, at the end of the compile phase, SAS will have created a program data vector containing the automatic variables (_N_ and _ERROR_), the variables from the input data set avgcelsius (that is, City, jan, feb, ..., dec), and any newly created variables in the DATA step (here, the DO loop's index variable i). At the end of the compile phase, this is what an abbreviated version of the program data vector looks like:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0                           .     .     .     ...  .     .     .

Note that the array name and array references are not included in the program data vector, as they exist only for the duration of the DATA step. During the first iteration of the DATA step, the first observation in the avgcelsius data set is read into the program data vector:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  -2    -2    2     ...  4     -1    .
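One way to check this state for yourself is to write the program data vector to the log right after the SET statement executes. The following is a minimal sketch, assuming the avgcelsius data set from Example 19.2 is available in your session; DATA _NULL_, the OBS= data set option, and the PUT _ALL_ statement are additions for illustration and are not part of the original program:

```sas
DATA _NULL_;                          /* no output data set is created */
    set avgcelsius (obs = 1);         /* read only the first observation */
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    put _ALL_;                        /* write the current PDV values to the log */
    do i = 1 to 12;                   /* kept so that i exists in the PDV */
        fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;
```

Because the PUT statement runs before the DO loop, the log should show the Celsius values just read, a missing value for i, and the automatic variables _N_ and _ERROR_. The array name fahr should not appear in the listing, since it is not a variable.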

Because the ARRAY statement is a compile-time-only statement, it is ignored during execution. The DO loop is executed next. During the first iteration of the DO loop, the index variable i is set to 1. As a result, the array reference fahr(i) becomes fahr(1). Because fahr(1) refers to the first array element, jan, the value of jan in the program data vector is converted from Celsius to Fahrenheit:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  28.4  -2    2     ...  4     -1    1
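Put differently, the first pass through the loop has the same effect as an assignment statement written directly against jan. Carrying that idea through all twelve elements gives the sketch below, which should produce the same monthly values as the array version. The data set name avgfahrenheit2 is hypothetical, chosen only so that the original output is not overwritten:

```sas
DATA avgfahrenheit2;
    set avgcelsius;
    jan = 1.8*jan + 32;    /* same effect as fahr(1) = 1.8*fahr(1) + 32 */
    feb = 1.8*feb + 32;    /* fahr(2) */
    mar = 1.8*mar + 32;    /* fahr(3) */
    apr = 1.8*apr + 32;
    may = 1.8*may + 32;
    jun = 1.8*jun + 32;
    jul = 1.8*jul + 32;
    aug = 1.8*aug + 32;
    sep = 1.8*sep + 32;
    oct = 1.8*oct + 32;
    nov = 1.8*nov + 32;    /* fahr(11) */
    dec = 1.8*dec + 32;    /* fahr(12) */
RUN;
```

The one difference is that this version never creates the index variable i, so i does not appear in its program data vector. The array version trades that extra variable for a loop body that is written only once.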

During the second iteration of the DO loop, the index variable i is set to 2. As a result, the array reference fahr(i) becomes fahr(2). Because fahr(2) refers to the second array element, feb, the value of feb in the program data vector is converted from Celsius to Fahrenheit:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  28.4  28.4  2     ...  4     -1    2

During the third iteration of the DO loop, the index variable i is set to 3. As a result, the array reference fahr(i) becomes fahr(3). Because fahr(3) refers to the third array element, mar, the value of mar in the program data vector is converted from Celsius to Fahrenheit:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  28.4  28.4  35.6  ...  4     -1    3
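The pattern in these iterations can also be traced in the log. The sketch below, again assuming avgcelsius from Example 19.2 is available, uses the VNAME function to report which variable each array reference resolves to. The helper variables element and converted are hypothetical names introduced only for this illustration:

```sas
DATA _NULL_;
    set avgcelsius (obs = 1);            /* trace the first observation only */
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    length element $ 32;                 /* holds the name of the referenced variable */
    do i = 1 to 12;
        element = vname(fahr(i));        /* name of the variable that fahr(i) refers to */
        fahr(i) = 1.8*fahr(i) + 32;      /* convert that element in place */
        converted = fahr(i);             /* copy the new value for printing */
        put i= element= converted=;      /* one log line per loop iteration */
    end;
RUN;
```

Each log line should pair the current value of i with the month variable being updated and its Fahrenheit value, mirroring the tables in this walkthrough one iteration at a time.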

SAS continues to process through the DO loop. During the eleventh iteration of the DO loop, the index variable i is set to 11. As a result, the array reference fahr(i) becomes fahr(11). Because fahr(11) refers to the eleventh array element, nov, the value of nov in the program data vector is converted from Celsius to Fahrenheit:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  28.4  28.4  35.6  ...  39.2  -1    11

And, during the twelfth iteration of the DO loop, the index variable i is set to 12. As a result, the array reference fahr(i) becomes fahr(12). Because fahr(12) refers to the twelfth array element, dec, the value of dec in the program data vector is converted from Celsius to Fahrenheit:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  28.4  28.4  35.6  ...  39.2  30.2  12

SAS then increases the value of the index variable i to 13:

_N_  _ERROR_  City               jan   feb   mar   ...  nov   dec   i
1    0        State College, PA  28.4  28.4  35.6  ...  39.2  30.2  13

and steps out of the DO loop because 13 exceeds the loop's stop value of 12. Having arrived at the end of the DATA step, SAS writes the contents of the program data vector as the first observation in the output data set avgfahrenheit. SAS then returns to the top of the DATA step and begins the process all over again for the second observation in the avgcelsius data set. The process continues as described until SAS runs out of observations to read from avgcelsius.
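One practical consequence of this walkthrough is that the index variable i is an ordinary variable in the program data vector, so it is written to avgfahrenheit along with the temperatures, with a value of 13 in every observation. The PROC PRINT step above hides it only because its VAR statement lists the month variables explicitly. If i is not wanted in the output data set, one common option is the DROP= data set option. This is a sketch of that variation, not part of the original program:

```sas
DATA avgfahrenheit (drop = i);    /* i stays in the PDV but is not written out */
    set avgcelsius;
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;
```

The loop runs exactly as before, because DROP= on the output data set affects only which variables are written at the end of each iteration of the DATA step.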