# 19.1 - One-Dimensional Arrays

A SAS array is a temporary grouping of SAS variables under a single name. For example, suppose you have four variables named winter, spring, summer, and fall. Rather than referring to the variables by their four different names, you could associate the variables with an array name, say seasons, and refer to the variables as seasons(1), seasons(2), seasons(3), and seasons(4). When you pair an array with an iterative DO loop, you create a powerful and efficient way of writing your computer programs. Let's take a look at an example!

## Example 19.1

The following program simply reads the average monthly temperatures (in Celsius) for ten different cities in the United States into a temporary SAS data set called avgcelsius:

    OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

    DATA avgcelsius;
      input City $ 1-18 jan feb mar apr may jun
                        jul aug sep oct nov dec;
      DATALINES;
    State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
    Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
    St. Louis, MO      -1  1  6 13 18 23 26 25 21 15  7  1
    New Orleans, LA    11 13 16 20 23 27 27 27 26 21 16 12
    Madison, WI        -8 -5  0  7 14 19 22 20 16 10  2 -5
    Houston, TX        10 12 16 20 23 27 28 28 26 21 16 12
    Phoenix, AZ        12 14 16 21 26 31 33 32 30 23 16 12
    Seattle, WA         5  6  7 10 13 16 18 18 16 12  8  6
    San Francisco, CA  10 12 12 13 14 15 15 16 17 16 14 11
    San Diego, CA      13 14 15 16 17 19 21 22 21 19 16 14
    ;
    RUN;

    PROC PRINT data = avgcelsius;
      title 'Average Monthly Temperatures in Celsius';
      id City;
      var jan feb mar apr may jun jul aug sep oct nov dec;
    RUN;

Launch and run the SAS program so that the data set becomes available to you. Also, review the output from the PRINT procedure to convince yourself that the data were read in properly.

Now, suppose that we don't feel particularly comfortable with understanding Celsius temperatures, and therefore, we want to convert the Celsius temperatures into Fahrenheit temperatures for which we have a better feel.

The following SAS program uses the standard conversion formula:

Fahrenheit temperature = 1.8*Celsius temperature + 32

to convert the Celsius temperatures in the avgcelsius data set to Fahrenheit temperatures stored in a new data set called avgfahrenheit:

    DATA avgfahrenheit;
      set avgcelsius;
      janf = 1.8*jan + 32;
      febf = 1.8*feb + 32;
      marf = 1.8*mar + 32;
      aprf = 1.8*apr + 32;
      mayf = 1.8*may + 32;
      junf = 1.8*jun + 32;
      julf = 1.8*jul + 32;
      augf = 1.8*aug + 32;
      sepf = 1.8*sep + 32;
      octf = 1.8*oct + 32;
      novf = 1.8*nov + 32;
      decf = 1.8*dec + 32;
      drop jan feb mar apr may jun jul aug sep oct nov dec;
    RUN;

    PROC PRINT data = avgfahrenheit;
      title 'Average Monthly Temperatures in Fahrenheit';
      id City;
      var janf febf marf aprf mayf junf julf augf sepf octf novf decf;
    RUN;

As you can see by the number of assignment statements necessary to make the conversions, the exercise becomes one of patience. Because there are twelve average monthly temperatures, we must write twelve assignment statements. Each assignment statement performs the same calculation. Only the name of the variable changes in each statement. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were properly converted to Fahrenheit temperatures.

The above program is crying out for the use of an array. One of the primary arguments for using an array is to reduce the number of statements that are required for processing variables. Let's take a look at an example.

## Example 19.2

The following program uses a one-dimensional array called fahr to convert the average Celsius temperatures in the avgcelsius data set to average Fahrenheit temperatures stored in a new data set called avgfahrenheit:

    DATA avgfahrenheit;
      set avgcelsius;
      array fahr(12) jan feb mar apr may jun jul aug sep oct nov dec;
      do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
      end;
    RUN;

    PROC PRINT data = avgfahrenheit;
      title 'Average Monthly Temperatures in Fahrenheit';
      id City;
      var jan feb mar apr may jun jul aug sep oct nov dec;
    RUN;

If you compare this program with the previous program, you can see the statements that replaced the twelve assignment statements.

The ARRAY statement defines an array called fahr. It tells SAS that you want to group the twelve month variables, jan, feb, ..., dec, into an array called fahr. The (12) that appears in parentheses is a required part of the array declaration. Called the dimension of the array, it tells SAS how many elements, that is, variables, you want to group together. When specifying the variable names to be grouped in the array, we simply list the elements, separating each element with a space. As with all SAS statements, the ARRAY statement is closed with a semicolon (;).

Once we've defined the array fahr, we can use it in our code instead of the individual variable names. We refer to the individual elements of the array using its name and an index, such as fahr(i). The order in which the variables appear in the ARRAY statement determines each variable's position in the array. For example, fahr(1) corresponds to the jan variable, fahr(2) corresponds to the feb variable, and fahr(12) corresponds to the dec variable.

It's when you use an array like fahr in conjunction with an iterative DO loop that you can really simplify your code, as we did in this program. The DO loop tells SAS to process through the elements of the fahr array, each time converting the Celsius temperature to a Fahrenheit temperature.

For example, when the index variable i is 1, the assignment statement becomes:

    fahr(1) = 1.8*fahr(1) + 32;

which you could think of as saying:

    jan = 1.8*jan + 32;

The value of jan on the right side of the equal sign is the Celsius temperature. After the assignment statement is executed, the value of jan on the left side of the equal sign is updated to reflect the Fahrenheit temperature.

Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.

Oh, one more thing to point out! Note that the variables listed in the PRINT procedure's VAR statement are the original variable names jan, feb, ..., dec, not the variables as they were grouped into an array, fahr(1), fahr(2), ..., fahr(12). That's because an array exists only for the duration of the DATA step. If, in the PRINT procedure, you instead tell SAS to print fahr(1), fahr(2), ..., you'll see that SAS will hiccup. Let's summarize!

## Defining an Array

You must use an ARRAY statement having the following general form in order to group previously defined data set variables into an array:

    ARRAY array-name(dimension) <elements>;

where:

• array-name must be a valid SAS name that specifies the name of the array
• dimension describes the number and arrangement of array elements. The default dimension is one.
• elements list the variables to be grouped together to form the array. The array elements must be either all numeric or all character.

Using standard SAS Help notation, the term elements appears in <> brackets to indicate that it is optional. That is, you do not have to specify elements in the ARRAY statement. If no elements are listed, new variables are created with default names formed from the array name followed by 1, 2, and so on (for example, an array declared as temp(3) with no element list creates temp1, temp2, and temp3).

A few more points must be made about the array-name. Unless you are interested in confusing SAS, you should not give an array the same name as a variable that appears in the same DATA step.
You should also avoid giving an array the same name as a valid SAS function. SAS allows you to do so, but then you won't be able to use the function in the same DATA step. For example, if you named an array mean in a DATA step, you would not be able to use the mean function in that DATA step. SAS will print a warning message in your log window to let you know. Finally, array names cannot be used in LABEL, FORMAT, DROP, KEEP, or LENGTH statements.

The three examples that remain in this section pertain to alternative ways of defining the array. The first pertains to an alternative way of defining the dimension of the array. The second and third pertain to alternative ways of defining the variables to be grouped in the array.

## Example 19.3

The following program is identical to the program in the previous example, except the 12 in the ARRAY statement has been changed to an asterisk (*):

    DATA avgfahrenheittwo;
      set avgcelsius;
      array fahr(*) jan feb mar apr may jun jul aug sep oct nov dec;
      do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
      end;
    RUN;

    PROC PRINT data = avgfahrenheittwo;
      title 'Average Monthly Temperatures in Fahrenheit';
      id City;
      var jan feb mar apr may jun jul aug sep oct nov dec;
    RUN;

Simple enough! Rather than you having to tell SAS how many variables you are grouping in an array, you can let SAS do the dirty work of counting the number of elements you include in your variable list. To do so, you simply define the dimension using an asterisk (*). You might find this strategy particularly helpful if you are grouping so many variables together into an array that you don't want to spend the time counting them.

Incidentally, throughout this lesson, we enclose the array's dimension (or index variable) in parentheses ( ). We could alternatively use braces { } or brackets [ ].

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.

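
When the asterisk leaves the counting to SAS, the upper bound of the DO loop can be left to SAS as well. The following is a minimal sketch rather than part of the original example. It assumes the avgcelsius data set from Example 19.1 is available, the data set name avgfahrenheitdim is hypothetical, and DIM is the SAS function that returns the number of elements in an array:

```sas
DATA avgfahrenheitdim;   /* hypothetical data set name */
  set avgcelsius;        /* assumes Example 19.1 has been run */
  array fahr(*) jan feb mar apr may jun jul aug sep oct nov dec;
  /* dim(fahr) returns the number of elements SAS counted (12 here) */
  do i = 1 to dim(fahr);
    fahr(i) = 1.8*fahr(i) + 32;
  end;
  drop i;                /* keep the index variable out of the output data set */
RUN;

PROC PRINT data = avgfahrenheitdim;
  title 'Average Monthly Temperatures in Fahrenheit';
  id City;
  var jan feb mar apr may jun jul aug sep oct nov dec;
RUN;
```

With this form, adding or removing a variable from the ARRAY statement does not require a matching change to the DO statement.
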
## Example 19.4

The following program re-reads the average monthly temperatures of the ten cities into numbered variables m1, m2, ..., m12, and then uses a numbered range list m1-m12 as a shortcut in specifying the elements of the fahr array in the ARRAY statement:

    DATA avgtempsF;
      input City $ 1-18 m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12;
      array fahr(*) m1-m12;
      do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
      end;
      DATALINES;
    State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
    Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
    St. Louis, MO      -1  1  6 13 18 23 26 25 21 15  7  1
    New Orleans, LA    11 13 16 20 23 27 27 27 26 21 16 12
    Madison, WI        -8 -5  0  7 14 19 22 20 16 10  2 -5
    Houston, TX        10 12 16 20 23 27 28 28 26 21 16 12
    Phoenix, AZ        12 14 16 21 26 31 33 32 30 23 16 12
    Seattle, WA         5  6  7 10 13 16 18 18 16 12  8  6
    San Francisco, CA  10 12 12 13 14 15 15 16 17 16 14 11
    San Diego, CA      13 14 15 16 17 19 21 22 21 19 16 14
    ;
    RUN;

    PROC PRINT data = avgtempsF;
      title 'Average Monthly Temperatures in Fahrenheit';
      id City;
      var m1-m12;
    RUN;

When specifying a numbered range of variables:

• the variables must have the same name except for the last character or characters
• the last character of each variable must be numeric
• the variables must be numbered consecutively

As you can see, the variables m1, m2, ..., m12 in our program meet each of these conditions. That's why we can use the shortcut m1-m12 when we define our array fahr in the ARRAY statement.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.
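
A numbered range list can also name variables that do not exist yet, in which case the ARRAY statement creates them. The sketch below is an illustration under stated assumptions, not part of the original lesson: the data set name avgtempsCF and the variable names f1-f12 are hypothetical, and only the first two cities from the data above are read. It keeps the Celsius values in m1-m12 and writes the Fahrenheit values to f1-f12:

```sas
DATA avgtempsCF;   /* hypothetical data set name */
  input City $ 1-18 m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12;
  array cels(12) m1-m12;   /* existing Celsius variables */
  array fahr(12) f1-f12;   /* f1-f12 are not yet defined, so SAS creates them */
  do i = 1 to 12;
    fahr(i) = 1.8*cels(i) + 32;
  end;
  drop i;
  DATALINES;
State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
;
RUN;

PROC PRINT data = avgtempsCF;
  title 'Average Monthly Temperatures in Celsius and Fahrenheit';
  id City;
  var m1-m12 f1-f12;
RUN;
```

Using two arrays this way avoids overwriting the original values, at the cost of twelve extra variables in the output data set.
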

The above program used a numbered range list to shorten the list of variable names grouped into the fahr array. In some cases, you could also consider using the special name lists _ALL_, _CHARACTER_ and _NUMERIC_:

• Use _ALL_ when you want SAS to use all of the variables in your SAS data set. Because array elements must share a type, this works only when the variables are all numeric or all character.
• Use _CHARACTER_ when you want SAS to use all of the character variables in your data set.
• Use _NUMERIC_ when you want SAS to use all of the numeric variables in your data set.
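
As a brief illustration of the _CHARACTER_ list, here is a minimal sketch, assuming the avgcelsius data set from Example 19.1 is available. The data set name avgcelsiusupper and the array name cvars are hypothetical. The DIM function returns the number of elements in the array, and UPCASE converts a character value to uppercase:

```sas
DATA avgcelsiusupper;          /* hypothetical data set name */
  set avgcelsius;              /* assumes Example 19.1 has been run */
  array cvars(*) _CHARACTER_;  /* every character variable defined so far (only City here) */
  do i = 1 to dim(cvars);
    cvars(i) = upcase(cvars(i));
  end;
  drop i;
RUN;

PROC PRINT data = avgcelsiusupper;
  title 'City Names in Uppercase';
  id City;
  var jan feb mar apr may jun jul aug sep oct nov dec;
RUN;
```
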

The following program illustrates the use of the _NUMERIC_ special list.

## Example 19.5

The following program re-reads the average monthly temperatures of the ten cities into month variables jan, feb, ..., dec, and then uses the special _NUMERIC_ list as a shortcut in specifying the elements of the fahr array in the ARRAY statement:

    DATA avgtempsFtwo;
      input City $ 1-18 jan feb mar apr may jun
                        jul aug sep oct nov dec;
      array fahr(*) _NUMERIC_;
      do i = 1 to 12;
        fahr(i) = 1.8*fahr(i) + 32;
      end;
      DATALINES;
    State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
    Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
    St. Louis, MO      -1  1  6 13 18 23 26 25 21 15  7  1
    New Orleans, LA    11 13 16 20 23 27 27 27 26 21 16 12
    Madison, WI        -8 -5  0  7 14 19 22 20 16 10  2 -5
    Houston, TX        10 12 16 20 23 27 28 28 26 21 16 12
    Phoenix, AZ        12 14 16 21 26 31 33 32 30 23 16 12
    Seattle, WA         5  6  7 10 13 16 18 18 16 12  8  6
    San Francisco, CA  10 12 12 13 14 15 15 16 17 16 14 11
    San Diego, CA      13 14 15 16 17 19 21 22 21 19 16 14
    ;
    RUN;

    PROC PRINT data = avgtempsFtwo;
      title 'Average Monthly Temperatures in Fahrenheit';
      id City;
      var jan--dec;
    RUN;

First, note that the only numeric variables defined when SAS reaches the ARRAY statement are the twelve average monthly temperatures. (The index variable i is created later, in the DO statement, so it is not part of the list.) For that reason, we can define the array fahr using the special list _NUMERIC_. The remainder of the program is identical in functionality to the previous programs in this section.

Oh, you might want to note one more shortcut that was taken in the PRINT procedure, that is, the name range list, jan--dec, used in the VAR statement. This tells SAS to print all of the variables that appear in the avgtempsFtwo data set between the jan variable and the dec variable, by their position in the data set. This shortcut can also be used when defining an array. In order to specify a name range list, though, you have to know the internal order, or position, of the variables in the SAS data set. If you are not sure of the internal order of your data set, you can find out using the CONTENTS procedure with the POSITION option.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.
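
The name range list described above can be sketched in an ARRAY statement as follows. This is an illustration, assuming the avgcelsius data set from Example 19.1 is available. The data set name avgfahrenheitrange is hypothetical. The sketch uses the VARNUM option of the CONTENTS procedure to list variables by position. The text above names the POSITION option for the same purpose, and treating the two as interchangeable is an assumption here:

```sas
/* List the variables in their internal order before relying on jan--dec */
PROC CONTENTS data = avgcelsius varnum;
RUN;

DATA avgfahrenheitrange;    /* hypothetical data set name */
  set avgcelsius;           /* assumes Example 19.1 has been run */
  array fahr(*) jan--dec;   /* all variables positioned from jan through dec */
  do i = 1 to dim(fahr);    /* DIM returns the number of array elements */
    fahr(i) = 1.8*fahr(i) + 32;
  end;
  drop i;
RUN;

PROC PRINT data = avgfahrenheitrange;
  title 'Average Monthly Temperatures in Fahrenheit';
  id City;
  var jan--dec;
RUN;
```

Because array elements must all be of one type, this works only when every variable positioned between jan and dec is numeric, which is the case in avgcelsius.
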
