19.1 - One-Dimensional Arrays

A SAS array is a temporary grouping of SAS variables under a single name. For example, suppose you have four variables named winter, spring, summer, and fall. Rather than referring to the variables by their four different names, you could associate the variables with an array name, say seasons, and refer to the variables as seasons(1), seasons(2), seasons(3), and seasons(4). When you pair an array with an iterative DO loop, you create a powerful and efficient way of writing your programs. Let's take a look at an example!
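As a quick preview before the full examples, here is a minimal sketch of the seasons idea. The data set name seasonal, the variable total, and the four data values are made up purely for illustration; only the array name seasons and the four variable names come from the paragraph above.

```sas
DATA seasonal;
    input winter spring summer fall;
    /* Group the four variables under the single name seasons */
    array seasons(4) winter spring summer fall;
    /* Add up the four values by index rather than by name */
    total = 0;
    do i = 1 to 4;
        total = total + seasons(i);
    end;
    drop i;
    DATALINES;
10 20 30 40
;
RUN;
```

Here seasons(1) refers to winter, seasons(2) to spring, and so on, so the loop adds the four seasonal values without naming any of them individually.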

Example 19.1

The following program simply reads in the average monthly temperatures (in Celsius) for ten different cities in the United States into a temporary SAS data set called avgcelsius:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
DATA avgcelsius;
    input City $ 1-18 jan feb mar apr may jun
                        jul aug sep oct nov dec;
    DATALINES;
State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO      -1  1  6 13 18 23 26 25 21 15  7  1
New Orleans, LA    11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI        -8 -5  0  7 14 19 22 20 16 10  2 -5
Houston, TX        10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ        12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA         5  6  7 10 13 16 18 18 16 12  8  6
San Francisco, CA  10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA      13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;
PROC PRINT data = avgcelsius;
    title 'Average Monthly Temperatures in Celsius';
    id City;
    var jan feb mar apr may jun 
        jul aug sep oct nov dec;
RUN;

Average Monthly Temperatures in Celsius

City               jan  feb  mar  apr  may  jun  jul  aug  sep  oct  nov  dec
State College, PA   -2   -2    2    8   14   19   21   20   16   10    4   -1
Miami, FL           20   20   22   23   26   27   28   28   27   26   23   20
St. Louis, MO       -1    1    6   13   18   23   26   25   21   15    7    1
New Orleans, LA     11   13   16   20   23   27   27   27   26   21   16   12
Madison, WI         -8   -5    0    7   14   19   22   20   16   10    2   -5
Houston, TX         10   12   16   20   23   27   28   28   26   21   16   12
Phoenix, AZ         12   14   16   21   26   31   33   32   30   23   16   12
Seattle, WA          5    6    7   10   13   16   18   18   16   12    8    6
San Francisco, CA   10   12   12   13   14   15   15   16   17   16   14   11
San Diego, CA       13   14   15   16   17   19   21   22   21   19   16   14

Launch and run the SAS program so that the data set becomes available to you. Also, review the output from the PRINT procedure to convince yourself that the data were read in properly.

Now, suppose that we don't feel particularly comfortable with understanding Celsius temperatures, and therefore, we want to convert the Celsius temperatures into Fahrenheit temperatures for which we have a better feel. The following SAS program uses the standard conversion formula:

Fahrenheit temperature = 1.8*Celsius temperature + 32

to convert the Celsius temperatures in the avgcelsius data set to Fahrenheit temperatures stored in a new data set called avgfahrenheit:

DATA avgfahrenheit;
    set avgcelsius;
    janf = 1.8*jan + 32;
    febf = 1.8*feb + 32;
    marf = 1.8*mar + 32;
    aprf = 1.8*apr + 32;
    mayf = 1.8*may + 32;
    junf = 1.8*jun + 32;
    julf = 1.8*jul + 32;
    augf = 1.8*aug + 32;
    sepf = 1.8*sep + 32;
    octf = 1.8*oct + 32;
    novf = 1.8*nov + 32;
    decf = 1.8*dec + 32;
    drop jan feb mar apr may jun
            jul aug sep oct nov dec;
RUN;
PROC PRINT data = avgfahrenheit;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var janf febf marf aprf mayf junf 
        julf augf sepf octf novf decf;
RUN;

Average Monthly Temperatures in Fahrenheit

City               janf  febf  marf  aprf  mayf  junf  julf  augf  sepf  octf  novf  decf
State College, PA  28.4  28.4  35.6  46.4  57.2  66.2  69.8  68.0  60.8  50.0  39.2  30.2
Miami, FL          68.0  68.0  71.6  73.4  78.8  80.6  82.4  82.4  80.6  78.8  73.4  68.0
St. Louis, MO      30.2  33.8  42.8  55.4  64.4  73.4  78.8  77.0  69.8  59.0  44.6  33.8
New Orleans, LA    51.8  55.4  60.8  68.0  73.4  80.6  80.6  80.6  78.8  69.8  60.8  53.6
Madison, WI        17.6  23.0  32.0  44.6  57.2  66.2  71.6  68.0  60.8  50.0  35.6  23.0
Houston, TX        50.0  53.6  60.8  68.0  73.4  80.6  82.4  82.4  78.8  69.8  60.8  53.6
Phoenix, AZ        53.6  57.2  60.8  69.8  78.8  87.8  91.4  89.6  86.0  73.4  60.8  53.6
Seattle, WA        41.0  42.8  44.6  50.0  55.4  60.8  64.4  64.4  60.8  53.6  46.4  42.8
San Francisco, CA  50.0  53.6  53.6  55.4  57.2  59.0  59.0  60.8  62.6  60.8  57.2  51.8
San Diego, CA      55.4  57.2  59.0  60.8  62.6  66.2  69.8  71.6  69.8  66.2  60.8  57.2

As you can see by the number of assignment statements necessary to make the conversions, the exercise becomes one of patience. Because there are twelve average monthly temperatures, we must write twelve assignment statements. Each assignment statement performs the same calculation. Only the name of the variable changes in each statement. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were properly converted to Fahrenheit temperatures.

The above program is crying out for the use of an array. One of the primary arguments for using an array is to reduce the number of statements that are required for processing variables. Let's take a look at an example.

Example 19.2

The following program uses a one-dimensional array called fahr to convert the average Celsius temperatures in the avgcelsius data set to average Fahrenheit temperatures stored in a new data set called avgfahrenheit:

DATA avgfahrenheit;
    set avgcelsius;
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    do i = 1 to 12;
            fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;
PROC PRINT data = avgfahrenheit;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var jan feb mar apr may jun 
        jul aug sep oct nov dec;
RUN;

Average Monthly Temperatures in Fahrenheit

City                jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec
State College, PA  28.4  28.4  35.6  46.4  57.2  66.2  69.8  68.0  60.8  50.0  39.2  30.2
Miami, FL          68.0  68.0  71.6  73.4  78.8  80.6  82.4  82.4  80.6  78.8  73.4  68.0
St. Louis, MO      30.2  33.8  42.8  55.4  64.4  73.4  78.8  77.0  69.8  59.0  44.6  33.8
New Orleans, LA    51.8  55.4  60.8  68.0  73.4  80.6  80.6  80.6  78.8  69.8  60.8  53.6
Madison, WI        17.6  23.0  32.0  44.6  57.2  66.2  71.6  68.0  60.8  50.0  35.6  23.0
Houston, TX        50.0  53.6  60.8  68.0  73.4  80.6  82.4  82.4  78.8  69.8  60.8  53.6
Phoenix, AZ        53.6  57.2  60.8  69.8  78.8  87.8  91.4  89.6  86.0  73.4  60.8  53.6
Seattle, WA        41.0  42.8  44.6  50.0  55.4  60.8  64.4  64.4  60.8  53.6  46.4  42.8
San Francisco, CA  50.0  53.6  53.6  55.4  57.2  59.0  59.0  60.8  62.6  60.8  57.2  51.8
San Diego, CA      55.4  57.2  59.0  60.8  62.6  66.2  69.8  71.6  69.8  66.2  60.8  57.2

If you compare this program with the previous program, you can see the statements that replaced the twelve assignment statements. The ARRAY statement defines an array called fahr. It tells SAS that you want to group the twelve month variables, jan, feb, ..., dec, into an array called fahr. The (12) that appears in parentheses is a required part of the array declaration. Called the dimension of the array, it tells SAS how many elements, that is, variables, you want to group together. When specifying the variable names to be grouped in the array, we simply list the elements, separating each element with a space. As with all SAS statements, the ARRAY statement is closed with a semicolon (;).

Once we've defined the array fahr, we can use it in our code instead of the individual variable names. We refer to the individual elements of the array using its name and an index, such as fahr(i). The order in which the variables appear in the ARRAY statement determines each variable's position in the array. For example, fahr(1) corresponds to the jan variable, fahr(2) corresponds to the feb variable, and fahr(12) corresponds to the dec variable. It's when you use an array like fahr in conjunction with an iterative DO loop that you can really simplify your code, as we did in this program.

The DO loop tells SAS to process through the elements of the fahr array, each time converting the Celsius temperature to a Fahrenheit temperature. For example, when the index variable i is 1, the assignment statement becomes:

fahr(1) = 1.8*fahr(1) + 32;

which you could think of as saying:

jan = 1.8*jan + 32;

The value of jan on the right side of the equal sign is the Celsius temperature. After the assignment statement is executed, the value of jan on the left side of the equal sign is updated to reflect the Fahrenheit temperature.
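If you would like to watch this substitution happen, the following sketch may help. It is an illustration under stated assumptions rather than part of the lesson's programs: it assumes the avgcelsius data set from Example 19.1 exists, and the variables name, celsius, and fahrenheit are hypothetical helpers introduced only for the log messages. It uses the VNAME function to look up which variable stands behind fahr(i) on each pass through the loop.

```sas
DATA _NULL_;
    set avgcelsius (obs = 1);          /* first city only, to keep the log short */
    array fahr(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    length name $ 32;
    do i = 1 to 12;
        name = vname(fahr(i));         /* name of the variable behind fahr(i) */
        celsius = fahr(i);             /* value before the assignment */
        fahr(i) = 1.8*fahr(i) + 32;    /* same statement as in the example */
        fahrenheit = fahr(i);          /* value after the assignment */
        put i= name= celsius= fahrenheit=;
    end;
RUN;
```

Because the step uses DATA _NULL_, no data set is created. Each line written to the log pairs an index value with the month variable it refers to, along with that variable's value before and after the conversion.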

Now, launch and run the SAS program for this example, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures. Oh, one more thing to point out! Note that the variables listed in the PRINT procedure's VAR statement are the original variable names jan, feb, ..., dec, not the variables as they were grouped into an array, fahr(1), fahr(2), ..., fahr(12). That's because an array exists only for the duration of the DATA step. If, in the PRINT procedure, you instead tell SAS to print fahr(1), fahr(2), ..., you'll see that SAS reports an error in the log. Let's summarize!

Defining an Array

You must use an ARRAY statement having the following general form in order to group previously defined data set variables into an array:

ARRAY array-name(dimension) <elements>;

where:

  • array-name must be a valid SAS name that specifies the name of the array
  • dimension describes the number and arrangement of array elements. The default dimension is one.
  • elements lists the variables to be grouped together to form the array. The array elements must be either all numeric or all character. Using standard SAS Help notation, the term elements appears in <> brackets to indicate that it is optional. That is, you do not have to specify elements in the ARRAY statement. If no elements are listed, new variables are created with default names.

A few more points must be made about the array-name. To avoid confusing SAS, you should not give an array the same name as a variable that appears in the same DATA step. You should also avoid giving an array the same name as a valid SAS function. SAS allows you to do so, but then you won't be able to use the function in the same DATA step. For example, if you named an array mean in a DATA step, you would not be able to use the mean function in that DATA step. SAS will print a warning message in your log window to let you know. Finally, array names cannot be used in LABEL, FORMAT, DROP, KEEP, or LENGTH statements.
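The following sketch pulls two of the points above together. It is only an illustration, assuming the avgcelsius data set from Example 19.1; the data set name avgboth and the array name cels are hypothetical. The second ARRAY statement lists no elements, so SAS creates new variables with the default names fahr1, fahr2, ..., fahr12. And because an array name cannot appear in a DROP statement, the Celsius variables are dropped by listing their variable names.

```sas
DATA avgboth;
    set avgcelsius;
    /* Existing variables grouped under the array name cels */
    array cels(12) jan feb mar apr may jun
                   jul aug sep oct nov dec;
    /* No elements listed: SAS creates fahr1, fahr2, ..., fahr12 */
    array fahr(12);
    do i = 1 to 12;
        fahr(i) = 1.8*cels(i) + 32;
    end;
    /* DROP takes variable names, not the array name cels */
    drop jan feb mar apr may jun
         jul aug sep oct nov dec i;
RUN;
PROC PRINT data = avgboth;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var fahr1-fahr12;
RUN;
```

Using one array for the input values and another for the results leaves the original Celsius values untouched during the loop, which is one way to arrive at the same kind of data set that the twelve assignment statements of the first conversion program produced.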

The three examples that remain in this section pertain to alternative ways of defining the array. The first pertains to an alternative way of defining the dimension of the array. The second and third pertain to alternative ways of defining the variables to be grouped in the array.

Example 19.3

The following program is identical to the program in the previous example, except the 12 in the ARRAY statement has been changed to an asterisk (*):

DATA avgfahrenheittwo;
    set avgcelsius;
    array fahr(*) jan feb mar apr may jun
                  jul aug sep oct nov dec;
    do i = 1 to 12;
            fahr(i) = 1.8*fahr(i) + 32;
    end;
RUN;

PROC PRINT data = avgfahrenheittwo;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var jan feb mar apr may jun 
        jul aug sep oct nov dec;
RUN;

Average Monthly Temperatures in Fahrenheit

City                jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec
State College, PA  28.4  28.4  35.6  46.4  57.2  66.2  69.8  68.0  60.8  50.0  39.2  30.2
Miami, FL          68.0  68.0  71.6  73.4  78.8  80.6  82.4  82.4  80.6  78.8  73.4  68.0
St. Louis, MO      30.2  33.8  42.8  55.4  64.4  73.4  78.8  77.0  69.8  59.0  44.6  33.8
New Orleans, LA    51.8  55.4  60.8  68.0  73.4  80.6  80.6  80.6  78.8  69.8  60.8  53.6
Madison, WI        17.6  23.0  32.0  44.6  57.2  66.2  71.6  68.0  60.8  50.0  35.6  23.0
Houston, TX        50.0  53.6  60.8  68.0  73.4  80.6  82.4  82.4  78.8  69.8  60.8  53.6
Phoenix, AZ        53.6  57.2  60.8  69.8  78.8  87.8  91.4  89.6  86.0  73.4  60.8  53.6
Seattle, WA        41.0  42.8  44.6  50.0  55.4  60.8  64.4  64.4  60.8  53.6  46.4  42.8
San Francisco, CA  50.0  53.6  53.6  55.4  57.2  59.0  59.0  60.8  62.6  60.8  57.2  51.8
San Diego, CA      55.4  57.2  59.0  60.8  62.6  66.2  69.8  71.6  69.8  66.2  60.8  57.2

Simple enough! Rather than having to tell SAS how many variables you are grouping in an array, you can let SAS do the dirty work of counting the number of elements you include in your variable list. To do so, you simply define the dimension using an asterisk (*). You might find this strategy particularly helpful if you are grouping so many variables together into an array that you don't want to spend the time counting them. Incidentally, throughout this lesson, we enclose the array's dimension (or index variable) in parentheses ( ). We could alternatively use braces { } or brackets [ ].

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.
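When SAS counts the elements for you, you may not want to hard-code the count in the DO loop either. A minimal sketch of one way to avoid that, assuming the avgcelsius data set from Example 19.1, uses the DIM function, which returns the number of elements in an array. The data set name avgfahrenheitdim is hypothetical, and the DROP statement is an optional addition that keeps the index variable out of the new data set.

```sas
DATA avgfahrenheitdim;
    set avgcelsius;
    array fahr(*) jan feb mar apr may jun
                  jul aug sep oct nov dec;
    /* dim(fahr) is the number of elements SAS counted, 12 here */
    do i = 1 to dim(fahr);
        fahr(i) = 1.8*fahr(i) + 32;
    end;
    drop i;
RUN;
```

With this form, adding or removing a variable in the ARRAY statement does not require a matching change to the upper bound of the loop.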

Example 19.4

The following program re-reads the average monthly temperatures of the ten cities into numbered variables m1, m2, ..., m12, and then uses a numbered range list m1-m12 as a shortcut in specifying the elements of the fahr array in the ARRAY statement:

DATA avgtempsF;
    input City $ 1-18 m1 m2 m3 m4 m5 m6 m7 m8 m9 m10 m11 m12;
    array fahr(*) m1-m12;
    do i = 1 to 12;
            fahr(i) = 1.8*fahr(i) + 32;
    end;
    DATALINES;
State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO      -1  1  6 13 18 23 26 25 21 15  7  1
New Orleans, LA    11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI        -8 -5  0  7 14 19 22 20 16 10  2 -5
Houston, TX        10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ        12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA         5  6  7 10 13 16 18 18 16 12  8  6
San Francisco, CA  10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA      13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;
PROC PRINT data = avgtempsF;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var m1-m12;
RUN;

Average Monthly Temperatures in Fahrenheit

City                 m1    m2    m3    m4    m5    m6    m7    m8    m9   m10   m11   m12
State College, PA  28.4  28.4  35.6  46.4  57.2  66.2  69.8  68.0  60.8  50.0  39.2  30.2
Miami, FL          68.0  68.0  71.6  73.4  78.8  80.6  82.4  82.4  80.6  78.8  73.4  68.0
St. Louis, MO      30.2  33.8  42.8  55.4  64.4  73.4  78.8  77.0  69.8  59.0  44.6  33.8
New Orleans, LA    51.8  55.4  60.8  68.0  73.4  80.6  80.6  80.6  78.8  69.8  60.8  53.6
Madison, WI        17.6  23.0  32.0  44.6  57.2  66.2  71.6  68.0  60.8  50.0  35.6  23.0
Houston, TX        50.0  53.6  60.8  68.0  73.4  80.6  82.4  82.4  78.8  69.8  60.8  53.6
Phoenix, AZ        53.6  57.2  60.8  69.8  78.8  87.8  91.4  89.6  86.0  73.4  60.8  53.6
Seattle, WA        41.0  42.8  44.6  50.0  55.4  60.8  64.4  64.4  60.8  53.6  46.4  42.8
San Francisco, CA  50.0  53.6  53.6  55.4  57.2  59.0  59.0  60.8  62.6  60.8  57.2  51.8
San Diego, CA      55.4  57.2  59.0  60.8  62.6  66.2  69.8  71.6  69.8  66.2  60.8  57.2

When specifying a numbered range of variables:

  • the variables must have the same name except for the last character or characters
  • the last character of each variable must be numeric
  • the variables must be numbered consecutively

As you can see, the variables m1, m2, ..., m12 in our program meet each of these conditions. That's why we can use the shortcut m1-m12 when we define our array fahr in the ARRAY statement.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.

The above program used a numbered range list to shorten the list of variable names grouped into the fahr array. In some cases, you could also consider using the special name lists _ALL_, _CHARACTER_ and _NUMERIC_:

  • Use _ALL_ when you want SAS to use all of the variables in your SAS data set. The variables must all be of the same type (all numeric or all character).
  • Use _CHARACTER_ when you want SAS to use all of the character variables in your data set.
  • Use _NUMERIC_ when you want SAS to use all of the numeric variables in your data set.
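As a brief aside on the _CHARACTER_ list, here is a minimal sketch, assuming the avgcelsius data set from Example 19.1. The data set name citiesupper, the array name chars, and the index variable k are hypothetical names chosen for illustration. The step converts every character variable to uppercase; in avgcelsius that is only the City variable, but the same code would pick up any other character variables defined before the ARRAY statement.

```sas
DATA citiesupper;
    set avgcelsius;
    /* All character variables defined so far; here, just City */
    array chars(*) _CHARACTER_;
    do k = 1 to dim(chars);
        chars(k) = upcase(chars(k));
    end;
    drop k;
RUN;
```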

The following program illustrates the use of the _NUMERIC_ special list.

Example 19.5

The following program re-reads the average monthly temperatures of the ten cities into month variables jan, feb, ..., dec, and then uses the special _NUMERIC_ list as a shortcut in specifying the elements of the fahr array in the ARRAY statement:

DATA avgtempsFtwo;
    input City $ 1-18 jan feb mar apr may jun 
                      jul aug sep oct nov dec;
    array fahr(*) _NUMERIC_;
    do i = 1 to 12;
            fahr(i) = 1.8*fahr(i) + 32;
    end;
    DATALINES;
State College, PA  -2 -2  2  8 14 19 21 20 16 10  4 -1
Miami, FL          20 20 22 23 26 27 28 28 27 26 23 20
St. Louis, MO      -1  1  6 13 18 23 26 25 21 15  7  1
New Orleans, LA    11 13 16 20 23 27 27 27 26 21 16 12
Madison, WI        -8 -5  0  7 14 19 22 20 16 10  2 -5
Houston, TX        10 12 16 20 23 27 28 28 26 21 16 12
Phoenix, AZ        12 14 16 21 26 31 33 32 30 23 16 12
Seattle, WA         5  6  7 10 13 16 18 18 16 12  8  6
San Francisco, CA  10 12 12 13 14 15 15 16 17 16 14 11
San Diego, CA      13 14 15 16 17 19 21 22 21 19 16 14
;
RUN;
PROC PRINT data = avgtempsFtwo;
    title 'Average Monthly Temperatures in Fahrenheit';
    id City;
    var jan--dec;
RUN;

Average Monthly Temperatures in Fahrenheit

City                jan   feb   mar   apr   may   jun   jul   aug   sep   oct   nov   dec
State College, PA  28.4  28.4  35.6  46.4  57.2  66.2  69.8  68.0  60.8  50.0  39.2  30.2
Miami, FL          68.0  68.0  71.6  73.4  78.8  80.6  82.4  82.4  80.6  78.8  73.4  68.0
St. Louis, MO      30.2  33.8  42.8  55.4  64.4  73.4  78.8  77.0  69.8  59.0  44.6  33.8
New Orleans, LA    51.8  55.4  60.8  68.0  73.4  80.6  80.6  80.6  78.8  69.8  60.8  53.6
Madison, WI        17.6  23.0  32.0  44.6  57.2  66.2  71.6  68.0  60.8  50.0  35.6  23.0
Houston, TX        50.0  53.6  60.8  68.0  73.4  80.6  82.4  82.4  78.8  69.8  60.8  53.6
Phoenix, AZ        53.6  57.2  60.8  69.8  78.8  87.8  91.4  89.6  86.0  73.4  60.8  53.6
Seattle, WA        41.0  42.8  44.6  50.0  55.4  60.8  64.4  64.4  60.8  53.6  46.4  42.8
San Francisco, CA  50.0  53.6  53.6  55.4  57.2  59.0  59.0  60.8  62.6  60.8  57.2  51.8
San Diego, CA      55.4  57.2  59.0  60.8  62.6  66.2  69.8  71.6  69.8  66.2  60.8  57.2

First, note that the only numeric variables in the data set at the point of the ARRAY statement are the twelve average monthly temperatures. For that reason, we can define the array fahr using the special list _NUMERIC_. Had other numeric variables been defined before the ARRAY statement, they would have been grouped into the array as well. The remainder of the program is identical in functionality to the previous programs in this section.

Oh, you might want to note one more shortcut that was taken in the PRINT procedure, that is, the name range list, jan--dec, used in the VAR statement. This tells SAS to print all of the variables that appear in the avgtempsFtwo data set between the jan variable and the dec variable, by their position in the data set. This shortcut can also be used when defining an array. In order to specify a name range list, though, you have to know the internal order, or position, of the variables in the SAS data set. If you are not sure of the internal order of your data set, you can find out using the CONTENTS procedure with the POSITION option.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the Celsius temperatures were again properly converted to Fahrenheit temperatures.
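To round out the name range list idea, here is a minimal sketch of how it might be used in an ARRAY statement, assuming the avgcelsius data set from Example 19.1. The data set name avgfahrenheitrange is hypothetical. The CONTENTS procedure is run first with the POSITION option so that you can confirm that jan through dec sit next to one another in the data set, with no other variables in between.

```sas
/* List the variables in their internal (creation) order */
PROC CONTENTS data = avgcelsius position;
RUN;

DATA avgfahrenheitrange;
    set avgcelsius;
    /* Every variable positioned from jan through dec, inclusive */
    array fahr(*) jan--dec;
    do i = 1 to dim(fahr);
        fahr(i) = 1.8*fahr(i) + 32;
    end;
    drop i;
RUN;
```

Keep in mind that a name range list picks up whatever variables lie between the two endpoints, so a character variable stored between jan and dec would cause the ARRAY statement to fail, since array elements must all be of the same type.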