# 19.6 - Using Arrays to Transpose a Data Set

19.6 - Using Arrays to Transpose a Data SetA few lessons back, we learned how to transpose a data set by taking advantage of the *last.* variable and RETAIN and OUTPUT statements. In this section, we'll learn how to use an array to transpose a "tall" data set into a "fat" data set.

## Example 19.15

Throughout this section, we will work with the *tallgrades* data set that is created in the following DATA step:

```
DATA tallgrades;
input idno 1-2 l_name $ 5-9 gtype $ 12-13 grade 15-17;
cards;
10 Smith E1 78
10 Smith E2 82
10 Smith E3 86
10 Smith E4 69
10 Smith P1 97
10 Smith F1 160
11 Simon E1 88
11 Simon E2 72
11 Simon E3 86
11 Simon E4 99
11 Simon P1 100
11 Simon F1 170
12 Jones E1 98
12 Jones E2 92
12 Jones E3 92
12 Jones E4 99
12 Jones P1 99
12 Jones F1 185
;
RUN;
PROC PRINT data = tallgrades NOOBS;
TITLE 'The tall grades data set';
RUN;
```

The *tallgrades* data set contains one observation for each grade for each student. Students are identified by their id number (*idno*) and last name (*l_name*). The data set contains six different types of grades: exam 1 (*E1*), exam 2 (*E2*), exam 3 (*E3*), exam 4 (*E4*), each worth 100 points; one project (*P1*) worth 100 points; and a final exam (*F1*) worth 200 points. Launch and run * * the SAS program so that we can work with the *tallgrades* data set in the next two examples.

## Example 19.16

(You might recall seeing this program a few lessons ago.) Using RETAIN and OUTPUT statements, the following program takes advantage of BY-group processing, as well as RETAIN and OUTPUT statements, to transpose the *tallgrades* data set (one observation per grade) into the *fatgrades* data set (one observation per student):

```
DATA fatgrades;
set tallgrades;
by idno;
if gtype = 'E1' then E1 = grade;
else if gtype = 'E2' then E2 = grade;
else if gtype = 'E3' then E3 = grade;
else if gtype = 'E4' then E4 = grade;
else if gtype = 'P1' then P1 = grade;
else if gtype = 'F1' then F1 = grade;
if last.idno then output;
retain E1 E2 E3 E4 P1 F1;
drop gtype grade;
RUN;
PROC PRINT data=fatgrades;
title 'The fat grades data set';
RUN;
```

First, note that the program takes the grades of each student ("*by idno*") that appear in the variable *grade* and, depending on what type of grade they are ("*if gtype =*"), assigns them to the new variables *E1*, *E2*, ..., *F1*. Only when the last observation is encountered for each student ("*if last.idno*") are the data output to the *fatgrades* data set. The RETAIN statement tells SAS to retain the values of *E1*, *E2*, ..., *F1* from one iteration of the DATA step to the next. If the RETAIN statement was not present, by the time SAS went to output the program data vector, the values of *E1*, *E2*, ..., *F1* would be set to missing.

Now, launch and run * * the SAS program, and review the output from the PRINT procedure to convince yourself that the grades *E1*, *E2*, ..., *F1* are appropriately assigned and retained. Also, note that we have successfully transposed the *tallgrades* data set from a "tall" data set to a "fat" *fatgrades* data set.

## Example 19.17

The following program uses an array to transpose the *tallgrades* data set (one observation per grade) into the *fatgrades* data set (one observation per student):

```
DATA fatgrades;
set tallgrades;
by idno;
array allgrades (6) G1 - G6;
if first.idno then i = 1;
allgrades(i) = grade;
if last.idno then output;
i + 1;
retain G1 - G6;
drop i gtype grade;
RUN;
PROC PRINT data=fatgrades;
title 'The fat grades data set';
RUN;
```

Yikes! This code looks even scarier! Let's dissect it a bit. First, just as in the previous program, the *tallgrades* data set is processed BY *idno*. Doing so, makes the *first.idno* and *last.idno* variables available for us to use. The ARRAY statement defines an array called *allgrades* and, using a numbered range list, associates the array with six (uninitialized) variables *G1*, *G2*, ..., *G6*. The *allgrades* array is used to hold the six grades for each student before they are output in their transposed direction to the *fatgrades* data set. Because the elements of any array, and therefore *allgrades*, must be assigned using an index variable, this is how the transposition takes place:

- ("if first.idno then i = 1;") If the input observation contains a student
*idno*that hasn't yet been encountered in the data set, then the index variable*i*is initialized to 1. If the input observation doesn't contain a new student*idno*, then do nothing other than advance to the next step. - ("allgrades(i) = grade;") The grade from the current observation is assigned to the array
*allgrades*. (For example, if the input observation contains Smith's first grade, then*allgrades*(1) is assigned the value 78. If the input observation contains Smith's second grade, then*allgrades*(2) is assigned the value 82. And so on.) - ("if last.idno then output;") If the input observation is the last observation in the data set that contains the student
*idno*, then dump the program data vector (which contains*allgrades*) to the output data set. (For example, if the input observation is Smith's final exam grade, then output the now fat observation containing his six grades). If the input observation is not the last observation in the data set that contains the student*idno*, do nothing other than advance to the next step. - ("i + 1;") Then, increase the index variable
*i*by 1. (For example, if*i*is 1, change*i*to 2.) - ("retain G1-G6;") Rather than setting
*G1*,*G2*, ...,*G6*to missing at the beginning of the next iteration of the data step, retain their current values. (So, for example, for Smith,*allgrades*(1) would retain its value of 78,*allgrades*(2) would retain its value of 82, and so on.)

The program would keep cycling through the above five steps until it encountered the last observation in the data set. Then, the variables *i*, *gtype*, and *grade* would be dropped from the output *fatgrades* data set.

Phewwww! Now, launch and run * * the SAS program. Review the output from the PRINT procedure to convince yourself that we have now successfully used arrays to transpose the *tallgrades* data set from a "tall" data set to a "fat" *fatgrades* data set.