15.5 - Data Step Options

All of the DATA step options — FIRSTOBS=, OBS=, RENAME=, WHERE=, DROP=, KEEP=, and IN= — that we learned about in the previous lesson can be used at the same time that you are one-to-one reading, one-to-one merging, concatenating, or interleaving data sets. If you attach a DATA step option to a SET or MERGE statement, SAS takes action when observations are being read from the input data sets. And, if you attach a DATA step option to a DATA statement, SAS takes action when observations are being written to the output data set. We'll take a look at just one example that involves merging two data sets that contain identical variable names.

Example 15.13 Section

The following program attempts to one-to-one merge two data sets — firstnames and lastnames. The firstnames data set contains a variable called name that contains the first names of five individuals, and the lastnames data set also contains a variable called name that contains the last names of four individuals:

DATA firstnames;
    input subj 5-9 name $ 10-16 gender 19
        height 21-22 weight 24-26;
    CARDS;
    1024 Alice    1 65 125
    1167 Maryann  1 68 140
    1168 Thomas   2 68 190
    1201 Benny    2 68 190
    1302 Felicia  1 63 115
;
RUN;
 
DATA lastnames;
    input name $ 4-9 sysbp 11-13 diasbp 14-15;
    CARDS;
   Smith  120 80
   White  130 90
   Jones  125 72
   Arnold 135 95
;
RUN;
 
DATA alldata;
    merge firstnames lastnames;
RUN;
 
PROC PRINT data=alldata NOOBS;
    title 'The alldata data set';
RUN;

The alldata data set

subj

name

gender

height

weight

sysbp

diasbp

1024

Smith

1

65

125

120

8

1167

White

1

68

140

130

9

1168

Jones

2

68

190

125

7

1201

Arnold

2

68

190

135

9

1302

Felicia

1

63

115

.

.

The first two DATA steps, of course, just tell SAS to read the data values into the firstnames and lastnames data sets. The third contains a MERGE statement that tells SAS to one-to-one merge the observations in the firstnames and lastnames data sets. Launch and run  the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. As you can see, because the variable name exists in both data sets, the value from the data set appearing last in the MERGE statement, i.e., lastnames, is the value that goes into the new data set. That is, wherever an observation exists in lastnames, it effectively writes over the value of name from firstnames.

In order to merge the firstnames and lastnames data sets correctly, we must first change the name of the variable name to something different in one of the two data sets. We'll change the name in both data sets. The following program illustrates changing the variable name to f_name in the firstnames data set and to l_name in the lastnames data set, while simultaneously merging firstnames and lastnames in a one-to-one manner:

DATA alldata2;
   merge firstnames (rename = (name=f_name))
        	  lastnames  (rename = (name=l_name));
RUN;
PROC PRINT data=alldata2 NOOBS;
   title 'The alldata2 data set';
RUN;

The alldata2 data set

subj

f_name

gender

height

weight

l_name

sysbp

diasbp

1024

Alice

1

65

125

Smith

120

8

1167

Maryann

1

68

140

White

130

9

1168

Thomas

2

68

190

Jones

125

7

1201

Benny

2

68

190

Arnold

135

9

1302

Felicia

1

63

115

 

.

.

Because the first RENAME= option is attached to the firstnames data set on the MERGE statement, it tells SAS to change the name variable to f_name when SAS reads in observations from the firstnames data set. And, because the second RENAME= option is attached to the lastnames data set on the MERGE statement, it tells SAS to change the name variable to l_name when SAS reads in observations from the lastnames data set. At the same time that SAS is executing the name changes, it performs the one-to-one merge of the firstnames and lastnames data sets.

Launch and run  the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. As you can see, we have now successfully combined the two data sets so that all of their original information is preserved.