All of the DATA step options — FIRSTOBS=, OBS=, RENAME=, WHERE=, DROP=, KEEP=, and IN= — that we learned about in the previous lesson can be used at the same time that you are one-to-one reading, one-to-one merging, concatenating, or interleaving data sets. If you attach a DATA step option to a SET or MERGE statement, SAS takes action when observations are being read from the input data sets. And, if you attach a DATA step option to a DATA statement, SAS takes action when observations are being written to the output data set. We'll take a look at just one example that involves merging two data sets that contain identical variable names.
Example 15.13 Section
The following program attempts to one-to-one merge two data sets — firstnames and lastnames. The firstnames data set contains a variable called name that contains the first names of five individuals, and the lastnames data set also contains a variable called name that contains the last names of four individuals:
DATA firstnames;
input subj 5-9 name $ 10-16 gender 19
height 21-22 weight 24-26;
CARDS;
1024 Alice 1 65 125
1167 Maryann 1 68 140
1168 Thomas 2 68 190
1201 Benny 2 68 190
1302 Felicia 1 63 115
;
RUN;
DATA lastnames;
input name $ 4-9 sysbp 11-13 diasbp 14-15;
CARDS;
Smith 120 80
White 130 90
Jones 125 72
Arnold 135 95
;
RUN;
DATA alldata;
merge firstnames lastnames;
RUN;
PROC PRINT data=alldata NOOBS;
title 'The alldata data set';
RUN;
subj | name | gender | height | weight | sysbp | diasbp |
---|---|---|---|---|---|---|
1024 | Smith | 1 | 65 | 125 | 120 | 8 |
1167 | White | 1 | 68 | 140 | 130 | 9 |
1168 | Jones | 2 | 68 | 190 | 125 | 7 |
1201 | Arnold | 2 | 68 | 190 | 135 | 9 |
1302 | Felicia | 1 | 63 | 115 | . | . |
The first two DATA steps, of course, just tell SAS to read the data values into the firstnames and lastnames data sets. The third contains a MERGE statement that tells SAS to one-to-one merge the observations in the firstnames and lastnames data sets. Launch and run the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. As you can see, because the variable name exists in both data sets, the value from the data set appearing last in the MERGE statement, i.e., lastnames, is the value that goes into the new data set. That is, wherever an observation exists in lastnames, it effectively writes over the value of name from firstnames.
In order to merge the firstnames and lastnames data sets correctly, we must first change the name of the variable name to something different in one of the two data sets. We'll change the name in both data sets. The following program illustrates changing the variable name to f_name in the firstnames data set and to l_name in the lastnames data set, while simultaneously merging firstnames and lastnames in a one-to-one manner:
DATA alldata2;
merge firstnames (rename = (name=f_name))
lastnames (rename = (name=l_name));
RUN;
PROC PRINT data=alldata2 NOOBS;
title 'The alldata2 data set';
RUN;
subj | f_name | gender | height | weight | l_name | sysbp | diasbp |
---|---|---|---|---|---|---|---|
1024 | Alice | 1 | 65 | 125 | Smith | 120 | 8 |
1167 | Maryann | 1 | 68 | 140 | White | 130 | 9 |
1168 | Thomas | 2 | 68 | 190 | Jones | 125 | 7 |
1201 | Benny | 2 | 68 | 190 | Arnold | 135 | 9 |
1302 | Felicia | 1 | 63 | 115 | . | . |
Because the first RENAME= option is attached to the firstnames data set on the MERGE statement, it tells SAS to change the name variable to f_name when SAS reads in observations from the firstnames data set. And, because the second RENAME= option is attached to the lastnames data set on the MERGE statement, it tells SAS to change the name variable to l_name when SAS reads in observations from the lastnames data set. At the same time that SAS is executing the name changes, it performs the one-to-one merge of the firstnames and lastnames data sets.
Launch and run the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. As you can see, we have now successfully combined the two data sets so that all of their original information is preserved.