15.5 - Data Step Options

All of the DATA step options — FIRSTOBS=, OBS=, RENAME=, WHERE=, DROP=, KEEP=, and IN= — that we learned about in the previous lesson can be used at the same time that you are one-to-one reading, one-to-one merging, concatenating, or interleaving data sets. If you attach a DATA step option to a SET or MERGE statement, SAS takes action when observations are being read from the input data sets. And, if you attach a DATA step option to a DATA statement, SAS takes action when observations are being written to the output data set. We'll take a look at just one example that involves merging two data sets that contain identical variable names.

Example 15.13 Section

The following program attempts to one-to-one merge two data sets — firstnames and lastnames. The firstnames data set contains a variable called name that contains the first names of five individuals, and the lastnames data set also contains a variable called name that contains the last names of four individuals:

DATA firstnames;
    input subj 5-9 name $ 10-16 gender 19
        height 21-22 weight 24-26;
    CARDS;
    1024 Alice    1 65 125
    1167 Maryann  1 68 140
    1168 Thomas   2 68 190
    1201 Benny    2 68 190
    1302 Felicia  1 63 115
;
RUN;
 
DATA lastnames;
    input name $ 4-9 sysbp 11-13 diasbp 14-15;
    CARDS;
   Smith  120 80
   White  130 90
   Jones  125 72
   Arnold 135 95
;
RUN;
 
DATA alldata;
    merge firstnames lastnames;
RUN;
 
PROC PRINT data=alldata NOOBS;
    title 'The alldata data set';
RUN;

The alldata data set
subj	name	gender	height	weight	sysbp	diasbp
1024	Smith	1	65	125	120	8
1167	White	1	68	140	130	9
1168	Jones	2	68	190	125	7
1201	Arnold	2	68	190	135	9
1302	Felicia	1	63	115	.	.

The first two DATA steps, of course, just tell SAS to read the data values into the firstnames and lastnames data sets. The third contains a MERGE statement that tells SAS to one-to-one merge the observations in the firstnames and lastnames data sets. Launch and run the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. As you can see, because the variable name exists in both data sets, the value from the data set appearing last in the MERGE statement, i.e., lastnames, is the value that goes into the new data set. That is, wherever an observation exists in lastnames, it effectively writes over the value of name from firstnames.

In order to merge the firstnames and lastnames data sets correctly, we must first change the name of the variable name to something different in one of the two data sets. We'll change the name in both data sets. The following program illustrates changing the variable name to f_name in the firstnames data set and to l_name in the lastnames data set, while simultaneously merging firstnames and lastnames in a one-to-one manner:

DATA alldata2;
   merge firstnames (rename = (name=f_name))
        	  lastnames  (rename = (name=l_name));
RUN;
PROC PRINT data=alldata2 NOOBS;
   title 'The alldata2 data set';
RUN;

The alldata2 data set
subj	f_name	gender	height	weight	l_name	sysbp	diasbp
1024	Alice	1	65	125	Smith	120	8
1167	Maryann	1	68	140	White	130	9
1168	Thomas	2	68	190	Jones	125	7
1201	Benny	2	68	190	Arnold	135	9
1302	Felicia	1	63	115		.	.

Because the first RENAME= option is attached to the firstnames data set on the MERGE statement, it tells SAS to change the name variable to f_name when SAS reads in observations from the firstnames data set. And, because the second RENAME= option is attached to the lastnames data set on the MERGE statement, it tells SAS to change the name variable to l_name when SAS reads in observations from the lastnames data set. At the same time that SAS is executing the name changes, it performs the one-to-one merge of the firstnames and lastnames data sets.

Launch and run the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. As you can see, we have now successfully combined the two data sets so that all of their original information is preserved.