16.3 - Renaming Variables

As is the case when combining data sets by other methods, data set options, such as RENAME=, IN=, DROP=, and KEEP=, can be used when match-merging data sets. In this section, we'll look at an example that uses the RENAME= option to rename variables that share a name across the data sets to be merged.
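As a rough illustration of how several of these options can sit side by side in one match-merge, here is a minimal sketch. It assumes the demogtwo and statustwo data sets from Example 16.8 already exist and are sorted by subj; the names patients_both, in_demog, and in_status are hypothetical and chosen only for this illustration.

```sas
/* Sketch only: assumes demogtwo and statustwo (Example 16.8) already exist. */
/* patients_both, in_demog, and in_status are hypothetical names.            */
DATA patients_both;
    merge demogtwo  (in = in_demog
                     rename = (v_date = demogdate))
          statustwo (in = in_status
                     keep = subj disease v_date   /* KEEP= is applied before RENAME=, so it lists the old name */
                     rename = (v_date = statusdate));
    by subj;
    if in_demog and in_status;   /* keep only subjects present in both data sets */
RUN;
```

Because SAS applies KEEP= and DROP= before RENAME=, the KEEP= list refers to v_date rather than statusdate.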

Example 16.8

The following program uses the RENAME= option to rename the v_date variables in the demogtwo and statustwo data sets so that when they are merged into a new data set called patientstwo, both visit dates are preserved:

DATA demogtwo;
    input subj gender $ age v_date mmddyy8.;
    format v_date mmddyy8.;
    DATALINES;
    1000 M 42 03/10/96
    1001 M 20 02/19/96
    1002 F 53 02/01/96
    1003 F 40 12/31/95
    1004 M 29 01/10/97
    ;
RUN;
 
DATA statustwo;
    input subj disease $ test $ v_date mmddyy8.;
    format v_date mmddyy8.;
    DATALINES;
    1000 Y Y 03/17/96
    1001 N Y 03/01/96
    1002 N N 02/18/96
    1003 Y Y 01/15/96
    1004 N N 02/01/97
    ;
RUN;

DATA patientstwo;
    merge demogtwo (rename = (v_date = demogdate))
        statustwo (rename = (v_date = statusdate));
    by subj;
RUN;
 
PROC PRINT data=patientstwo NOOBS;
    title 'The patientstwo data set';
RUN;

The patientstwo data set

subj    gender    age    demogdate    disease    test    statusdate
1000    M         42     03/10/96     Y          Y       03/17/96
1001    M         20     02/19/96     N          Y       03/01/96
1002    F         53     02/01/96     N          N       02/18/96
1003    F         40     12/31/95     Y          Y       01/15/96
1004    M         29     01/10/97     N          N       02/01/97

Note that the first two DATA steps, which read in the demogtwo and statustwo data sets, each create a date variable called v_date. Without renaming, the two variables would share a single v_date position in the program data vector, and for each subject the value read from statustwo would overwrite the value read from demogtwo. The third DATA step tells SAS to merge the demogtwo and statustwo data sets by the subj variable and, when doing so, to rename v_date in demogtwo to demogdate and v_date in statustwo to statusdate. Because of this renaming, rather than the program data vector looking like this:

_N_    _ERROR_    subj    gender    age    v_date    disease    test
1      0          .                 .      .

it looks like this:

_N_    _ERROR_    subj    gender    age    demogdate    disease    test    statusdate
1      0          .                 .      .                               .

Therefore, the merge reduces to a simple match-merge in which all of the values in the input data sets have a rightful position in the program data vector and are therefore preserved.
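For contrast, the following minimal sketch performs the same merge without the RENAME= option. It assumes the demogtwo and statustwo data sets created above are available; the data set name patients_norename is hypothetical.

```sas
/* Sketch only: same merge as above, but without RENAME=.              */
/* patients_norename is a hypothetical name used for this comparison.  */
DATA patients_norename;
    merge demogtwo statustwo;   /* both data sets contribute a variable named v_date */
    by subj;
RUN;

PROC PRINT data=patients_norename NOOBS;
    title 'Merged without RENAME=';
RUN;
```

Here the output data set should contain a single v_date column holding the statustwo visit dates, because statustwo is listed last in the MERGE statement and its value replaces the one read from demogtwo.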

Run the program, and review the output to convince yourself that the demogtwo and statustwo data sets are merged by subj successfully and that the values in each input data set are preserved in the output data set patientstwo.
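One possible way to carry out that check, beyond reading the printed output, is sketched below. It assumes patientstwo was created by the program above; the data set visitgap and the variable days_between are hypothetical names introduced only for this illustration.

```sas
/* List the variables in creation order to confirm that both */
/* demogdate and statusdate are present in patientstwo.      */
PROC CONTENTS data=patientstwo varnum;
RUN;

/* SAS dates are stored as day counts, so subtracting them   */
/* gives the number of days between the two visits.          */
DATA visitgap;
    set patientstwo;
    days_between = statusdate - demogdate;
RUN;

PROC PRINT data=visitgap NOOBS;
    var subj demogdate statusdate days_between;
    title 'Days between the two visits';
RUN;
```

For subject 1000, for example, the two visits fall on 03/10/96 and 03/17/96, so days_between should be 7. A calculation like this is only possible because both dates survive the merge.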