There may be occasions when you want to change some of the variable names in your SAS data set. To do so, use the RENAME= data set option.
The format of the RENAME= option is:
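RENAME = (old1=new1 old2=new2 ... oldk=newk)
where old1, old2, ..., oldk are the variable names as they appear in the data set that precedes the RENAME= option, and new1, new2, ..., newk are the corresponding new variable names. Because RENAME= is an option rather than a statement, it does not take a semicolon of its own; the semicolon belongs to the statement in which the option appears.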
The effect of the RENAME= option depends on where it appears:
- If the RENAME= option appears in the SET statement, then the new variable name takes effect when the program data vector is created. Therefore, all programming statements within the DATA step must refer to the new variable name.
- If the RENAME= option appears in the DATA statement, then the new variable name takes effect only when the data are written to the SAS data set. Therefore, all programming statements within the DATA step must refer to the old variable name.
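The two placements described above can be sketched side by side. This is only an illustration under stated assumptions: the data set names work.demo, work.in_rename, and work.out_rename and the variable birth_year are hypothetical and are not part of the course data.

```sas
* Hypothetical two-row input data set, not the course data;
DATA work.demo;
    input subj b_date :mmddyy10.;
    format b_date mmddyy10.;
    datalines;
1 12/02/1942
2 01/04/1925
;
RUN;

* RENAME= on SET: the program data vector holds the NEW name birth;
DATA work.in_rename;
    set work.demo (rename=(b_date=birth));
    birth_year = year(birth);
RUN;

* RENAME= on DATA: the program data vector holds the OLD name b_date;
DATA work.out_rename (rename=(b_date=birth));
    set work.demo;
    birth_year = year(b_date);
RUN;
```

Both output data sets should end up with the variables subj, birth, and birth_year. The only difference is which name the assignment statement has to use.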
Example 14.12
The following program illustrates the use of the RENAME= option in the SET statement. Specifically, the variable sex is changed to gender, and b_date is changed to birth when the program data vector is created:
DATA back7 (keep = subj gender v_date birth age);
    set back3 (rename=(sex=gender b_date=birth));
    age = (v_date - birth)/365; *MUST use NEW name for date of birth;
RUN;

PROC PRINT data=back7;
    title 'Output Dataset: BACK7';
RUN;
Obs | subj | v_date | birth | gender | age |
---|---|---|---|---|---|
1 | 110051 | 01/25/94 | 12/02/42 | 2 | 51.1836 |
2 | 110052 | 01/27/94 | 01/04/25 | 2 | 69.1096 |
3 | 110053 | 02/22/94 | 03/15/22 | 2 | 71.9918 |
4 | 110055 | 03/15/94 | 03/31/41 | 2 | 52.9918 |
5 | 110057 | 03/15/94 | 07/10/44 | 2 | 49.7123 |
6 | 110058 | 03/18/94 | 09/09/50 | 2 | 43.5507 |
7 | 110059 | 03/18/94 | 07/25/34 | 2 | 59.6877 |
8 | 110060 | 06/14/94 | 05/29/36 | 2 | 58.0822 |
9 | 110062 | 03/31/94 | 04/21/36 | 2 | 57.9808 |
10 | 110065 | 04/04/94 | 10/12/52 | 2 | 41.5041 |
11 | 110066 | 04/12/94 | 08/28/62 | 2 | 31.6438 |
12 | 110067 | 04/26/94 | 02/22/72 | 2 | 22.1890 |
13 | 110068 | 06/13/94 | 09/10/55 | 2 | 38.7836 |
14 | 110069 | 05/31/94 | 08/17/38 | 2 | 55.8247 |
Because the RENAME= option appears in the SET statement, the variables enter the program data vector under their new names. Within the DATA step, SAS therefore knows the subject's sex as gender and the birth date as birth, and the names sex and b_date no longer refer to those values. Hence, when we subsequently calculate the subjects' ages (age) in the DATA step, we must refer to the new variable name birth.
Again, pay particular attention to the syntax of the RENAME= option, as it too can be tricky. The entire RENAME= option must be contained in parentheses immediately following the data set to which you want the name changes to apply. The list of old=new pairs must itself be enclosed in parentheses. So, in general, the syntax, when applied to a DATA statement, should look like this:
DATA dsname (RENAME = (o1=n1 o2=n2 ...));
where dsname is the data set name, o1 and o2 are the old variable names, and n1 and n2 are the new variable names.
Launch and run the SAS program, and review the output from the PRINT procedure. Convince yourself that the variable names sex and b_date have been changed as advertised to gender and birth, respectively. Also, verify that the ages of the subjects have been calculated appropriately. Then, in the assignment statement that calculates age, change the variable name birth back to the variable name b_date, and re-run the program. Does SAS indeed hiccup? Check the log as well as the output.
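Besides printing the data, one way to confirm which variable names a data set actually contains is the CONTENTS procedure. The following is a minimal sketch, assuming a hypothetical two-row stand-in for back3 (work.demo3) so that the example runs on its own. The names work.demo3 and work.demo7 are illustrative only.

```sas
* Hypothetical two-row stand-in for back3, not the course data;
DATA work.demo3;
    input subj sex b_date :mmddyy10.;
    format b_date mmddyy10.;
    datalines;
110051 2 12/02/1942
110052 2 01/04/1925
;
RUN;

* Rename on input, as in the SET statement example;
DATA work.demo7;
    set work.demo3 (rename=(sex=gender b_date=birth));
RUN;

* VARNUM lists the variables by their position in the data set;
PROC CONTENTS data=work.demo7 varnum;
RUN;
```

The variable list in the CONTENTS output should show gender and birth rather than sex and b_date.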
Example 14.13
The following program illustrates the use of the RENAME= option when it appears in the DATA statement. Specifically, the variable sex is changed to gender, and b_date is changed to birth when SAS writes the data to the output data set:
DATA back8 (rename=(sex=gender b_date=birth)
            keep = subj sex v_date b_date age);
    set back3;
    age = (v_date - b_date)/365; *MUST use OLD name for date of birth;
RUN;

PROC PRINT data=back8;
    title 'Output Dataset: BACK8';
RUN;
Obs | subj | v_date | birth | gender | age |
---|---|---|---|---|---|
1 | 110051 | 01/25/94 | 12/02/42 | 2 | 51.1836 |
2 | 110052 | 01/27/94 | 01/04/25 | 2 | 69.1096 |
3 | 110053 | 02/22/94 | 03/15/22 | 2 | 71.9918 |
4 | 110055 | 03/15/94 | 03/31/41 | 2 | 52.9918 |
5 | 110057 | 03/15/94 | 07/10/44 | 2 | 49.7123 |
6 | 110058 | 03/18/94 | 09/09/50 | 2 | 43.5507 |
7 | 110059 | 03/18/94 | 07/25/34 | 2 | 59.6877 |
8 | 110060 | 06/14/94 | 05/29/36 | 2 | 58.0822 |
9 | 110062 | 03/31/94 | 04/21/36 | 2 | 57.9808 |
10 | 110065 | 04/04/94 | 10/12/52 | 2 | 41.5041 |
11 | 110066 | 04/12/94 | 08/28/62 | 2 | 31.6438 |
12 | 110067 | 04/26/94 | 02/22/72 | 2 | 22.1890 |
13 | 110068 | 06/13/94 | 09/10/55 | 2 | 38.7836 |
14 | 110069 | 05/31/94 | 08/17/38 | 2 | 55.8247 |
Because the RENAME= option appears in the DATA statement, the renaming does not happen until the observations are written to back8. Within the DATA step, SAS recognizes only the variable names as they appear in the input data set back3. That is, for example, SAS recognizes the variable name b_date as the birth date of the subjects. Hence, when we subsequently calculate the subjects' ages in the DATA step, we must refer to the old variable name b_date. Also, note that the KEEP= option in the DATA statement must refer to the original variable names as they appear in the back3 data set, because SAS applies KEEP= before it applies RENAME=.
This program also illustrates how to use more than one data set option at a time. Specifically, the RENAME= and KEEP= options are used to modify the back8 data set. As such, both options are placed within one set of parentheses immediately following the data set to which you want the changes to apply. Then, within those parentheses, the basic syntax for each option is followed.
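The same combination of options can also be placed on an input data set. The sketch below assumes a hypothetical stand-in for back3 called work.visits (with four-digit years so that it does not depend on the YEARCUTOFF= setting), and the output name work.ages is likewise illustrative. Because KEEP= is applied before RENAME=, the KEEP= list uses the names as they are stored in the input data set, while the programming statements use the new name.

```sas
* Hypothetical stand-in for back3 with four-digit years;
DATA work.visits;
    input subj sex v_date :mmddyy10. b_date :mmddyy10.;
    format v_date b_date mmddyy10.;
    datalines;
110051 2 01/25/1994 12/02/1942
110052 2 01/27/1994 01/04/1925
;
RUN;

* KEEP= uses the stored name b_date and RENAME= then maps it to birth;
DATA work.ages;
    set work.visits (keep=subj v_date b_date rename=(b_date=birth));
    age = (v_date - birth)/365; * the new name is required here;
RUN;

PROC PRINT data=work.ages;
    title 'Hypothetical example: KEEP= and RENAME= on SET';
RUN;
```

Here sex is never read into the program data vector, so work.ages should contain only subj, v_date, birth, and age.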
Launch and run the SAS program, and review the output from the PRINT procedure. Convince yourself that the variable names sex and b_date have been changed as advertised to gender and birth, respectively. Also, verify that the ages of the subjects have been calculated appropriately.