The INVALUE statement in the FORMAT procedure lets you create customized informats so that variables can be read in meaningful ways, whereas the VALUE statement lets you create customized formats so that variables can be displayed in meaningful ways. Customized formats do not alter variable types; they merely tell SAS to print variables according to your definitions. For example, provided the numeric variable sex is associated with the format sexfmt defined in the following VALUE statement:
VALUE sexfmt 1 = 'Male'
             2 = 'Female';
SAS will print "Male" when the variable sex = 1 and "Female" when sex = 2. The variable type of sex remains numeric. Restrictions on the VALUE statement include:
- The name of the format for numeric variables (for example, sexfmt) must be a valid SAS name of up to 32 characters, not ending in a number.
- The name of the format for a character variable must begin with a $ sign, and have no more than 31 additional characters.
- When you define the format in the VALUE statement, the format name cannot end in a period.
- When you use the format later (for example, in a FORMAT statement), you must follow the name with a period.
- The maximum length for a format label is 32,767 characters.
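The naming rules above can be seen together in a small sketch of a character format. The names $sexcfmt, demo, and sexc are hypothetical and are not part of the lesson's data; the point is only that the name starts with a $ sign, carries no period where it is defined, and takes a period where it is used.

```sas
/* Define a character format: the name begins with $ and has no trailing period here */
PROC FORMAT;
   value $sexcfmt 'M' = 'Male'
                  'F' = 'Female';
RUN;

/* Hypothetical three-observation data set with a character variable sexc */
DATA demo;
   input sexc $ @@;
   datalines;
M F F
;
RUN;

/* Use the format: here the name must be followed by a period */
PROC PRINT data=demo;
   format sexc $sexcfmt.;
RUN;
```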
Just as is true for the INVALUE statement, the VALUE statement in the FORMAT procedure merely defines a format. In order for the format to take effect, you must associate the variable with the format you've defined by using a FORMAT statement in either a DATA step or a PROC step.
Example 9.5
The following FORMAT procedure defines how SAS should display numeric variables associated with the two formats sexfmt and racefmt during output:
PROC FORMAT;
   value sexfmt 1 = 'Male'
                2 = 'Female';
   value racefmt 1 = 'Indian'
                 2 = 'Asian'
                 3 = 'Black'
                 4 = 'White';
RUN;
Because the VALUE statement is used, the translation takes place only on output. As a result of this code, provided the numeric variable sex is later associated with the format sexfmt, whenever SAS prints the numeric value 1 for the variable sex, it will instead print the character string 'Male'. Similarly, whenever SAS prints the numeric value 2 for the variable sex, it will instead print the character string 'Female'.
Launch and run the SAS program. Again, the only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like this:
1 PROC FORMAT;
2 value sexfmt 1 = 'Male'
3 2 = 'Female';
NOTE: Format SEXFMT has been output.
4
5 value racefmt 1 = 'Indian'
6 2 = 'Asian'
7 3 = 'Black'
8 4 = 'White';
NOTE: Format RACEFMT has been output.
9 RUN;
NOTE: PROCEDURE FORMAT used (Total process time):
real time 0.02 seconds
cpu time 0.01 seconds
Again, to store the definitions for printing sex and race permanently, beyond your current work session, you'd need to put a "LIBRARY=" option on the PROC FORMAT statement. Since this program has none, the formats defined in this FORMAT procedure are temporary.
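As a rough sketch of what a permanent version might look like, the program below adds the LIBRARY= option. The libref myfmts and the folder path are placeholders that you would replace with your own, and the FMTSEARCH= system option is included on the assumption that a later session needs to be told where to look for the stored formats.

```sas
/* Hypothetical libref and folder: replace the path with a folder that exists on your machine */
LIBNAME myfmts 'C:\myformats';

/* LIBRARY= stores the formats in a catalog in that library instead of the temporary WORK library */
PROC FORMAT library=myfmts;
   value sexfmt 1 = 'Male'
                2 = 'Female';
   value racefmt 1 = 'Indian'
                 2 = 'Asian'
                 3 = 'Black'
                 4 = 'White';
RUN;

/* In a later session, reissue the LIBNAME statement and tell SAS to search that library for formats */
OPTIONS fmtsearch=(myfmts);
```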
All we've done so far is define the formats so that they are available for use. Now let's use them!
Example 9.6
The following SAS code uses the formats to print the sex and race variables contained in the back data set in a meaningful way:
DATA temp2;
   set back;
   f_race=race;
   f_sex=sex;
   format f_race racefmt. f_sex sexfmt.;
RUN;
PROC PRINT data=temp2;
   title 'Output Dataset: TEMP2';
   var subj sex f_sex race f_race;
RUN;
PROC CONTENTS data=temp2;
RUN;
Obs | subj | sex | f_sex | race | f_race |
---|---|---|---|---|---|
1 | 110051 | 2 | Female | 4 | White |
2 | 110088 | 2 | Female | 4 | White |
3 | 210012 | 2 | Female | 4 | White |
4 | 220004 | 2 | Female | 4 | White |
5 | 230006 | 2 | Female | 4 | White |
6 | 310083 | 1 | Male | 2 | Asian |
7 | 410012 | 2 | Female | 4 | White |
8 | 420037 | 2 | Female | 4 | White |
9 | 510027 | 2 | Female | 4 | White |
10 | 520017 | 2 | Female | 4 | White |
Data Set Name | WORK.TEMP2 | Observations | 10 |
---|---|---|---|
Member Type | DATA | Variables | 10 |
Engine | V9 | Indexes | 0 |
Created | Wed, Nov 05, 2023 11:21:06 AM | Observation Length | 80 |
Last Modified | Wed, Nov 05, 2023 11:21:06 AM | Deleted Observations | 0 |
Protection | | Compressed | NO |
Data Set Type | | Sorted | NO |
Label | | | |
Data Representation | WINDOWS_32 | | |
Encoding | wlatin1 Western (Windows) | | |
Data Set Page Size | 8192 |
---|---|
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 101 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
File Name | C:\DOCUME~1\Yourdrivename~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp2.sas7bdat |
Release Created | 9.010M3 |
Host Created | XP_PRO |
# | Variable | Type | Len | Format |
---|---|---|---|---|
3 | b_date | Num | 8 | MMDDYY8. |
6 | country | Num | 8 | |
9 | f_race | Num | 8 | RACEFMT. |
10 | f_sex | Num | 8 | SEXFMT. |
7 | race | Num | 8 | |
8 | relig | Num | 8 | |
4 | sex | Num | 8 | |
5 | state | Num | 8 | |
1 | subj | Num | 8 | |
2 | v_date | Num | 8 | MMDDYY8. |
Well, saying that the program prints the sex and race variables themselves in a meaningful way is not precisely true! In creating the new data set temp2 from the back data set, the DATA step first creates two additional (numeric) variables, f_sex and f_race, as copies of the variables sex and race, respectively, and it is these copies that are formatted. Just as with SAS formats, you must associate a user-defined format with a variable in a FORMAT statement. The FORMAT statement:
format f_race racefmt. f_sex sexfmt.;
associates the f_race variable with the racefmt. format and the f_sex variable with the sexfmt. format. Again, just as is true for SAS formats, you can place the FORMAT statement in either a DATA step or a PROC step. If you place the FORMAT statement in a PROC step, the format is associated with the variable only for the procedure in which the association is made. If you instead place the FORMAT statement in a DATA step, the association is stored with the data set and applies in all subsequent procedures that use that data set.
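A minimal sketch of the PROC-step case follows. It assumes the back data set and the sexfmt and racefmt formats from the earlier programs are available in the current session; the titles are made up for illustration.

```sas
/* The FORMAT statement here affects only this PROC PRINT step */
PROC PRINT data=back;
   title 'BACK with formats applied in the PROC step';
   var subj sex race;
   format sex sexfmt. race racefmt.;
RUN;

/* No FORMAT statement here, and none was stored in the data set,
   so sex and race print as their stored numeric values */
PROC PRINT data=back;
   title 'BACK without formats';
   var subj sex race;
RUN;
```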
Incidentally, it is not necessary to create both a formatted and an unformatted version of each variable; we did so in this example purely for educational purposes. Having the two versions side by side helps us see the effect the formatting has on the sex and race variables.
Launch and run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, observe the difference in the printed output between the formatted and unformatted versions of the variables f_sex and sex (and f_race and race). Also, note that the CONTENTS procedure indicates that the variables sex and race are unformatted, numeric variables (since there is no special format specified), while f_sex and f_race are formatted, numeric variables (a special format is specified).
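In everyday use you would more likely attach the formats to sex and race directly. The sketch below does that in a DATA step, again assuming back, sexfmt, and racefmt exist in the current session; temp2b is a hypothetical data set name. It also illustrates that the stored values stay numeric: the WHERE statement compares sex with the number 2, not with the label 'Female'. The last step uses a FORMAT statement that names variables but no format, which removes the association for that one procedure.

```sas
/* Attach the formats to the original variables; the association is stored in temp2b */
DATA temp2b;
   set back;
   format sex sexfmt. race racefmt.;
RUN;

/* Subset on the stored numeric value; the printed column still shows the label */
PROC PRINT data=temp2b;
   title 'TEMP2B: observations with sex = 2';
   where sex = 2;
   var subj sex race;
RUN;

/* A FORMAT statement with no format name removes the association for this step only */
PROC PRINT data=temp2b;
   title 'TEMP2B: stored values';
   var subj sex race;
   format sex race;
RUN;
```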
Example 9.7
The FORMAT procedure is useful in defining meaningful categories once you've converted one or more (perhaps continuous) variables into one categorical variable. The following SAS code illustrates the technique:
PROC FORMAT; *Define the format age2fmt;
   value age2fmt 1 = 'LT 20'
                 2 = '20-44'
                 3 = '45-54'
                 4 = 'GE 54'
                 OTHER = 'Missing'; *Label the four age categories. OTHER labels anything left over as Missing;
RUN;
DATA temp3;
   set back; *Read the back dataset into temp3;
   *Create a new variable (age2);
   if age = . then age2 = .; *Code for missing values;
   else if age lt 20 then age2 = 1;
   else if age ge 20 and age lt 45 then age2 = 2;
   else if age ge 45 and age lt 54 then age2 = 3;
   else if age ge 54 then age2 = 4;
   format age2 age2fmt.; *Use the previously defined format when displaying age2;
RUN;
PROC FREQ data=temp3; *Create a one-way table of frequencies for age2;
   title 'Age Frequency in TEMP3';
   table age2;
RUN;
First, inspect the SAS program to make sure you understand the code. Then, launch and run the program and review the original data set as well as the output from the FREQ procedure to convince yourself that the age categories have been appropriately labeled. Incidentally, we'll learn more about the FREQ procedure soon in another lesson!
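One detail worth checking when you review the output: by default, the FREQ procedure leaves missing values out of the frequency table and reports only a count of them beneath it, so the 'Missing' label would not appear as a row of its own. A small sketch, assuming temp3 and age2fmt from the program above, adds the MISSING option to the TABLE statement so that missing values of age2 are tabulated as a category:

```sas
/* MISSING asks PROC FREQ to treat missing values as a category in the table */
PROC FREQ data=temp3;
   title 'Age Frequency in TEMP3, Including Missing Values';
   table age2 / missing;
RUN;
```

If no observation has a missing age, the two versions of the table should look the same.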
Example 9.8
Now, as long as we are interested in grouping the values of only one variable, we can accomplish the same thing a bit more efficiently by defining the groups directly within the FORMAT procedure, rather than creating a new variable as we did in the previous program. For example, the following SAS code uses the FORMAT procedure to define the format agefmt based on the possible values of the variable age:
PROC FORMAT;
   value agefmt LOW-<20 = 'LT 20'
                20-<45  = '20-44'
                45-<54  = '45-54'
                54-HIGH = 'GE 54'
                OTHER   = 'Missing';
RUN;
PROC FREQ data=back;
   title 'Age Frequency in BACK';
   format age agefmt.;
   table age;
RUN;
age | Frequency | Percent | Cumulative Frequency | Cumulative Percent |
---|---|---|---|---|
20-44 | 5 | 50.00 | 5 | 50.00 |
45-54 | 3 | 30.00 | 8 | 80.00 |
GE 54 | 2 | 20.00 | 10 | 100.00 |
In defining groups of values right within the FORMAT procedure, note the following, as illustrated in this program:
- Ranges are defined using a dash (-). You can also list individual values by separating them with commas: 1,2,3 = 'Low'
- The < symbol means "not including." For example, 20-<45 means all ages between 20 and 45, including 20, but not including 45.
- The special LOW and HIGH ranges allow you to group values without knowing the smallest and largest values, respectively. (The keyword LOW does not include missing numeric values, but if applied to a character format, it does include missing character values.)
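One way to convince yourself of how the range boundaries behave is to push a few test values through the format with the PUT function and read the results in the log. The sketch below repeats the agefmt definition so that it runs on its own; the test values and the variable name grp are made up for illustration.

```sas
/* Same definition as in the program above, repeated so the sketch is self-contained */
PROC FORMAT;
   value agefmt LOW-<20 = 'LT 20'
                20-<45  = '20-44'
                45-<54  = '45-54'
                54-HIGH = 'GE 54'
                OTHER   = 'Missing';
RUN;

/* Hypothetical test values that sit on or near each boundary, plus one missing value */
DATA _null_;
   input age @@;
   grp = put(age, agefmt.);   /* apply the format and keep the resulting label */
   put age= grp=;             /* write the value and its label to the log */
   datalines;
. 19.9 20 44.9 45 53.9 54 99
;
RUN;
```

Given the definition, 20 should be labeled '20-44' and 45 should be labeled '45-54', because the < symbol excludes the upper end of each range, while 54 falls in 'GE 54'. The missing value is not picked up by LOW and so falls through to OTHER.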
The FREQ procedure tallies the number of subjects falling within each of the age groups as defined in the FORMAT procedure. Here, the variable age is associated with the format agefmt using a FORMAT statement right within the FREQ procedure.
Now, launch and run the program and review the original data set as well as the output from the frequency procedure to convince yourself that the age categories have again been appropriately labeled.