The INVALUE statement in the FORMAT procedure allows you to create your own customized informats for character variables. That is, it allows you to tell SAS how you'd like the program to read in special character values. In doing so, SAS effectively translates the values of a character variable into different, typically more meaningful character or numeric values. For example, the following INVALUE statement:
INVALUE $french 'OUI' = 'YES'
                'NON' = 'NO';
prepares SAS to translate a character variable in French to a character variable in English.
Restrictions on the INVALUE statement include:
- You can translate only a character variable into another variable. You cannot translate a numeric variable using the INVALUE statement.
- The name of a character informat, such as the ones defined here, must begin with a $ sign.
- The name of the informat (for example, french) must be a valid SAS name with no more than 30 additional characters following the required $ sign. The name cannot end in a number, nor can it be the name of a standard SAS informat.
- When you refer to the informat later, you must follow the name with a period.
The INVALUE statement in the FORMAT procedure merely defines an informat so that it is available for use. In order for the informat to take effect, you must associate the character variable with the informat either explicitly in the INPUT statement:
INPUT resp $french.;
or in an INFORMAT statement:
INFORMAT resp $french.;
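To see both ways of associating a variable with an informat in one place, here is a minimal, self-contained sketch. It assumes a small in-stream data set whose ids and responses are made up for illustration, and the data set names survey1 and survey2 are hypothetical. Note that the second DATA step uses an INFORMAT statement, which is the statement that attaches informats to variables.

```sas
/* Define the character informat from the text. */
PROC FORMAT;
   invalue $french 'OUI' = 'YES'
                   'NON' = 'NO';
RUN;

/* Option 1: name the informat directly in the INPUT statement. */
/* The data lines are hypothetical.                              */
DATA survey1;
   length resp $ 3;               /* room for the translated value 'YES' */
   input id 1 @3 resp $french3.;  /* read 3 columns starting at column 3 */
   datalines;
1 OUI
2 NON
3 OUI
;
RUN;

/* Option 2: associate the informat in an INFORMAT statement, */
/* then read resp with simple list input.                     */
DATA survey2;
   length resp $ 3;
   informat resp $french.;
   input id resp;
   datalines;
1 OUI
2 NON
3 OUI
;
RUN;

PROC PRINT data=survey1;
RUN;

PROC PRINT data=survey2;
RUN;
```

If the sketch behaves as intended, resp should hold the English values (YES, NO, YES) in both data sets.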
Let's take a look at an example!
Example 9.3
The following SAS code illustrates the use of the FORMAT procedure to define how SAS should translate the two character variables sex and race during input:
PROC FORMAT;
   invalue $insex  '1' = 'M'
                   '2' = 'F';
   invalue $inrace '1' = 'Indian'
                   '2' = 'Asian'
                   '3' = 'Black'
                   '4' = 'White';
RUN;
Because the INVALUE statement is used, the translation is restricted to taking place on input. As a result of this code, providing the character variable sex is later associated with the informat $insex, whenever SAS encounters the character value '1' for the variable sex it will instead store the character value 'M'. Similarly, whenever SAS encounters the character value '2' for the variable sex it will instead store the character value 'F'.
Launch and run the SAS program. To confirm that anything happened, check your log window. You should see notes reporting that the informats $INSEX and $INRACE have been output.
As we'll learn later in this lesson, to store the definitions for reading in sex and race permanently, beyond our current work session, we'd need to add a LIBRARY= option to the PROC FORMAT statement. Since that option doesn't appear here, the informats defined in this FORMAT procedure are temporary. That is, they are not stored beyond your current SAS session.
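Besides reading the log, one way to inspect what was defined is to ask the FORMAT procedure to list the catalog entries. The following is a minimal sketch, assuming the program from Example 9.3 has already been run in the same SAS session, so that the informats sit in the temporary WORK library:

```sas
/* List the stored definitions of the two informats.                */
/* In a SELECT statement, character informat names take the prefix  */
/* @$ to distinguish them from formats of the same name.            */
PROC FORMAT library=work fmtlib;
   select @$insex @$inrace;
RUN;
```

The listing should show each raw value alongside the value it is translated to. Because the WORK library is cleared when the session ends, the same step run in a fresh session would find nothing to list.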
All we've done so far is define the informats so that they are available for use. Now let's use them!
Example 9.4
The following DATA step uses the informats that we defined in the previous example to read in a subset of the data from the input raw data file back.dat:
DATA temp1;
   infile 'C:\simon\icdb\data\back.dat';
   length sex $ 1 race $ 6;
   input subj 1-6 @17 sex $insex1. @19 race $inrace1.;
RUN;
PROC CONTENTS data=temp1;
   title 'Output Dataset: TEMP1';
RUN;
PROC PRINT data=temp1;
   var subj sex race;
RUN;
Output Dataset: TEMP1
The CONTENTS Procedure
Data Set Name | WORK.TEMP1 | Observations | 10 |
Member Type | DATA | Variables | 3 |
Engine | V9 | Indexes | 0 |
Created | Wed, Nov 05, 2008 11:06:38 AM | Observation Length | 16 |
Last Modified | Wed, Nov 05, 2008 11:06:38 AM | Deleted Observations | 0 |
Protection | | Compressed | NO |
Data Set Type | | Sorted | NO |
Label | | | |
Data Representation | WINDOWS_32 | | |
Encoding | wlatin1 Western (Windows) | | |
Engine/Host Dependent Information
Data Set Page Size | 4096 |
Number of Data Set Pages | 1 |
First Data Page | 1 |
Max Obs per Page | 252 |
Obs in First Data Page | 10 |
Number of Data Set Repairs | 0 |
File Name | C:\DOCUME~1\LAURAJ~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp1.sas7bdat |
Release Created | 9.010M3 |
Host Created | XP_PRO |
Alphabetic List of Variables and Attributes
# | Variable | Type | Len |
---|---|---|---|
2 | race | Char | 6 |
1 | sex | Char | 1 |
3 | subj | Num | 8 |
Output Dataset: TEMP1
Obs | subj | sex | race |
---|---|---|---|
1 | 110051 | F | White |
2 | 110088 | F | White |
3 | 210012 | F | White |
4 | 220004 | F | White |
5 | 230006 | F | White |
6 | 310083 | M | Asian |
7 | 410012 | F | White |
8 | 420037 | F | White |
9 | 510027 | F | White |
10 | 520017 | F | White |
Only a subset of the variables in the back.dat data file is read. Column numbers ("1-6") are used to read the variable subj, and absolute pointer controls are used to read the variables sex ("@17") and race ("@19") from the file. Note that:
- Because we want to translate the variables, we must read sex and race as character variables, even though their raw values are digits.
- On input, we have the option of specifying the width of the field being read. The width is specified in the informat name between the name and the period. For example, the informat $inrace1. tells SAS to read a single column for the variable race.
- The LENGTH statement defines the length of sex and race after translation.
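If you don't have back.dat at hand, the interplay between the informat width and the LENGTH statement can be tried with in-stream data. This is a minimal sketch, not the lesson's program: the data set name demo is hypothetical, and the two data lines are stand-ins that place a subject number in columns 1-6, a sex code in column 17, and a race code in column 19, with the columns in between left blank.

```sas
/* Same informat definitions as in Example 9.3. */
PROC FORMAT;
   invalue $insex  '1' = 'M'
                   '2' = 'F';
   invalue $inrace '1' = 'Indian'
                   '2' = 'Asian'
                   '3' = 'Black'
                   '4' = 'White';
RUN;

DATA demo;
   length sex $ 1 race $ 6;    /* lengths of the stored, translated values */
   input subj 1-6
         @17 sex  $insex1.     /* width 1: read only column 17 */
         @19 race $inrace1.;   /* width 1: read only column 19 */
   datalines;
110051          2 4
310083          1 2
;
RUN;

PROC PRINT data=demo;
RUN;
```

Given the definitions above, the first record should come out as F and White and the second as M and Asian. The width of 1 describes the raw field, while the length of 6 makes room for the longest translated value of race.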
Launch the SAS program. Then, edit the INFILE statement so that it reflects the location of your stored back.dat file. Then, run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, note that the variables sex and race are both character variables, as indicated by "Char" appearing under the Type column in the output from the CONTENTS procedure. Also, note that the CONTENTS procedure gives no indication that the variables sex and race are formatted in any particular way for output. We'd have to take care of that by using a VALUE statement (as opposed to an INVALUE statement)!
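As a rough illustration of the VALUE statement, the sketch below defines an output format for sex and applies it when printing. It assumes the data set temp1 from this example already exists; the format name $sexfmt and the labels 'Male' and 'Female' are made up for illustration.

```sas
/* Define a character format for display on output. */
PROC FORMAT;
   value $sexfmt 'M' = 'Male'
                 'F' = 'Female';
RUN;

/* Apply the format for this procedure only; the stored values */
/* in temp1 remain 'M' and 'F'.                                */
PROC PRINT data=temp1;
   var subj sex race;
   format sex $sexfmt.;
RUN;
```

Here the translation happens only when values are displayed, in contrast to the INVALUE informats, which change what is stored at input time.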
Finally, as a little sidebar, recall that the TITLE statement is a toggle statement. That is, its value remains in effect until it is changed with another TITLE statement. Therefore, the title in the PRINT procedure is the same as the one used in the CONTENTS procedure.
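To give the two procedures different titles, or to turn the title off afterwards, you could add further TITLE statements. The following is a minimal sketch that assumes temp1 already exists; the second title text is hypothetical.

```sas
PROC CONTENTS data=temp1;
   title 'Output Dataset: TEMP1';
RUN;

PROC PRINT data=temp1;
   title 'Listing of TEMP1';   /* replaces the earlier title */
   var subj sex race;
RUN;

title;   /* a TITLE statement with no text cancels all titles */
```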