9.2 - The INVALUE Statement

The INVALUE statement in the FORMAT procedure allows you to create your own customized informats for character variables. That is, it allows you to tell SAS how you'd like the program to read in special character values. In doing so, SAS effectively translates the values of a character variable into different, typically more meaningful character or numeric values. For example, the following INVALUE statement:

INVALUE $french 'OUI'= 'YES' 
                'NON'= 'NO'; 

prepares SAS to translate a character variable in French to a character variable in English.
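An INVALUE statement only takes effect inside a PROC FORMAT step. As a minimal sketch, the following wraps the statement above in PROC FORMAT and then reads a few French responses through the new informat. The data set name answers and the three in-stream records are made up for illustration.

```sas
/* Define the character informat $french (same mapping as above). */
proc format;
  invalue $french 'OUI' = 'YES'
                  'NON' = 'NO';
run;

/* Hypothetical test data: three responses read from in-stream lines. */
data answers;
  length resp $ 3;          /* room for the longest translated value, 'YES' */
  input resp $french3.;     /* read 3 columns and translate them on input */
  datalines;
OUI
NON
OUI
;
run;

proc print data=answers;
run;
```

If the informat is doing its job, the printed values of resp should be the English words rather than the French ones.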

Restrictions on the INVALUE statement include:

  • The INVALUE statement translates character input only. You cannot use it to translate the values of a numeric variable.
  • The name of a character informat must begin with a $ sign.
  • The name of the informat (for example, french) must be a valid SAS name with no more than 30 additional characters following the required $ sign. The name cannot end in a number, nor can it be the name of a standard SAS informat.
  • When you refer to the informat later, you must follow the name with a period.

The INVALUE statement in the FORMAT procedure merely defines an informat so that it is available for use. In order for the informat to take effect, you must associate the character variable with the informat either explicitly in the INPUT statement:

INPUT resp $french.;

or in an INFORMAT statement:

INFORMAT resp $french.; 
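As a rough sketch of the second approach, SAS's INFORMAT statement attaches an informat to a variable so that a simple list-style INPUT statement can use it without naming it. The data set name answers2 and the records below are invented for illustration, and the informat is redefined here only so that the step can run on its own.

```sas
proc format;
  invalue $french 'OUI' = 'YES'
                  'NON' = 'NO';
run;

data answers2;
  length resp $ 3;          /* declare resp as character before it is read */
  informat resp $french.;   /* associate the informat with resp */
  input resp;               /* list input uses the associated informat */
  datalines;
NON
OUI
;
run;

proc print data=answers2;
run;
```

Naming the informat directly in the INPUT statement keeps the translation visible at the point where the data are read, while the INFORMAT statement keeps the INPUT statement short when many variables are involved.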

Let's take a look at an example!

Example 9.3

The following SAS code illustrates the use of the FORMAT procedure to define how SAS should translate the two character variables sex and race during input:

PROC FORMAT;
  invalue $insex '1' = 'M'
                 '2' = 'F';

  invalue $inrace '1' = 'Indian'
                  '2' = 'Asian'
                  '3' = 'Black'
                  '4' = 'White';
RUN;

Because the INVALUE statement is used, the translation is restricted to taking place on input. As a result of this code, providing the character variable sex is later associated with the informat $insex, whenever SAS encounters the character value '1' for the variable sex it will instead store the character value 'M'. Similarly, whenever SAS encounters the character value '2' for the variable sex it will instead store the character value 'F'.

Launch and run the SAS program. The only way you'll know whether anything happened is to check the log window. You should see messages that look something like this:

1    PROC FORMAT;
2      invalue $insex '1' = 'M'
3                     '2' = 'F';
NOTE: Informat $INSEX has been output.
4
5      invalue $inrace '1' = 'Indian'
6                      '2' = 'Asian'
7                      '3' = 'Black'
8                      '4' = 'White';
NOTE: Informat $INRACE has been output.
9    RUN;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds
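Beyond the log notes, one quick way to check what the informats return is to call them through the INPUT function in a DATA _NULL_ step. This is only a sketch: it assumes the PROC FORMAT step from this example has already run in the current session, and the test values '2' and '3' are arbitrary.

```sas
/* Assumes $insex and $inrace were defined earlier in this session. */
data _null_;
  length sex $ 1 race $ 6;        /* lengths of the translated values */
  sex  = input('2', $insex1.);    /* look up the label defined for '2' */
  race = input('3', $inrace6.);   /* width 6 leaves room for the longest label */
  put sex= race=;                 /* write the translated values to the log */
run;
```

The translated values are written to the log, where they can be compared with the labels in the INVALUE statements.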

As we'll learn later in this lesson, to store the informats for sex and race permanently, beyond the current work session, we'd need to add a LIBRARY= option to the PROC FORMAT statement. Since no such option appears here, the informats are temporary: they are not stored beyond your current SAS session.
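As a preview only, a permanent version might look something like the sketch below. The libref mylib and the folder path are hypothetical and would need to be replaced with a directory that exists on your system.

```sas
/* Hypothetical folder for the permanent format catalog. */
libname mylib 'C:\Yourdrivename\icdb\formats';

proc format library=mylib;      /* store the informat in the MYLIB.FORMATS catalog */
  invalue $insex '1' = 'M'
                 '2' = 'F';
run;

/* In a later session, reassign the libref and tell SAS where to look. */
options fmtsearch=(mylib work);
```

The FMTSEARCH= system option is needed here because, by default, SAS looks for formats and informats only in the WORK and LIBRARY librefs.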

All we've done so far is define the informats so that they are available for use. Now let's use them!

Example 9.4

The following data step uses the informats that we defined in the previous example to read in a subset of the data from the input raw data file back.dat:

DATA temp1;
  infile 'C:\Yourdrivename\icdb\data\back.dat';
  length sex $ 1 race $ 6;
  input subj 1-6 @17 sex $insex1. @19 race $inrace1.;
RUN;
PROC CONTENTS data=temp1;
  title 'Output Dataset: TEMP1';
RUN;
PROC PRINT data=temp1;
  var subj sex race;
RUN;

Output Dataset: TEMP1
The CONTENTS Procedure

Data Set Name        WORK.TEMP1                       Observations          10
Member Type          DATA                             Variables             3
Engine               V9                               Indexes               0
Created              Wed, Nov 05, 2023 11:06:38 AM    Observation Length    16
Last Modified        Wed, Nov 05, 2023 11:06:38 AM    Deleted Observations  0
Protection                                            Compressed            NO
Data Set Type                                         Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)

Engine/Host Dependent Information

Data Set Page Size          4096
Number of Data Set Pages    1
First Data Page             1
Max Obs per Page            252
Obs in First Data Page      10
Number of Data Set Repairs  0
File Name                   C:\DOCUME~1\Yourdrivename~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp1.sas7bdat
Release Created             9.010M3
Host Created                XP_PRO

Alphabetic List of Variables and Attributes

#    Variable    Type    Len
2    race        Char      6
1    sex         Char      1
3    subj        Num       8

Output Dataset: TEMP1

Obs    subj      sex    race
  1    110051    F      White
  2    110088    F      White
  3    210012    F      White
  4    220004    F      White
  5    230006    F      White
  6    310083    M      Asian
  7    410012    F      White
  8    420037    F      White
  9    510027    F      White
 10    520017    F      White

Only a subset of the variables in the back.dat data file is read. Column numbers ("1-6") are used to read the variable subj, and absolute pointer controls are used to read the variables sex ("@17") and race ("@19") from the file. Note that:

  • Because we want to translate the variables, we must read sex and race as character variables, even though their raw values are digits.
  • On input, we have the option of specifying the width of the field being read. The width is specified in the informat name, between the name and the period. For example, the width of the race field is defined as 1 in the informat $inrace1.
  • The LENGTH statement defines the length of sex and race after translation.
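If you don't have back.dat at hand, the interplay between the input width and the stored length can be tried on in-stream data. The following is a minimal sketch under stated assumptions: the data set name demo, the two records, and the column position @8 are made up for illustration and do not reflect the layout of back.dat.

```sas
proc format;
  invalue $inrace '1' = 'Indian'
                  '2' = 'Asian'
                  '3' = 'Black'
                  '4' = 'White';
run;

data demo;
  length race $ 6;                     /* stored length after translation */
  input subj 1-6 @8 race $inrace1.;    /* read a 1-column field starting at column 8 */
  datalines;
110051 4
310083 2
;
run;

proc contents data=demo;
run;
```

In the CONTENTS output, race should appear as a character variable of length 6, even though only one column was read for it on input.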

Launch the SAS program. Then, edit the INFILE statement so that it reflects the location of your stored back.dat file. Then, run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, note that the variables sex and race are both character variables, as indicated by "Char" appearing under the Type column in the output from the CONTENTS procedure. Also, note that the CONTENTS procedure gives no indication that the variables sex and race are formatted in any particular way for output. We'd have to take care of that by using a VALUE statement (as opposed to an INVALUE statement)!

Finally, as a little sidebar, recall that the TITLE statement is a toggle statement. That is, its value remains in effect until it is changed by another TITLE statement. Therefore, the title in the PRINT procedure is the same as the one used in the CONTENTS procedure.
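The persistence of titles can be seen in a small sketch like the following, where the data set tiny and the title texts are hypothetical and exist only to have something to print.

```sas
data tiny;                 /* hypothetical one-row data set */
  x = 1;
run;

proc print data=tiny;
  title 'First title';     /* stays in effect after this step ends */
run;

proc print data=tiny;      /* no TITLE statement here, so 'First title' is reused */
run;

title 'Second title';      /* replaces the earlier title */
proc print data=tiny;
run;

title;                     /* a TITLE statement with no text clears the titles */
```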

