Lesson 9: The Format Procedure

Lesson 9: The Format Procedure

Overview

We previously learned how to use a FORMAT statement to tell SAS to display certain variable values in a particular way. For example, we might tell SAS to display a date variable saledate, say, using the SAS mmddyy10. format, so that August 19, 2008 is displayed as 08/19/2008. There are a whole slew of informats and formats that SAS provides that you can find in the SAS Help and Documentation. Our focus in this lesson will be on creating informats and formats to fill in for those that SAS doesn't provide.

That is, in this lesson, we will extend our formatting capabilities by investigating how to create user-defined informats and formats using the FORMAT procedure. In particular, we will take a look at the following techniques:

  • how to translate values of a character variable when they are read in a SAS data set into more meaningful values using the INVALUE statement
  • how to create customized formats for character and numeric variables, using the VALUE statement, so variables can be printed in a meaningful format
  • how to create templates, using the PICTURE statement, for printing numbers with a special format, such as leading zeros, decimal and comma punctuation, fill characters, and prefixes.

Objectives

Upon completion of this lesson, you should be able to:

Upon completing this lesson, you should be able to do the following:

  • use a PUT statement to create an ascii file from a SAS data set
  • use an INVALUE statement in the FORMAT procedure to tell SAS how to read in special characters
  • associate a character variable with a particular informat
  • create a permanent formats catalog for a SAS data set
  • use a VALUE statement in the FORMAT procedure to tell SAS how to print certain variables
  • use a format to define categories
  • use the NOFMTERR option to use a data set with permanently formatted variables without having access to the formats catalog
  • use the %INCLUDE statement to include a secondary program file in a main SAS program
  • use a codebook to help define formats
  • use a PICTURE format to to create templates for printing numbers by defining a format that allows for special characters
  • use the FMTLIB option of the FORMAT procedure to tell SAS to print the contents of a formats catalog.

9.1 - A Working Data Set

9.1 - A Working Data Set

Throughout this lesson, we will investigate a number of examples that illustrate how to create different informats and formats for several different variables. To do so, we will use a subset of the demographic (or "background") data collected on 638 subjects once enrolled in the National Institute of Health's Interstitial Cystitis Data Base (ICDB) Study. Not surprisingly, the ICDB Study collected data on people who were diagnosed as having interstitial cystitis! The primary reason for conducting the study was that interstitial cystitis is a poorly understood condition that causes severe bladder and pelvic pain, urinary frequency, and painful urination in the absence of any identifiable cause. Although the disease is more prevalent in women, it affects both men and women of all ages. For the ICDB Study, each subject was enrolled at one of seven clinical centers and was evaluated four times a year for as many as four years.

It will probably be helpful for you to take a peek at the background data form on which the data were collected. In order to run the SAS programs in this lesson, you'll need to save the background data set to a directory on your computer. To do so, right-click the link and select the "Save Link As" option. A Save dialog box will appear and allow you to save the data file to the location you choose on your computer.

Example 9.1

Because there are 638 observations and 16 variables in the permanent background data set icdb.back, the data on just ten subjects and nine variables are selected when creating the temporary working background data set back. The following SAS program creates the subset:


OPTIONS PS = 58 LS = 80 NODATE NONUMBER;
LIBNAME icdb 'C:\Simon\icdb\data';

DATA back;
   set icdb.back;
   age = (v_date - b_date)/365.25;
   if subj in (110051, 110088, 210012, 220004, 230006,
               310083, 410012, 420037, 510027, 520017);
   keep subj v_date b_date age sex state country race relig;
   format age 4.1;
RUN;

PROC PRINT;
  title 'Output Dataset: BACK';
RUN;
Output Dataset: BACK
Obs subj v_date b_date sex state country race relig age
1 110051 1/25/94 12/02/42 2 42 1 4 3 51.1
2 110088 2/28/95 10/03/27 2 23 1 4 2 67.4
3 210012 07/16/93 06/27/24 2 . 6 4 1 69.1
4 220004 07/27/93 08/07/72 2 38 1 4 1 21.0
5 230006 01/06/94 04/24/49 2 21 1 4 3 44.7
6 310083 01/20/95 05/13/54 1 . 17 2 1 40.7
7 410012 09/16/93 11/01/47 2 22 1 4 3 45.9
8 420037 02/02/94 0/25/41 2 22 1 4 1 52.5
9 510027 02/15/94 08/16/63 2 49 1 4 1 30.5
10 520017 11/17/93 09/24/54 2 14 1 4 1 39.1

Launch the SAS program. Then, edit the LIBNAME statement so it reflects the location where you saved the background data set. Then, run the program and review the contents of the print procedure to familiarize yourself with the structure and contents of the subset data set called back.

Note that the IF statement tells SAS which ten subjects we want included in the back data set. And, the KEEP statement tells SAS which nine variables we want included in the back data set. We will learn more about the KEEP statement in Stat 481. You might also want to note that the FORMAT statement tells SAS to use the SAS-provided w.d format to display an age as 44.7, say.

Example 9.2

We'll also need to work with an raw data file version of the subset data set. The following SAS code creates the ascii raw data file, in column format, from the temporary back data set:

DATA _NULL_;
  set back;
  file 'C:\simon\icdb\data\back.dat';
  put subj 1-6 @8 b_date mmddyy8. sex 17 race 19 
      relig 21 state 23-24 country 26-27 
      @29 age 4.1 @34 v_date mmddyy8.;
RUN;

The SAS data set name _NULL_ tells SAS to execute the DATA step as if it were creating a new SAS data set, but no observations and no variables are written to an output data set. The PUT statement tells SAS to write the variables — in the format specified — to the filename specified (back.dat) in the FILE statement. The specifications used in the PUT statement are similar to the specifications used in the INPUT statement.

Launch the SAS program. Then, edit the FILE statement so it reflects the location where you would like the raw data file saved. Then, run the program. Open the newly created back.dat file in an ascii editor, such as NotePad, to convince yourself that its structure and contents are similar to the back data set.


9.2 - The INVALUE Statement

9.2 - The INVALUE Statement

The INVALUE statement in the FORMAT procedure allows you to create your own customized informats for character variables. That is, it allows you to tell SAS how you'd like the program to read in special character values. In doing so, SAS effectively translates the values of a character variable into different, typically more meaningful character or numeric values. For example, the following INVALUE statement:

INVALUE $french 'OUI'= 'YES' 
                'NON'= 'NO'; 

prepares SAS to translate a character variable in French to a character variable in English.

Restrictions on the INVALUE statement include:

  • You can only translate a character variable to another variable. You cannot translate a numeric variable using the INVALUE statement.
  • The name of the informat must begin with a $ sign, since it refers to a character variable.
  • The name of the informat (for example, french) must be a valid SAS name with no more than 30 additional characters following the imperative $ sign. The name cannot end in a number nor can the name be a standard SAS informat name.
  • When you refer to the informat later, you must follow the name with a period.

The INVALUE statement in the FORMAT procedure merely defines an informat so that it is available for use. In order for the informat to take effect, you must associate the character variable with the informat either explicitly in the INPUT statement:

INPUT resp $french.; 

or in a FORMAT statement:

FORMAT resp $french.; 

Let's take a look at an example!

Example 9.3

The following SAS code illustrates the use of the FORMAT procedure to define how SAS should translate the two character variables sex and race during input:

PROC FORMAT;
  invalue $insex '1' = 'M'
                 '2' = 'F';

  invalue $inrace '1' = 'Indian'
                  '2' = 'Asian'
                  '3' = 'Black'
                  '4' = 'White';
RUN;

Because the INVALUE statement is used, the translation is restricted to taking place on input. As a result of this code, providing the character variable sex is later associated with the informat $insex, whenever SAS encounters the character value '1' for the variable sex it will instead store the character value 'M'. Similarly, whenever SAS encounters the character value '2' for the variable sex it will instead store the character value 'F'.

Launch and run the SAS program. The only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like this:

As we'll learn later in this lesson, in order to make the definitions for reading in sex and race permanently stored beyond our current work session, we'd need to attach a "LIBRARY =" option to the PROC FORMAT statement. Since one doesn't exist here, the definitions defined in this format procedure are temporary only. That is, they are not stored beyond your current SAS session.

All we've done so far is define the informats so that they are available for use. Now let's use them!

Example 9.4

The following data step uses the informats that we defined in the previous example to read in a subset of the data from the input raw data file back.dat:

DATA temp1;
  infile 'C:\simon\icdb\data\back.dat';
  length sex $ 1 race $ 6;
  input subj 1-6 @17 sex $insex1. @19 race $inrace1.;
RUN;

PROC CONTENTS data=temp1;
  title 'Output Dataset: TEMP1';
RUN;

PROC PRINT data=temp1;
  var subj sex race;
RUN;
Output Dataset: TEMP1
The CONTENTS Procedure
Data Set name WORK.TEMP1 Observations 10
Member Type DATA Variables 3
Engine V9 Indexes 0
Created Wed, Nov 05, 2008 11:06:38 AM Observation Length 16
Last Modified Wed, Nov 05, 2008 11:06:38 AM Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_32    
Encoding wlatin1 Western (Windows)    
Engine/Host Dependent Information
Data Set Page Size 4096
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 252
Obs in First Data Page 10
Number of Data Set Repairs 0
File Name C:\DOCUME~1\LAURAJ~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp1.sas7bdat
Release Created 9.010M3
Host Created XP_PRO
Alphabetic List of Variables and Attributes
# Variable Type Len
2 race Char 6
1 sex Char 1
3 subj Num 8
Output Dataset: TEMP1
Obs subj sex race
1 110051 F White
2 110088 F White
3 210012 F White
4 220004 F White
5 230006 F White
6 310083 M Asian
7 410012 F White
8 420037 F White
9 510027 F White
10 520017 F White

Only a subset of the variables in the back.dat data file is read. Column numbers ("1-6") are used to read the variable subj, and absolute pointer controls are used to read the variables sex ("@17") and race ("@19") from the file. Note that:

  • Because we want to translate the variables, we must read sex and race as character variables, even though they are numbers.
  • On input, we have the option of specifying the length of the variables being read in. The length of the variables is specified in the informat name between the name and the period. For example, the length of the variable race being read in is defined as 1 in the informat $inrace1.
  • The LENGTH statement defines the length of sex and race after translation.

Launch the SAS program. Then, edit the INFILE statement so that it reflects the location of your stored back.dat file. Then, run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, note that the variables sex and race are both character variables, as indicated by "Char" appearing under the Type column in the output from the CONTENTS procedure. Also, note that the contents procedure gives no indication that the variables sex and race are formatted in any particular way for output. We'd have to take care of that by using a VALUE statement (as opposed to an INVALUE statement)!

Finally, as a little sidebar, recall that the TITLE statement is a toggle statement. That is, its value remains in effect until it is changed with another TITLE statement. Therefore, the title in the PRINT procedure is the same that is used in the CONTENTS procedure.


9.3 - The VALUE Statement

9.3 - The VALUE Statement

The INVALUE statement in the FORMAT procedure allows you to create your own customized informats, so that variables can be read in meaningful ways, whereas the VALUE statement allows you to create your own customized formats, so that variables can be displayed in meaningful ways. Customized formats do not alter variable types; they merely tell SAS to print variables according to your customized definitions. For example, providing the numeric variable sex is associated with the format sexfmt that is defined in the following VALUE statement:

VALUE sexfmt 1 = 'Male'
             2 = 'Female';

SAS will print "Male" when the variable sex = 1 and "Female" when sex = 2. The variable type of sex remains numeric. Restrictions on the VALUE statement include:

  • The name of the format for numeric variables (for example, sexfmt) must be a valid SAS name up to 32 characters, not ending in a number.
  • The name of the format for a character variable must begin with a $ sign, and have no more than 31 additional characters.
  • When you define the format in the VALUE statement, the format name cannot end in a period.
  • But when you use the format later, you must follow the name with a period. (Is this confusing or what?)
  • The maximum length for a format label is 32,767 characters (ehhhhh...?)

Just as is true for the INVALUE statement, the VALUE statement in the FORMAT procedure merely defines a format. In order for the format to take effect, you must associate the variable with the format you've defined by using a FORMAT statement in either a DATA step or a PROC step.

Example 9.5

The following FORMAT procedure defines how SAS should display numeric variables associated with the two formats sexfmt and racefmt during output:

PROC FORMAT;
  value sexfmt 1 = 'Male'
               2 = 'Female';

  value racefmt 1 = 'Indian'
                2 = 'Asian'
                3 = 'Black'
                4 = 'White';
RUN;

The translation is restricted to taking place on output, since the VALUE statement is used. As a result of this code, providing the numeric variable sex is later associated with the format sexfmt, whenever SAS goes to print the numeric value 1 for the variable sex, it will instead print the character value 'Male'. Similarly, whenever SAS goes to print the numeric value 2 for the variable sex, it will instead print the character value 'Female'.

Launch and run the SAS program. Again, the only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like this:

Again, in order to make the definitions for printing sex and race permanently stored beyond your current work session, you'd need to put a "LIBRARY =" option on the PROC FORMAT statement. Since one doesn't exist here, the definitions defined in this FORMAT procedure are temporary only.

All we've done so far is define the formats so that they are available for use. Now let's use them!

Example 9.6

The following SAS code uses the formats to print in a meaningful way the sex and race variables contained in the back data set:

DATA temp2;
   set back;
   f_race=race; 
   f_sex=sex;
   format f_race racefmt. f_sex sexfmt.;
RUN;

PROC PRINT data=temp2;
  title 'Output Dataset: TEMP2';
  var subj sex f_sex race f_race;
RUN;

PROC CONTENTS data=temp2;
RUN;
Output Dataset: TEMP2
Obs subj sex f_sex race f_race
1 110051 2 Female 4 White
2 110088 2 Female 4 White
3 210012 2 Female 4 White
4 220004 2 Female 4 White
5 230006 2 Female 4 White
6 310083 1 Male 2 Asian
7 410012 2 Female 4 White
8 420037 2 Female 4 White
9 510027 2 Female 4 White
10 520017 2 Female 4 White
Output Dataset: TEMP2
The CONTENTS Procedure
Data Set name WORK.TEMP2 Observations 10
Member Type DATA Variables 10
Engine V9 Indexes 0
Created Wed, Nov 05, 2008 11:21:06 AM Observation Length 80
Last Modified Wed, Nov 05, 2008 11:21:06 AM Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_32    
Encoding wlatin1 Western (Windows)    
Engine/Host Dependent Information
Data Set Page Size 8192
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 101
Obs in First Data Page 10
Number of Data Set Repairs 0
File Name C:\DOCUME~1\LAURAJ~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp2.sas7bdat
Release Created 9.010M3
Host Created XP_PRO

 

Alphabetic List of Variables and Attributes
# Variable Type Len Format
3 b_date Num 8 MMDDYY8.
6 country Num 8  
9 f_race Num 8 RACEFMT.
10 f_sex Num 8 SEXFMT.
7 race Num 8  
8 relig Num 8  
4 sex Num 8  
5 state Num 8  
1 sunj Num 8  
2 v_date Num 8 MMDDTT8.

Well, that's not precisely true! First, in creating the new data set temp2 from the back data set, two additional (numeric) variables are created, f_sex and f_race. They are equated, respectively, to the variables sex and race. Just as with SAS formats, you must associate a user-defined format with a variable in a FORMAT statement. The FORMAT statement:

 format f_race racefmt. f_sex sexfmt.;

associates the f_race variable with the racefmt. format and the f_sex variable with the sexfmt. format. Again, just as is true for SAS formats, you can place the FORMAT statement in either a DATA step or a PROC step. If you place the FORMAT in a PROC step, the format is associated with the variable only for the procedure in which the association is made. If you instead place the FORMAT statement in a DATA step, the format becomes available for all subsequent procedures.

Incidentally, note that it is not necessary to create a formatted and unformatted version of the same variables as we did in this example merely for educational purposes. Creating two versions of the same variables merely helps us see the effect the formatting has on the sex and race variables.

Launch and run the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, observe the difference in the printed output between the formatted and unformatted versions of the variables f_sex and sex (and f_race and race). Also, note that the CONTENTS procedure indicates that the variables sex and race are unformatted, numeric variables (since there is no special format specified), while f_sex and f_race are formatted, numeric variables (a special format is specified).

Example 9.7

The FORMAT procedure is useful in defining meaningful categories once you've converted one or more (perhaps continuous) variables into one categorical variable. The following SAS code illustrates the technique:

 PROC FORMAT;
    value age2fmt 1 = 'LT 20'
                  2 = '20-44'
                  3 = '45-54'
                  4 = 'GE 54'
                  OTHER = 'Missing';
 RUN;

 DATA temp3;
   set back;
        if age = . then age2 = .;
   else if age lt 20 then age2 = 1;
   else if age ge 20 and age lt 45 then age2 = 2;
   else if age ge 45 and age lt 54 then age2 = 3;
   else if age ge 54 then age2 = 4;
   format age2 age2fmt.;
 RUN;

 PROC FREQ data=temp3;
   title 'Age Frequency in TEMP3';
   table age2;
 RUN;

First, inspect the SAS program to make sure you understand the code. Then, launch and run the program and review the original data set as well as the output from the FREQ procedure to convince yourself that the age categories have been appropriately labeled. Incidentally, we'll learn more about the FREQ procedure soon in another lesson!

Example 9.8

Now, as long as we are interested in grouping values of only one variable, rather than doing it as we did in the previous program, we can actually accomplish it a bit more efficiently directly within the FORMAT procedure. For example, the following SAS code uses the FORMAT procedure to define the format agefmt based on the possible values of the variable age:

 PROC FORMAT;
   value agefmt LOW-<20  = 'LT 20'
                20-<45  = '20-44'
                45-<54  = '45-54'
                54-HIGH = 'GE 54'
                OTHER   = 'Missing';
 RUN;


 PROC FREQ data=back;
   title 'Age Frequency in BACK';
   format age agefmt.;
   table age;
 RUN;
Age Frequency in BACK
The FREQ Procedure
age Frequency Percent Cumulative Frequency Cumulative Percent
20-44 5 50.00 5 50.00
45-54 3 30.00 8 80.00
GE 54 2 20.00 10 100.00

In defining groups of values right within the FORMAT procedure, note that as illustrated in this program:

  • The potential ranges are defined using a dash (-). You can also list a range of values by separating the values with commas: 1,2,3 = 'Low'
  • The < symbol means "not including." Therefore, here for example, 20-<45 means all ages between 20 and 45, including 20, but not including 45.
  • The special LOW and HIGH ranges allow you to group values without knowing the smallest and largest values, respectively. (The keyword LOW does not include missing numeric values, but if applied to a character format, it does include missing character values.)

The FREQ procedure tallies the number of subjects falling within each of the age groups as defined in the FORMAT procedure. Here, the variable age is associated with the format agefmt using a FORMAT statement right within the FREQ procedure.

Now, launch and run the program and review the original data set as well as the output from the frequency procedure to convince yourself that the age categories have again been appropriately labeled.


9.4 - Permanent Formats

9.4 - Permanent Formats

All of the customized informat and format definitions in this lesson thus far have been stored only temporarily. That is, the informats and formats are valid only for the duration of the SAS session in which they are defined. If you wanted to use the informats or formats again in a different SAS program, you would have to create them again using another FORMAT procedure. If you plan to use a customized informat or format repeatedly, you can store it permanently in a "formats catalog" by using the LIBRARY= option in the PROC FORMAT statement. Basically, the LIBRARY= option tells SAS where the formats catalog is (to be) stored. You tell SAS the library (which again you can think of as a directory or location) by using a LIBNAME statement:

LIBNAME libref 'c:\directory\where\formats\stored';

where libref is technically a name of your choosing. Note though that when a user-defined informat or format is called by a DATA or PROC step, SAS first looks in a temporary catalog named work.formats. (Recall that "work" is what SAS always treats as your temporary working library that goes away at the end of your SAS session.) If SAS does not find the format or informat in the temporary catalog, it then by default looks in a permanent catalog called library.formats. So, while, yes, libref is technically a name of your choosing, it behooves you to call it library since that what SAS looks for first. That's why SAS recommends, but does not require, that you use the word library as the libref when creating permanent formats.

To make this blather a bit more concrete, suppose we have the following LIBNAME statement in our SAS program:

LIBNAME library 'C:\Simon\Stat480WCDEV\08format\sasndata\';

and have a format procedure that starts with:

PROC FORMAT library=library;

Then, upon running the program, SAS creates a permanent catalog containing all of the formats and informats that are defined in the FORMAT procedure and stores it in the folder referenced above, as illustrated here:

A formats catalog, regardless of whether it is temporary (work.formats) or permanent (library.formats), contains one entry for each format or informat defined in a FORMAT procedure. Because library.formats is the reserved name for permanent formats catalogs, you can create only one catalog called formats per SAS library (directory). There are ways around this restriction, but let's not get into that now. Let's jump to an example instead.

Example 9.9

The following SAS program illustrates a FORMAT procedure that creates a permanent formats catalog in the directory referenced by library, that is, in C:\simon\icdb\data:

LIBNAME library 'C:\simon\icdb\data';

PROC FORMAT library=library;
   value sex2fmt 1 = 'Male'
                 2 = 'Female';

   value race2fmt 3 = 'Black'
                  4 = 'White'
                  OTHER = 'Other';
 RUN;

 DATA temp4; 
   infile 'c:\simon\icdb\data\back.dat';
   input subj 1-6 sex 17 race 19;
   format sex sex2fmt. race race2fmt.;
 RUN;

 PROC CONTENTS data=temp4;
   title 'Output Dataset: TEMP4';
 RUN;

 PROC PRINT data=temp4;
 RUN;
Output Dataset: TEMP4
The CONTENTS Procedure
Data Set name WORK.TEMP4 Observations 10
Member Type DATA Variables 3
Engine V9 Indexes 0
Created Wed, Nov 05, 2008 12:05:29 PM Observation Length 24
Last Modified Wed, Nov 05, 2008 12:05:29 PM Deleted Observations 0
Protection   Compressed NO
Data Set Type   Sorted NO
Label      
Data Representation WINDOWS_32    
Encoding wlatin1 Western (Windows)    
Engine/Host Dependent Information
Data Set Page Size 4096
Number of Data Set Pages 1
First Data Page 1
Max Obs per Page 168
Obs in First Data Page 10
Number of Data Set Repairs 0
File Name C:\DOCUME~1\LAURAJ~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp4.sas7bdat
Release Created 9.010M3
Host Created XP_PRO
Alphabetic List of Variables and Attributes
# Variable Type Len Format
3 race Num 8 RACE2FMT.
2 sex Num 8 SEX2FMT.
1 v_date Num 8  
Output Dataset: TEMP4
Obs subj sex race
1 110051 Female White
2 110088 Female White
3 210012 Female White
4 220004 Female White
5 230006 Female White
6 310083 Male Other
7 410012 Female White
8 420037 Female White
9 510027 Female White
10 520017 Female White

The DATA step creates a temporary data set called temp4 by reading in the variables subj, sex, and race from the raw data file back.dat, and associates the variables sex and race, respectively, with the formats sex2fmt and race2fmt that are defined in the FORMAT procedure. SAS first looks for the occurrence of these two formats in the temporary catalog work.formats and then when it doesn't find them there, it looks for them in the catalog of the permanent format in the c:\simon\icdb\data directory.

Launch the SAS program, and edit the INFILE statement so it reflects the location of your back.dat file. And, edit the LIBNAME statement so it reflects your desired location for the catalog of the permanent format. Then, run the program and review the output from the CONTENTS and PRINT procedures to convince yourself that the variables sex and race are associated with the permanent formats sex2fmt and race2fmt, not the temporary formats sexfmt and racefmt previously associated with f_sex and f_race. Also, view the directory referenced in your LIBNAME statement to convince yourself that SAS created and stored a permanent formats catalog there.

Just a few more comments on this permanent formats stuff. One of the problems with permanent informats and formats is that once a variable has been associated permanently with an informat or format, SAS must be able to refer to the library to access the formats catalog. As long as the formats catalog exists, and you have permission to the file, you just have to specify the appropriate LIBNAME statement:

LIBNAME library 'c:\stat480\formats\'; 

to access the catalog. If for some reason, you do not have access to the formats catalog, SAS will give you an error that looks something like this:

If you specify the NOFMTERR in the OPTIONS statement:

OPTIONS NOFMTERR; 

you can use the SAS data sets without getting errors. SAS will just display a note (not a program-halting error!) in the log file:

You will be able to run SAS programs that use the data sets containing the permanent formats. You will just not have access to the formats.

Example 9.10

Rather than creating a permanent formats catalog, you can create a SAS program file which contains only a FORMAT procedure with the desired value and invalue statements. Then you need merely include this secondary program file in your main SAS program using the %INCLUDE statement, as illustrated here:

%INCLUDE 'C:\simon\icdb\formats\backfmt.sas';

 PROC FREQ data=back;
   title 'Frequency Count of STATE (statefmt)';
   format state statefmt.;
   table state/missing;
 RUN;
Frequency Count of STATE (statefmt)
The FREQ Procedure
state Frequency Percent Cumulative Frequency Cumulative Percent
Missing 2 20.00 2 20.00
Ind 1 10.00 3 30.00
Mass 1 10.00 4 40.00
Mich 2 20.00 6 60.00
Minn 1 10.00 7 70.00
Other 1 10.00 8 80.00
Tenn 1 10.00 9 90.00
Wisc 1 10.00 10 100.00

To make it clear, here's the only thing contained in the backfmt.sas file:

PROC FORMAT;
    value statefmt 14 = 'Ind'
                   21 = 'Mass'
                   22 = 'Mich'
                   23 = 'Minn'
                   42 = 'Tenn'
                   49 = 'Wisc'
                   .  = 'Missing'
                Other = 'Other';

Since the FORMAT procedure in the backfmt.sas file does not refer to a permanent library, the format statefmt is stored in the temporary work.formats catalog.

To run this program, first download and save the backfmt.sas file to a convenient location on your computer. Then, launch the SAS program and edit the %INCLUDE statement so it reflects the location of your backfmt.sas file. Finally, run the program and review the output from the FREQ procedure. Convince yourself that the format statement in the FREQ procedure appropriately associates the state variable with the statefmt format created by the FORMAT procedure in backfmt.sas. You may as well also take note of the effect of the MISSING option in the FREQ procedure. Basically, it tells SAS to include missing values as a countable category.

The technique illustrated in this example is particularly useful when you work in an open environment, in which data sets are shared. Different users may not have access to the format file, or different users may prefer different formats.


9.5 - Using Codebooks to Help Define Formats

9.5 - Using Codebooks to Help Define Formats

It is very common for discrete (categorical) variables to have many, many (hundreds, perhaps even thousands of) possible values. Examples include:

  • diseases may be coded by an integer
  • surgical treatments may be coded by an integer
  • medications may be coded by an integer

An electronic "codebook" is typically used to keep track of the meaning of each of the integer codes. That is, codebooks contain two variables, the code and a text description of the code. For example, disease 1124 could be defined in text as "Rheumatoid Arthritis."

One would find it extremely tedious and time-consuming to have to type a FORMAT procedure which re-defines the codes and text contained in these codebooks. Instead, one can take advantage of the fact that the codebook is already in an electronic format, such as a database table, an ascii file, or even a SAS data set.

When the codebook is contained in a SAS data set with three required variables:

  • start: the variable that contains the starting range values (that is, the codes)
  • label: the variable that contains the text definition
  • fmtname: the format name

SAS can create the appropriate format using the CNTLIN = option in the PROC FORMAT statement. Let's take a look at an example.

Example 9.11

The following SAS program creates a SAS data set called states from state_cd, which is the codebook for the variable state that is collected on the ICDB background form. Here's what the first ten observations of the state_cd data set look like:

Codebook for States
Obs name CODE
1 alabama 1
2 alaska 2
3 arizona 3
4 arkansas 4
5 california 5
6 colorado 6
7 connecticuit 7
8 delaware 8
9 florida 9
10 georgia 10

The data set states is then used in the FORMAT procedure to define the format for the variable state in the back data set:

PROC PRINT data=icdb.state_cd;
   title 'Codebook for States';
 RUN;

 DATA states;
   set icdb.state_cd (rename = (code = start name=label));
   fmtname = 'stat2fmt';
 RUN;

 PROC FORMAT cntlin=states;
 RUN;

 PROC FREQ data=back;
   title 'Freqency Count of STATE (stat2fmt)';
   format state stat2fmt.;
   table state;
 RUN;
Frequency count of STATE (stat2fmt)
The FREQ Procedure
state Frequency Percent Cumulative Frequency Cumulative percent
indiana 1 12.50 1 12.50
massachusetts 1 12.50 2 25.00
michigan 2 25.00 4 50.00
minnesota 1 12.50 5 62.50
pennsylvania 1 12.50 6 75.00
tennessee 1 12.50 7 87.50
wisconsin 1 12.50 8 100.00

Frequency Missing = 2

Before running this program, you'll have to download the codebook state_cd. Save it to the location on your computer that you referenced in the earlier LIBNAME statement by the libref icdb. Then, go ahead and launch and run the program.

As you can see from the output, the PRINT procedure merely prints the icdb.state_cd codebook for your review. You should notice that the variable names in state_cd do not meet SAS requirements for codebooks. Therefore, the DATA step that creates the data set states merely renames the code and name variables in icdb.state_cd so that they meet SAS requirements. The RENAME= option on the SET statement is what is used to change the code variable to start and the name variable to label. The general syntax of the RENAME= option on the SET statement is:

set dsname (rename = (oldvr1 = newvr1 oldvr2 = newvr2 ...));

An assignment statement is then used to assign the value stat2fmt to the variable fmtname for each observation (that is, code) that appears in the icdb.state_cd data set.

Then, the FORMAT procedure with the CNTLIN = states option tells SAS to create the format stat2fmt based on the contents of the data set states. Finally, the FREQ procedure illustrates the use of the stat2fmt after it was created in this manner. SAS merely counts and reports the number of subjects coming from each of the states. Note that since we didn't include the MISSING option on the TABLE statement, SAS reports the number of missing values after the table, rather than as a row of the table.


9.6 - The PICTURE Statement

9.6 - The PICTURE Statement

I've always thought the FORMAT procedure's PICTURE statement is one of the procedure's coolest features. In short, the PICTURE statement allows you to create templates for printing numbers by defining a format that allows for special characters, such as leading zeros, decimal and comma punctuation, fill characters, prefixes and negative number representation. Only numeric variables can have picture formats. For a quick example, the following PICTURE statement:

PICTURE phonepix OTHER = '(999)999-9999'; 

tells SAS to print phone numbers in the specified format.

Restrictions on the PICTURE statement include:

  • The name of the picture format (e.g., phonepix) must be a valid SAS name.
  • When you use the format later, you must follow the name with a period.
  • Picture format options, such as FILL, MULT, PREFIX, and NOEDIT, should be specified in parentheses after each picture in a picture format.
  • Tthe maximum length for a picture is 40 characters.

As is true for the INVALUE and VALUES statements, the PICTURE statement in the FORMAT procedure merely defines a picture. In order for the picture to take effect, you must associate the variable with the picture with a FORMAT statement in either a DATA step or a PROC step.

The FILL option specifies a fill character, which replaces the leading blanks of the picture until a significant digit is encountered. The default fill character is a blank. The PREFIX option specifies a one- or two-character prefix placed in front of the value's first significant digit. The PREFIX option is often used for leading dollar signs and minus signs. For example, the following picture statement:

PICTURE dolpix OTHER='00,000,000.00' (fill='*' prefix='$'); 

tells SAS to print dollar amounts, such as 19999 as ***$19,999.00.

See SAS Help for information about the MULT and NOEDIT options of the PICTURE statement.

Example 9.12

The FORMAT procedure in the following SAS program defines a picture format for the ICDB variable subj:

 PROC FORMAT;
   picture subjpix LOW-HIGH = '00-0000';
 RUN;

 PROC PRINT data=back;
   title 'BACK dataset with SUBJ pictured as 00-0000';
   format subj subjpix.;
   var subj v_date sex;
 RUN; 
BACK dataset with SUBJ pictured as 00-0000
Obs subj v_date sex
1 11-0051 01/25/94 2
2 11-0088 02/28/95 2
3 21-0012 07/16/93 2
4 22-004 07/27/93 2
5 23-0006 01/06/94 2
6 31-0083 01/20/95 1
7 41-0012 09/16/93 2
8 42-0037 02/02/94 2
9 51-0027 02/15/94 2
10 52-0017 11/17/93 2

The picture allows the first two digits of subj, which happens to define the hospital number, to be separated from the remaining four digits by a dash. The range "LOW-HIGH" tells SAS that all values should be printed in this format. Since the "digit selector" used to define the template is 0, leading zeroes are not printed. (That's a moot point here, since all of the ICDB subject numbers begin with non-zero numbers). In general, non-zero digit selectors, such as say 9, tell SAS to print leading zeroes.

Launch and run the SAS program. Review the contents of the PRINT procedure to convince yourself that, as requested, SAS associated the variable subj with the defined picture format subjpix, and then printed the subj variable accordingly.

Example 9.13

The following SAS program illustrates two more picture formats:

 DATA temp5;
    input subj ssn expens;
    cards;
 110051 001111111  1099.99
 110088 022234567 10876.34
 210012 123345567  9567.21
 220004 120451207  5640.12
 230006 125398710   344.46
 310083 237982019  3235.09
 410012 323432429  1343.03
 420037 340234839 11348.29
 510027 928373402  7362.79
 520017 433492349  3295.09
 ;
 RUN;

 PROC FORMAT;
   picture ssnpix LOW-HIGH = '999-99-9999';

   picture dolpix LOW-HIGH = '000,000.00' (prefix='$' fill='*');
 RUN;

 PROC PRINT data=temp5;
    title 'Output Dataset: TEMP5.  Examples of Picture Formats.';
    format ssn ssnpix. expens dolpix.;
    var subj ssn expens;
 RUN;
Output Dataset: TEMP5. Examples of Picture Formats.
Obs subj ssn expens
1 10051 001-11-1111 *$1,099.99
2 110088 022-23-4597 $10,876.34
3 2100012 123-34-5567 *$9,567.21
4 220004 120-45-1207 *$5,640.12
5 230006 125-39-8710 ***$344.46
6 310083 237-98-2019 *$3,235.09
7 410012 323-43-2429 *$1,343.03
8 420037 340-23-4839 $11,348.29
9 510027 928-37-3402 *$7,362.79
10 520017 433-49-2349 *$3,295.09

The first DATA step merely creates a temporary data set called temp5 by reading in three variables: subj (matching the ID numbers with those in the back data set), ssn (social security number), and expens (hospital expenses, a dollar amount). Then, the FORMAT procedure defines two picture formats, ssnpix and dolpix.

  • The ssnpix format tells SAS to print social security numbers with the stated format. Since a non-zero digit selector ("9") is used, SAS prints leading zeroes. That is, the social security number 032-45-2190 will be printed with the leading zero.
  • The dolpix format tells SAS to print expense amounts dropping any leading zeroes. The PREFIX option tells SAS to place a dollar sign ("$") before the expense amount, while the FILL option tells SAS to fill the remaining positions with asterisks ("*").

The PRINT procedure associates the variable ssn with the format ssnpix and the variable expens with the format dolpix, as well as prints the data set. Launch and run the SAS program and review the output of the PRINT procedure to convince yourself that the picture formats did indeed have the desired effect.

Note that although the largest expense in the data set is 11,348.29, the dolpix picture seemingly allows for numbers in the hundreds of thousands, 100,000.00 say. If we did not provide that extra digit, then SAS would not have room to place a dollar sign before an expense amount of 11,348.29. This is a common mistake, so please keep it in mind! If this is unclear to you, you might want to remove one of the three 0s that appear before the comma in the dolpix picture, re-run the SAS program, and review the PRINT procedure to see that the dollar sign is not displayed for the 11,348.29 or 10,876.34 amounts.


9.7 - The FMTLIB Option

9.7 - The FMTLIB Option

You might have taken note that the FORMAT procedure by itself does not generate any output. Indeed, the FORMAT procedure prints output only when you specify the FMTLIB option in the PROC FORMAT statement. The FMTLIB option of the FORMAT procedure tells SAS to display a list of all the formats and/or informats in your catalog, along with desciptions of their values. The FMTLIB option can be particularly helpful when you are working with a large catalog of formats, and have forgotten the exact spelling of a specific format name or its range of values.

Example 9.14

The following code uses the FORMAT procedure's FMTLIB option to request that SAS display information about three formats appearing in the work.format catalog:

PROC FORMAT FMTLIB;
   title 'Selected Formats from WORK.FORMAT Catalog';
   select racefmt ssnpix dolpix;
RUN;

Launch and run the SAS program and review the output. Since the FORMAT procedure here does not refer to a permanent library, the contents of the temporary work.formats catalog are printed. The SELECT statement tells SAS to print information only on a select few formats rather than on the entire catalog. (See SAS Help for more details on the SELECT statement and its sister EXCLUDE statement.)

Although not used in this example, the PAGE option may be used additionally to tell SAS to print the information about each format and informat in the catalog on a separate page.

PROC FORMAT FMTLIB PAGE;
RUN;

The FORMAT procedure's PAGE option is meaningless unless the FMTLIB option is also invoked.


9.8 - Summary

9.8 - Summary

In this lesson, we learned how the FORMAT procedure allows us to control the appearance of variables in our output.

The homework assignment for this lesson will give you practice with these techniques.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility