Lesson 9: The Format Procedure

Lesson 9: The Format Procedure

Overview

We previously learned how to use a FORMAT statement to tell SAS to display certain variable values in a particular way. For example, we might tell SAS to display a date variable saledate, say, using the SAS mmddyy10. format, so that August 19, 2008,f is displayed as 08/19/2008. There are a whole slew of informats and formats that SAS provides that you can find in the SAS Help and Documentation. Our focus in this lesson will be on creating informats and formats to fill in for those that SAS doesn't provide.

That is, in this lesson, we will extend our formatting capabilities by investigating how to create user-defined informats and formats using the FORMAT procedure. In particular, we will take a look at the following techniques:

  • how to translate values of a character variable when they are read in a SAS data set into more meaningful values using the INVALUE statement
  • how to create customized formats for character and numeric variables, using the VALUE statement, so variables can be printed in a meaningful format
  • how to create templates, using the PICTURE state,ment, for printing numbers with a special format, such as leading zeros, decimal and comma punctuation, fill characters, and prefixes.

Objectives

Upon completion of this lesson, you should be able to:

Upon completing this lesson, you should be able to do the following:

  • create an ASCII file from a SAS data set using a PUT statement
  • tell SAS how to read in special characters using an INVALUE statement in the FORMAT procedure
  • associate a character variable with a particular informat
  • create a permanent formats catalog for a SAS data set
  • tell SAS how to print certain variables using a VALUE statement in the FORMAT procedure
  • define categories using a format
  • access a data set with permanently formatted variables without having access to the formats catalog by running the the NOFMTERR option
  • include a secondary program file in a main SAS program using the %INCLUDE statement
  • define formats using a codebook to help 
  • create templates for printing numbers by defining a format that allows for special characters using a PICTURE format  
  • instruct SAS to print the contents of a formats catalog using the FMTLIB option of the FORMAT procedure

9.1 - A Working Data Set

9.1 - A Working Data Set

Throughout this lesson, we will investigate a number of examples that illustrate how to create different informats and formats for several different variables. To do so, we will use a subset of the demographic (or "background") data collected on 638 subjects once enrolled in the National Institute of Health's Interstitial Cystitis Data Base (ICDB) Study. Not surprisingly, the ICDB Study collected data on people who were diagnosed as having interstitial cystitis! The primary reason for conducting the study was that interstitial cystitis is a poorly understood condition that causes severe bladder and pelvic pain, urinary frequency, and painful urination in the absence of any identifiable cause. Although the disease is more prevalent in women, it affects both men and women of all ages. For the ICDB Study, each subject was enrolled at one of seven clinical centers and was evaluated four times a year for as many as four years.

It will probably be helpful for you to take a peek at the background data form on which the data were collected. In order to run the SAS programs in this lesson, you'll need to save the background data set to a directory on your computer. To do so, right-click the link and select the "Save Link As" option. A Save dialog box will appear and allow you to save the data file to the location you choose on your computer.

Example 9.1

Because there are 638 observations and 16 variables in the permanent background data set icdb.back, the data on just ten subjects and nine variables are selected when creating the temporary working background data set back. The following SAS program creates the subset:

OPTIONS PS = 58 LS = 80 NODATE NONUMBER;
LIBNAME icdb 'C:\Yourdrivename\icdb\data';
DATA back;
   set icdb.back;
   age = (v_date - b_date)/365.25;
   if subj in (110051, 110088, 210012, 220004, 230006,
               310083, 410012, 420037, 510027, 520017);
   keep subj v_date b_date age sex state country race relig;
   format age 4.1;
RUN;
PROC PRINT;
  title 'Output Dataset: BACK';
RUN;

Output Dataset: BACK

Obs

subj

v_date

b_date

sex

state

country

race

relig

age

1

110051

1/25/94

12/02/42

2

42

1

4

3

51.1

2

110088

2/28/95

10/03/27

2

23

1

4

2

67.4

3

210012

07/16/93

06/27/24

2

.

6

4

1

69.1

4

220004

07/27/93

08/07/72

2

38

1

4

1

21.0

5

230006

01/06/94

04/24/49

2

21

1

4

3

44.7

6

310083

01/20/95

05/13/54

1

.

17

2

1

40.7

7

410012

09/16/93

11/01/47

2

22

1

4

3

45.9

8

420037

02/02/94

0/25/41

2

22

1

4

1

52.5

9

510027

02/15/94

08/16/63

2

49

1

4

1

30.5

10

520017

11/17/93

09/24/54

2

14

1

4

1

39.1

Launch the SAS program. Then, edit the LIBNAME statement so it reflects the location where you saved the background data set. Then, run  the program and review the contents of the print procedure to familiarize yourself with the structure and contents of the subset data set called back.

Note that the IF statement tells SAS which ten subjects we want to be included in the back data set. And, the KEEP statement tells SAS which nine variables we want included in the back data set. We will learn more about the KEEP statement in Stat 481. You might also want to note that the FORMAT statement tells SAS to use the SAS-provided w.d format to display an age as 44.7, say.

Example 9.2

We'll also need to work with a raw data file version of the subset data set. The following SAS code creates the ASCII raw data file, in column format, from the temporary back data set:

DATA _NULL_;
  set back;
  file 'C:\Yourdrivename\icdb\data\back.dat';
  put subj 1-6 @8 b_date mmddyy8. sex 17 race 19 
      relig 21 state 23-24 country 26-27 
      @29 age 4.1 @34 v_date mmddyy8.;
RUN;

The SAS data set name _NULL_ tells SAS to execute the DATA step as if it were creating a new SAS data set, but no observations and no variables are written to an output data set. The PUT statement tells SAS to write the variables — in the format specified — to the filename specified (back.dat) in the FILE statement. The specifications used in the PUT statement are similar to the specifications used in the INPUT statement.

Launch the SAS program. Then, edit the FILE statement so it reflects the location where you would like the raw data file saved. Then, run  the program. Open the newly created back.dat file in an ASCII editor, such as NotePad, to convince yourself that its structure and contents are similar to the back data set.


9.2 - The INVALUE Statement

9.2 - The INVALUE Statement

The INVALUE statement in the FORMAT procedure allows you to create your own customized informats for character variables. That is, it allows you to tell SAS how you'd like the program to read in special character values. In doing so, SAS effectively translates the values of a character variable into different, typically more meaningful character or numeric values. For example, the following INVALUE statement:

INVALUE $french 'OUI'= 'YES' 
                'NON'= 'NO'; 

prepares SAS to translate a character variable in French to a character variable in English.

Restrictions on the INVALUE statement include:

  • You can only translate a character variable to another variable. You cannot translate a numeric variable using the INVALUE statement.
  • The name of the informat must begin with a $ sign, since it refers to a character variable.
  • The name of the informat (for example, french) must be a valid SAS name with no more than 30 additional characters following the imperative $ sign. The name cannot end in a number nor can the name be a standard SAS informat name.
  • When you refer to the informat later, you must follow the name with a period.

The INVALUE statement in the FORMAT procedure merely defines an informat so that it is available for use. In order for the informat to take effect, you must associate the character variable with the informat either explicitly in the INPUT statement:

INPUT resp $french.;

or in a FORMAT statement:

FORMAT resp $french.; 

Let's take a look at an example!

Example 9.3

The following SAS code illustrates the use of the FORMAT procedure to define how SAS should translate the two character variables sex and race during input:

PROC FORMAT;
  invalue $insex '1' = 'M'
                 '2' = 'F';

  invalue $inrace '1' = 'Indian'
                  '2' = 'Asian'
                  '3' = 'Black'
                  '4' = 'White';
RUN;

Because the INVALUE statement is used, the translation is restricted to taking place on input. As a result of this code, providing the character variable sex is later associated with the informat $insex, whenever SAS encounters the character value '1' for the variable sex it will instead store the character value 'M'. Similarly, whenever SAS encounters the character value '2' for the variable sex it will instead store the character value 'F'.

Launch and run  the SAS program. The only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like this:

1    PROC FORMAT;
2      invalue $insex '1' = 'M'
3                     '2' = 'F';
NOTE: Informat $INSEX has been output.
4
5      invalue $inrace '1' = 'Indian'
6                      '2' = 'Asian'
7                      '3' = 'Black'
8                      '4' = 'White';
NOTE: Informat $INRACE has been output.
9    RUN;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.02 seconds
      cpu time            0.00 seconds

As we'll learn later in this lesson, in order to make the definitions for reading in sex and race permanently stored beyond our current work session, we'd need to attach a "LIBRARY =" option to the PROC FORMAT statement. Since one doesn't exist here, the definitions defined in this format procedure are temporary only. That is, they are not stored beyond your current SAS session.

All we've done so far is define the informats so that they are available for use. Now let's use them!

Example 9.4

The following data step uses the informats that we defined in the previous example to read in a subset of the data from the input raw data file back.dat:

DATA temp1;
  infile 'C:\Yourdrivename\icdb\data\back.dat';
  length sex $ 1 race $ 6;
  input subj 1-6 @17 sex $insex1. @19 race $inrace1.;
RUN;
PROC CONTENTS data=temp1;
  title 'Output Dataset: TEMP1';
RUN;
PROC PRINT data=temp1;
  var subj sex race;
RUN;

Output Dataset: TEMP1
The CONTENTS Procedure

Data Set name

WORK.TEMP1

Observations

10

Member Type

DATA

Variables

3

Engine

V9

Indexes

0

Created

Wed, Nov 05, 2023 11:06:38 AM

Observation Length

16

Last Modified

Wed, Nov 05, 2023 11:06:38 AM

Deleted Observations

0

Protection

 

Compressed

NO

Data Set Type

 

Sorted

NO

Label

   

Data Representation

WINDOWS_32

  

Encoding

wlatin1 Western (Windows)

  
Engine/Host Dependent Information

Data Set Page Size

4096

Number of Data Set Pages

1

First Data Page

1

Max Obs per Page

252

Obs in First Data Page

10

Number of Data Set Repairs

0

File Name

C:\DOCUME~1\Yourdrivename~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp1.sas7bdat

Release Created

9.010M3

Host Created

XP_PRO

Alphabetic List of Variables and Attributes

#

Variable

Type

Len

2

race

Char

6

1

sex

Char

1

3

subj

Num

8

Output Dataset: TEMP1

Obs

subj

sex

race

1

110051

F

White

2

110088

F

White

3

210012

F

White

4

220004

F

White

5

230006

F

White

6

310083

M

Asian

7

410012

F

White

8

420037

F

White

9

510027

F

White

10

520017

F

White

Only a subset of the variables in the back.dat data file is read. Column numbers ("1-6") are used to read the variable subj, and absolute pointer controls are used to read the variables sex ("@17") and race ("@19") from the file. Note that:

  • Because we want to translate the variables, we must read sex and race as character variables, even though they are numbers.
  • On input, we have the option of specifying the length of the variables being read in. The length of the variables is specified in the informat name between the name and the period. For example, the length of the variable race being read in is defined as 1 in the informat $inrace1.
  • The LENGTH statement defines the length of sex and race after translation.

Launch the SAS program. Then, edit the INFILE statement so that it reflects the location of your stored back.dat file. Then, run  the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, note that the variables sex and race are both character variables, as indicated by "Char" appearing under the Type column in the output from the CONTENTS procedure. Also, note that the contents procedure gives no indication that the variables sex and race are formatted in any particular way for output. We'd have to take care of that by using a VALUE statement (as opposed to an INVALUE statement)!

Finally, as a little sidebar, recall that the TITLE statement is a toggle statement. That is, its value remains in effect until it is changed with another TITLE statement. Therefore, the title in the PRINT procedure is the same that is used in the CONTENTS procedure.


9.3 - The VALUE Statement

9.3 - The VALUE Statement

The INVALUE statement in the FORMAT procedure allows you to create your own customized informats so that variables can be read in meaningful ways, whereas the VALUE statement allows you to create your own customized formats so that variables can be displayed in meaningful ways. Customized formats do not alter variable types; they merely tell SAS to print variables according to your customized definitions. For example, providing the numeric variable sex is associated with the format sexfmt that is defined in the following VALUE statement:

VALUE sexfmt 1 = 'Male'
             2 = 'Female';

SAS will print "Male" when the variable sex = 1 and "Female" when sex = 2. The variable type of sex remains numeric. Restrictions on the VALUE statement include:

  • The name of the format for numeric variables (for example, sexfmt) must be a valid SAS name of up to 32 characters, not ending in a number.
  • The name of the format for a character variable must begin with a $ sign, and have no more than 31 additional characters.
  • When you define the format in the VALUE statement, the format name cannot end in a period.
  • But when you use the format later, you must follow the name with a period. 
  • The maximum length for a format label is 32,767 characters 

Just as is true for the INVALUE statement, the VALUE statement in the FORMAT procedure merely defines a format. In order for the format to take effect, you must associate the variable with the format you've defined by using a FORMAT statement in either a DATA step or a PROC step.

Example 9.5

The following FORMAT procedure defines how SAS should display numeric variables associated with the two formats sexfmt and racefmt during output:

PROC FORMAT;
  value sexfmt 1 = 'Male'
               2 = 'Female';

  value racefmt 1 = 'Indian'
                2 = 'Asian'
                3 = 'Black'
                4 = 'White';
RUN;

The translation is restricted to taking place on output since the VALUE statement is used. As a result of this code, providing the numeric variable sex is later associated with the format sexfmt, whenever SAS goes to print the numeric value 1 for the variable sex, it will instead print the character value 'Male'. Similarly, whenever SAS goes to print the numeric value 2 for the variable sex, it will instead print the character value 'Female'.

Launch and run  the SAS program. Again, the only way you'll know if anything happened is by checking out your log window. You should see a message that looks something like this:

1    PROC FORMAT;
2      value sexfmt 1 = 'Male'
3                   2 = 'Female';
NOTE: Format SEXFMT has been output.
4
5      value racefmt 1 = 'Indian'
6                    2 = 'Asian'
7                    3 = 'Black'
8                    4 = 'White';
NOTE: Format RACEFMT has been output.
9    RUN;

NOTE: PROCEDURE FORMAT used (Total process time):
      real time           0.02 seconds
      cpu time            0.01 seconds

Again, in order to make the definitions for printing sex and race permanently stored beyond your current work session, you'd need to put a "LIBRARY =" option on the PROC FORMAT statement. Since one doesn't exist here, the definitions defined in this FORMAT procedure are temporary only.

All we've done so far is define the formats so that they are available for use. Now let's use them!

Example 9.6

The following SAS code uses the formats to print in a meaningful way the sex and race variables contained in the back data set:

DATA temp2;
   set back;
   f_race=race; 
   f_sex=sex;
   format f_race racefmt. f_sex sexfmt.;
RUN;
PROC PRINT data=temp2;
  title 'Output Dataset: TEMP2';
  var subj sex f_sex race f_race;
RUN;
PROC CONTENTS data=temp2;
RUN;

Output Dataset: TEMP2

Obs

subj

sex

f_sex

race

f_race

1

110051

2

Female

4

White

2

110088

2

Female

4

White

3

210012

2

Female

4

White

4

220004

2

Female

4

White

5

230006

2

Female

4

White

6

310083

1

Male

2

Asian

7

410012

2

Female

4

White

8

420037

2

Female

4

White

9

510027

2

Female

4

White

10

520017

2

Female

4

White

 

Output Dataset: TEMP2
The CONTENTS Procedure

Data Set name

WORK.TEMP2

Observations

10

Member Type

DATA

Variables

10

Engine

V9

Indexes

0

Created

Wed, Nov 05, 2023 11:21:06 AM

Observation Length

80

Last Modified

Wed, Nov 05, 2023 11:21:06 AM

Deleted Observations

0

Protection

 

Compressed

NO

Data Set Type

 

Sorted

NO

Label

   

Data Representation

WINDOWS_32

  

Encoding

wlatin1 Western (Windows)

  

 

Engine/Host Dependent Information

Data Set Page Size

8192

Number of Data Set Pages

1

First Data Page

1

Max Obs per Page

101

Obs in First Data Page

10

Number of Data Set Repairs

0

File Name

C:\DOCUME~1\Yourdrivename~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp2.sas7bdat

Release Created

9.010M3

Host Created

XP_PRO

 

Alphabetic List of Variables and Attributes

#

Variable

Type

Len

Format

3

b_date

Num

8

MMDDYY8.

6

country

Num

8

 

9

f_race

Num

8

RACEFMT.

10

f_sex

Num

8

SEXFMT.

7

race

Num

8

 

8

relig

Num

8

 

4

sex

Num

8

 

5

state

Num

8

 

1

sunj

Num

8

 

2

v_date

Num

8

MMDDTT8.

Well, that's not precisely true! First, in creating the new data set temp2 from the back data set, two additional (numeric) variables are created, f_sex and f_race. They are equated, respectively, to the variables sex and race. Just as with SAS formats, you must associate a user-defined format with a variable in a FORMAT statement. The FORMAT statement:

format f_race racefmt. f_sex sexfmt.;

associates the f_race variable with the racefmt. format and the f_sex variable with the sexfmt. format. Again, just as is true for SAS formats, you can place the FORMAT statement in either a DATA step or a PROC step. If you place the FORMAT in a PROC step, the format is associated with the variable only for the procedure in which the association is made. If you instead place the FORMAT statement in a DATA step, the format becomes available for all subsequent procedures.

Incidentally, note that it is not necessary to create a formatted and unformatted version of the same variables as we did in this example merely for educational purposes. Creating two versions of the same variables merely helps us see the effect the formatting has on the sex and race variables.

Launch and run  the SAS program and review the output from the CONTENTS and PRINT procedures. In particular, observe the difference in the printed output between the formatted and unformatted versions of the variables f_sex and sex (and f_race and race). Also, note that the CONTENTS procedure indicates that the variables sex and race are unformatted, numeric variables (since there is no special format specified), while f_sex and f_race are formatted, numeric variables (a special format is specified).

Example 9.7

The FORMAT procedure is useful in defining meaningful categories once you've converted one or more (perhaps continuous) variables into one categorical variable. The following SAS code illustrates the technique:

 PROC FORMAT;
    value age2fmt 1 = 'LT 20'
                  2 = '20-44'
                  3 = '45-54'
                  4 = 'GE 54'
                  OTHER = 'Missing';
 RUN;
 DATA temp3;
   set back;
        if age = . then age2 = .;
   else if age lt 20 then age2 = 1;
   else if age ge 20 and age lt 45 then age2 = 2;
   else if age ge 45 and age lt 54 then age2 = 3;
   else if age ge 54 then age2 = 4;
   format age2 age2fmt.;
 RUN;
 PROC FREQ data=temp3;
   title 'Age Frequency in TEMP3';
   table age2;
 RUN;

Note: In the upper right-hand corner of the code block you will have the option of copying ( ) the code to your clipboard or downloading ( ) the file to your computer.

PROC FORMAT; *Define the format age2fmt;
    value age2fmt 1 = 'LT 20'
                  2 = '20-44'
                  3 = '45-54'
                  4 = 'GE 54'
                  OTHER = 'Missing'; *Group ages into four categories.  OTHER will categorize anything left over as Missing;
 RUN;

 DATA temp3;
   set back; *Read the back dataset into temp3;

	*Create a new variable (age2);
        if age = . then age2 = .; *Code for missing values;
   else if age lt 20 then age2 = 1;
   else if age ge 20 and age lt 45 then age2 = 2;
   else if age ge 45 and age lt 54 then age2 = 3;
   else if age ge 54 then age2 = 4;
   format age2 age2fmt.; *Use the previously defined format when displaying age2;
 RUN;

 PROC FREQ data=temp3; *Create a one-way table of frequencies for age2;
   title 'Age Frequency in TEMP3';
   table age2;
 RUN;

First, inspect the SAS program to make sure you understand the code. Then, launch and run  the program and review the original data set as well as the output from the FREQ procedure to convince yourself that the age categories have been appropriately labeled. Incidentally, we'll learn more about the FREQ procedure soon in another lesson!

Example 9.8

Now, as long as we are interested in grouping values of only one variable, rather than doing it as we did in the previous program, we can actually accomplish it a bit more efficiently directly within the FORMAT procedure. For example, the following SAS code uses the FORMAT procedure to define the format agefmt based on the possible values of the variable age:

 PROC FORMAT;
   value agefmt LOW-<20  = 'LT 20'
                20-<45  = '20-44'
                45-<54  = '45-54'
                54-HIGH = 'GE 54'
                OTHER   = 'Missing';
 RUN;
 PROC FREQ data=back;
   title 'Age Frequency in BACK';
   format age agefmt.;
   table age;
 RUN;

Age Frequency in BACK
The FREQ Procedure

age

Frequency

Percent

Cumulative Frequency

Cumulative Percent

20-44

5

50.00

5

50.00

45-54

3

30.00

8

80.00

GE 54

2

20.00

10

100.00

In defining groups of values right within the FORMAT procedure, note that as illustrated in this program:

  • The potential ranges are defined using a dash (-). You can also list a range of values by separating the values with commas: 1,2,3 = 'Low'
  • The < symbol means "not including." For example, 20-<45 means all ages between 20 and 45, including 20, but not including 45.
  • The special LOW and HIGH ranges allow you to group values without knowing the smallest and largest values, respectively. (The keyword LOW does not include missing numeric values, but if applied to a character format, it does include missing character values.)

The FREQ procedure tallies the number of subjects falling within each of the age groups as defined in the FORMAT procedure. Here, the variable age is associated with the format agefmt using a FORMAT statement right within the FREQ procedure.

Now, launch and run  the program and review the original data set as well as the output from the frequency procedure to convince yourself that the age categories have again been appropriately labeled.


9.4 - Permanent Formats

9.4 - Permanent Formats

All of the customized informat and format definitions in this lesson thus far have been stored only temporarily. That is, the informats and formats are valid only for the duration of the SAS session in which they are defined. If you wanted to use the informats or formats again in a different SAS program, you would have to create them again using another FORMAT procedure. If you plan to use a customized informat or format repeatedly, you can store it permanently in a "formats catalog" by using the LIBRARY= option in the PROC FORMAT statement. Basically, the LIBRARY= option tells SAS where the formats catalog is (to be) stored. You tell SAS the library (which again you can think of as a directory or location) by using a LIBNAME statement:

LIBNAME libref 'c:\directory\where\formats\stored';

where libref is technically a name of your choosing. Note though that when a user-defined informat or format is called by a DATA or PROC step, SAS first looks in a temporary catalog named work.formats. (Recall that "work" is what SAS always treats as your temporary working library that goes away at the end of your SAS session.) If SAS does not find the format or informat in the temporary catalog, it then by default looks in a permanent catalog called library.formats. So, while, yes, libref is technically a name of your choosing, it behooves you to call it library since that is what SAS looks for first. That's why SAS recommends but does not require, that you use the word library as the libref when creating permanent formats.

To make this blather a bit more concrete, suppose we have the following LIBNAME statement in our SAS program:

LIBNAME library 'C:\YourDriveName\Stat480WCDEV\format\sasndata\';

and have a format procedure that starts with:

PROC FORMAT library=library;

Then, upon running the program, SAS creates a permanent catalog containing all of the formats and informats that are defined in the FORMAT procedure and stores it in the folder referenced above, as illustrated here:

 

A formats catalog, regardless of whether it is temporary (work.formats) or permanent (library.formats), contains one entry for each format or informat defined in a FORMAT procedure. Because library.formats is the reserved name for permanent formats catalogs, you can create only one catalog called formats per SAS library (directory). There are ways around this restriction, but let's not get into that now. Let's jump to an example instead.

Example 9.9

The following SAS program illustrates a FORMAT procedure that creates a permanent formats catalog in the directory referenced by library, that is, in C:\yourdrivename\icdb\data:

LIBNAME library 'C:\yourdrivename\icdb\data';
PROC FORMAT library=library;
   value sex2fmt 1 = 'Male'
                 2 = 'Female';
   value race2fmt 3 = 'Black'
                  4 = 'White'
                  OTHER = 'Other';
 RUN;
 DATA temp4; 
   infile 'c:\yourdrivename\icdb\data\back.dat';
   input subj 1-6 sex 17 race 19;
   format sex sex2fmt. race race2fmt.;
 RUN;
 PROC CONTENTS data=temp4;
   title 'Output Dataset: TEMP4';
 RUN;
 PROC PRINT data=temp4;
 RUN;

Output Dataset: TEMP4
The CONTENTS Procedure

Data Set name

WORK.TEMP4

Observations

10

Member Type

DATA

Variables

3

Engine

V9

Indexes

0

Created

Wed, Nov 05, 2023 12:05:29 PM

Observation Length

24

Last Modified

Wed, Nov 05, 2023 12:05:29 PM

Deleted Observations

0

Protection

 

Compressed

NO

Data Set Type

 

Sorted

NO

Label

   

Data Representation

WINDOWS_32

  

Encoding

wlatin1 Western (Windows)

  
Engine/Host Dependent Information

Data Set Page Size

4096

Number of Data Set Pages

1

First Data Page

1

Max Obs per Page

168

Obs in First Data Page

10

Number of Data Set Repairs

0

File Name

C:\DOCUME~1\Yourdrivename~1\LOCALS~1\TEMP\SAS TEMPORARY FILES\_TD3812\temp4.sas7bdat

Release Created

9.010M3

Host Created

XP_PRO

Alphabetic List of Variables and Attributes

#

Variable

Type

Len

Format

3

race

Num

8

RACE2FMT.

2

sex

Num

8

SEX2FMT.

1

v_date

Num

8

 
Output Dataset: TEMP4

Obs

subj

sex

race

1

110051

Female

White

2

110088

Female

White

3

210012

Female

White

4

220004

Female

White

5

230006

Female

White

6

310083

Male

Other

7

410012

Female

White

8

420037

Female

White

9

510027

Female

White

10

520017

Female

White

The DATA step creates a temporary data set called temp4 by reading in the variables subj, sex, and race from the raw data file back.dat, and associates the variables sex and race, respectively, with the formats sex2fmt and race2fmt that are defined in the FORMAT procedure. SAS first looks for the occurrence of these two formats in the temporary catalog work.formats and then when it doesn't find them there, it looks for them in the catalog of the permanent format in the c:\yourdrivename\icdb\data directory.

Launch the SAS program, and edit the INFILE statement so it reflects the location of your back.dat file. And, edit the LIBNAME statement so it reflects your desired location for the catalog of the permanent format. Then, run  the program and review the output from the CONTENTS and PRINT procedures to convince yourself that the variables sex and race are associated with the permanent formats sex2fmt and race2fmt, not the temporary formats sexfmt and racefmt previously associated with f_sex and f_race. Also, view the directory referenced in your LIBNAME statement to convince yourself that SAS created and stored a permanent formats catalog there.

Just a few more comments on this permanent formats stuff. One of the problems with permanent informats and formats is that once a variable has been associated permanently with an informat or format, SAS must be able to refer to the library to access the formats catalog. As long as the formats catalog exists, and you have permission to the file, you just have to specify the appropriate LIBNAME statement:

LIBNAME library 'c:\stat480\formats\'; 

to access the catalog. If for some reason, you do not have access to the formats catalog, SAS will give you an error that looks something like this:

ERROR: Format SEX2FMT not found or couldn't be loaded for variable sex.
ERROR: Format RACE2FMT not found or couldn't be loaded for variable race.
NOTE: The SAS SYstem stopped processing this step because of errors.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds

If you specify the NOFMTERR in the OPTIONS statement:

OPTIONS NOFMTERR; 

you can use the SAS data sets without getting errors. SAS will just display a note (not a program-halting error!) in the log file:

8870     format sex sex2fmt. race race2fmt. ;
                    -------
                    484
NOTE: 484-185: Format SEX2FMT was not found or could not be loaded.
8870!    format sex sex2fmt. race race2fmt. ;
                                  --------
                                  484
NOTE: 484-185: Format RACE2FMT was not found or could not be loaded.                                                      

You will be able to run SAS programs that use the data sets containing the permanent formats. You will just not have access to the formats.

Example 9.10

Rather than creating a permanent formats catalog, you can create a SAS program file that contains only a FORMAT procedure with the desired value and invalue statements. Then you need merely include this secondary program file in your main SAS program using the %INCLUDE statement, as illustrated here:

%INCLUDE 'C:\yourdrivename\icdb\formats\backfmt.sas';
 PROC FREQ data=back;
   title 'Frequency Count of STATE (statefmt)';
   format state statefmt.;
   table state/missing;
 RUN;

Frequency Count of STATE (statefmt)
The FREQ Procedure

state

Frequency

Percent

Cumulative Frequency

Cumulative Percent

Missing

2

20.00

2

20.00

Ind

1

10.00

3

30.00

Mass

1

10.00

4

40.00

Mich

2

20.00

6

60.00

Minn

1

10.00

7

70.00

Other

1

10.00

8

80.00

Tenn

1

10.00

9

90.00

Wisc

1

10.00

10

100.00

To clarify, here's the only thing contained in the backfmt.sas file:

PROC FORMAT;
    value statefmt 14 = 'Ind'
                   21 = 'Mass'
                   22 = 'Mich'
                   23 = 'Minn'
                   42 = 'Tenn'
                   49 = 'Wisc'
                   .  = 'Missing'
                Other = 'Other';

Since the FORMAT procedure in the backfmt.sas file does not refer to a permanent library, the format statefmt is stored in the temporary work.formats catalog.

To run this program, first download and save the backfmt.sas file to a convenient location on your computer. Then, launch the SAS program and edit the %INCLUDE statement so it reflects the location of your backfmt.sas file. Finally, run  the program and review the output from the FREQ procedure. Convince yourself that the format statement in the FREQ procedure appropriately associates the state variable with the statefmt format created by the FORMAT procedure in backfmt.sas. You may also note the MISSING option's effect in the FREQ procedure. Basically, it tells SAS to include missing values as a countable category.

The technique illustrated in this example is particularly useful when you work in an open environment, in which data sets are shared. Different users may not have access to the format file, or different users may prefer different formats.


9.5 - Using Codebooks to Help Define Formats

9.5 - Using Codebooks to Help Define Formats

It is very common for discrete (categorical) variables to have many, many (hundreds, perhaps even thousands of) possible values. Examples include:

  • diseases may be coded by an integer
  • surgical treatments may be coded by an integer
  • medications may be coded by an integer

An electronic "codebook" is typically used to keep track of the meaning of each of the integer codes. That is, codebooks contain two variables, the code and a text description of the code. For example, disease 1124 could be defined in the text as "Rheumatoid Arthritis."

One would find it extremely tedious and time-consuming to have to type a FORMAT procedure that redefines the codes and text contained in these codebooks. Instead, one can take advantage of the fact that the codebook is already in an electronic format, such as a database table, an ASCII file, or even a SAS data set.

When the codebook is contained in a SAS data set with three required variables:

  • start: the variable that contains the starting range values (that is, the codes)
  • label: the variable that contains the text definition
  • fmtname: the format name

SAS can create the appropriate format using the CNTLIN = option in the PROC FORMAT statement. Let's take a look at an example.

Example 9.11

The following SAS program creates a SAS data set called states from state_cd, which is the codebook for the variable state that is collected on the ICDB background form. Here's what the first ten observations of the state_cd data set look like:

Codebook for States

Obs

name

CODE

1

alabama

1

2

alaska

2

3

arizona

3

4

arkansas

4

5

california

5

6

colorado

6

7

connecticuit

7

8

delaware

8

9

florida

9

10

georgia

10

The data set states is then used in the FORMAT procedure to define the format for the variable state in the back data set:

PROC PRINT data=icdb.state_cd;
   title 'Codebook for States';
 RUN;
 DATA states;
   set icdb.state_cd (rename = (code = start name=label));
   fmtname = 'stat2fmt';
 RUN;
 PROC FORMAT cntlin=states;
 RUN;
 PROC FREQ data=back;
   title 'Freqency Count of STATE (stat2fmt)';
   format state stat2fmt.;
   table state;
 RUN;

Frequency count of STATE (stat2fmt)
The FREQ Procedure

state

Frequency

Percent

Cumulative Frequency

Cumulative percent

indiana

1

12.50

1

12.50

massachusetts

1

12.50

2

25.00

michigan

2

25.00

4

50.00

minnesota

1

12.50

5

62.50

pennsylvania

1

12.50

6

75.00

tennessee

1

12.50

7

87.50

wisconsin

1

12.50

8

100.00

Frequency Missing = 2

Before running this program, you'll have to download the codebook state_cd. Save it to the location on your computer that you referenced in the earlier LIBNAME statement by the libref icdb. Then, go ahead and launch and run  the program.

As you can see from the output, the PRINT procedure merely prints the icdb.state_cd codebook for your review. You should notice that the variable names in state_cd do not meet SAS requirements for codebooks. Therefore, the DATA step that creates the data set states merely renames the code and name variables in icdb.state_cd so that they meet SAS requirements. The RENAME= option on the SET statement is what is used to change the code variable to start and the name variable to label. The general syntax of the RENAME= option on the SET statement is:

set dsname (rename = (oldvr1 = newvr1 oldvr2 = newvr2 ...));

An assignment statement is then used to assign the value stat2fmt to the variable fmtname for each observation (that is, code) that appears in the icdb.state_cd data set.

Then, the FORMAT procedure with the CNTLIN = states option tells SAS to create the format stat2fmt based on the contents of the data set states. Finally, the FREQ procedure illustrates the use of the stat2fmt after it was created in this manner. SAS merely counts and reports the number of subjects coming from each of the states. Note that since we didn't include the MISSING option on the TABLE statement, SAS reports the number of missing values after the table, rather than as a row of the table.


9.6 - The PICTURE Statement

9.6 - The PICTURE Statement

I've always thought the FORMAT procedure's PICTURE statement is one of the procedure's coolest features. In short, the PICTURE statement allows you to create templates for printing numbers by defining a format that allows for special characters, such as leading zeros, decimal and comma punctuation, fill characters, prefixes, and negative number representation. Only numeric variables can have picture formats. For a quick example, the following PICTURE statement:

PICTURE phonepix OTHER = '(999)999-9999'; 

tells SAS to print phone numbers in the specified format.

Restrictions on the PICTURE statement include:

  • The name of the picture format (e.g., phonepix) must be a valid SAS name.
  • When you use the format later, you must follow the name with a period.
  • Picture format options, such as FILL, MULT, PREFIX, and NOEDIT, should be specified in parentheses after each picture in a picture format.
  • The maximum length for a picture is 40 characters.

As is true for the INVALUE and VALUES statements, the PICTURE statement in the FORMAT procedure merely defines a picture. In order for the picture to take effect, you must associate the variable with the picture with a FORMAT statement in either a DATA step or a PROC step.

The FILL option specifies a fill character, which replaces the leading blanks of the picture until a significant digit is encountered. The default fill character is a blank. The PREFIX option specifies a one- or two-character prefix placed in front of the value's first significant digit. The PREFIX option is often used for leading dollar signs and minus signs. For example, the following picture statement:

PICTURE dolpix OTHER='00,000,000.00' (fill='*' prefix='$'); 

tells SAS to print dollar amounts, such as 19999 as ***$19,999.00.

See SAS Help for information about the MULT and NOEDIT options of the PICTURE statement.

Example 9.12

The FORMAT procedure in the following SAS program defines a picture format for the ICDB variable subj:

 PROC FORMAT;
   picture subjpix LOW-HIGH = '00-0000';
 RUN;
 PROC PRINT data=back;
   title 'BACK dataset with SUBJ pictured as 00-0000';
   format subj subjpix.;
   var subj v_date sex;
 RUN; 

BACK dataset with SUBJ pictured as 00-0000

Obs

subj

v_date

sex

1

11-0051

01/25/94

2

2

11-0088

02/28/95

2

3

21-0012

07/16/93

2

4

22-004

07/27/93

2

5

23-0006

01/06/94

2

6

31-0083

01/20/95

1

7

41-0012

09/16/93

2

8

42-0037

02/02/94

2

9

51-0027

02/15/94

2

10

52-0017

11/17/93

2

The picture allows the first two digits of subj, which happens to define the hospital number, to be separated from the remaining four digits by a dash. The range "LOW-HIGH" tells SAS that all values should be printed in this format. Since the "digit selector" used to define the template is 0, leading zeroes are not printed. (That's a moot point here since all of the ICDB subject numbers begin with non-zero numbers). In general, non-zero digit selectors, such as say 9, tell SAS to print leading zeroes.

Launch and run  the SAS program. Review the contents of the PRINT procedure to convince yourself that, as requested, SAS associated the variable subj with the defined picture format subjpix, and then printed the subj variable accordingly.

Example 9.13

The following SAS program illustrates two more picture formats:

 DATA temp5;
    input subj ssn expens;
    cards;
 110051 001111111  1099.99
 110088 022234567 10876.34
 210012 123345567  9567.21
 220004 120451207  5640.12
 230006 125398710   344.46
 310083 237982019  3235.09
 410012 323432429  1343.03
 420037 340234839 11348.29
 510027 928373402  7362.79
 520017 433492349  3295.09
 ;
 RUN;
 PROC FORMAT;
   picture ssnpix LOW-HIGH = '999-99-9999';
   picture dolpix LOW-HIGH = '000,000.00' (prefix='$' fill='*');
 RUN;
 PROC PRINT data=temp5;
    title 'Output Dataset: TEMP5.  Examples of Picture Formats.';
    format ssn ssnpix. expens dolpix.;
    var subj ssn expens;
 RUN;

Output Dataset: TEMP5. Examples of Picture Formats

Obs

subj

ssn

expens

1

10051

001-11-1111

*$1,099.99

2

110088

022-23-4597

$10,876.34

3

2100012

123-34-5567

*$9,567.21

4

220004

120-45-1207

*$5,640.12

5

230006

125-39-8710

***$344.46

6

310083

237-98-2019

*$3,235.09

7

410012

323-43-2429

*$1,343.03

8

420037

340-23-4839

$11,348.29

9

510027

928-37-3402

*$7,362.79

10

520017

433-49-2349

*$3,295.09

The first DATA step merely creates a temporary data set called temp5 by reading in three variables: subj (matching the ID numbers with those in the back data set), ssn (social security number), and expens (hospital expenses, a dollar amount). Then, the FORMAT procedure defines two picture formats, ssnpix and dolpix.

  • The ssnpix format tells SAS to print social security numbers with the stated format. Since a non-zero digit selector ("9") is used, SAS prints leading zeroes. That is, the social security number 032-45-2190 will be printed with the leading zero.
  • The dolpix format tells SAS to print expense amounts dropping any leading zeroes. The PREFIX option tells SAS to place a dollar sign ("$") before the expense amount, while the FILL option tells SAS to fill the remaining positions with asterisks ("*").

The PRINT procedure associates the variable ssn with the format ssnpix and the variable expens with the format dolpix, as well as prints the data set. Launch and run  the SAS program and review the output of the PRINT procedure to convince yourself that the picture formats did indeed have the desired effect.

Note that although the largest expense in the data set is 11,348.29, the dolpix picture seemingly allows for numbers in the hundreds of thousands, 100,000.00 say. If we did not provide that extra digit, then SAS would not have room to place a dollar sign before an expense amount of 11,348.29. This is a common mistake, so please keep it in mind! If this is unclear to you, you might want to remove one of the three 0s that appear before the comma in the dolpix picture, re-run the SAS program, and review the PRINT procedure to see that the dollar sign is not displayed for the 11,348.29 or 10,876.34 amounts.


9.7 - The FMTLIB Option

9.7 - The FMTLIB Option

You might have taken note that the FORMAT procedure by itself does not generate any output. Indeed, the FORMAT procedure prints output only when you specify the FMTLIB option in the PROC FORMAT statement. The FMTLIB option of the FORMAT procedure tells SAS to display a list of all the formats and/or informats in your catalog, along with descriptions of their values. The FMTLIB option can be particularly helpful when you are working with a large catalog of formats, and have forgotten the exact spelling of a specific format name or its range of values.

Example 9.14

The following code uses the FORMAT procedure's FMTLIB option to request that SAS display information about three formats appearing in the work.format catalog:

PROC FORMAT FMTLIB;
   title 'Selected Formats from WORK.FORMAT Catalog';
   select racefmt ssnpix dolpix;
RUN;

Selected Formats from WORK.FORMAT Catalog

  
----------------------------------------------------------------------------  
|       FORMAT NAME: DOLPIX   LENGTH:   10   NUMBER OF VALUES:    1        |  
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:  10  FUZZ: STD       |  
|--------------------------------------------------------------------------|  
|START           |END             |LABEL  (VER. V7|V8   12OCT2023:16:53:55)|  
|----------------+----------------+----------------------------------------|  
|LOW             |HIGH            |000,000.00          P$  F* M100         |  
----------------------------------------------------------------------------  
                                                                                                                        
                                                                                                                        
----------------------------------------------------------------------------  
|       FORMAT NAME: RACEFMT  LENGTH:    6   NUMBER OF VALUES:    4        |  
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:   6  FUZZ: STD       |  
|--------------------------------------------------------------------------|  
|START           |END             |LABEL  (VER. V7|V8   12OCT2023:16:45:18)|  
|----------------+----------------+----------------------------------------|  
|               1|               1|Indian                                  |  
|               2|               2|Asian                                   |  
|               3|               3|Black                                   |  
|               4|               4|White                                   |  
----------------------------------------------------------------------------  
                                                                                                                        
                                                                                                                        
----------------------------------------------------------------------------  
|       FORMAT NAME: SSNPIX   LENGTH:   11   NUMBER OF VALUES:    1        |  
|   MIN LENGTH:   1  MAX LENGTH:  40  DEFAULT LENGTH:  11  FUZZ: STD       |  
|--------------------------------------------------------------------------|  
|START           |END             |LABEL  (VER. V7|V8   12OCT2023:16:53:55)|  
|----------------+----------------+----------------------------------------|  
|LOW             |HIGH            |999-99-9999         P   F  M1           |  
----------------------------------------------------------------------------  

Launch and run  the SAS program and review the output. Since the FORMAT procedure here does not refer to a permanent library, the contents of the temporary work.formats catalog are printed. The SELECT statement tells SAS to print information only on a select few formats rather than on the entire catalog. (See SAS Help for more details on the SELECT statement and its sister EXCLUDE statement.)

Although not used in this example, the PAGE option may be used additionally to tell SAS to print the information about each format and informat in the catalog on a separate page.

PROC FORMAT FMTLIB PAGE;
RUN;

The FORMAT procedure's PAGE option is meaningless unless the FMTLIB option is also invoked.


9.8 - Summary

9.8 - Summary

In this lesson, we learned how the FORMAT procedure allows us to control the appearance of variables in our output.

The homework assignment for this lesson will give you practice with these techniques.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility