9.1 - A Working Data Set
Throughout this lesson, we will investigate a number of examples that illustrate how to create different informats and formats for several different variables. To do so, we will use a subset of the demographic (or "background") data collected on 638 subjects when they enrolled in the National Institutes of Health's Interstitial Cystitis Data Base (ICDB) Study. Not surprisingly, the ICDB Study collected data on people who were diagnosed as having interstitial cystitis! The primary reason for conducting the study was that interstitial cystitis is a poorly understood condition that causes severe bladder and pelvic pain, urinary frequency, and painful urination in the absence of any identifiable cause. Although the disease is more prevalent in women, it affects both men and women of all ages. For the ICDB Study, each subject was enrolled at one of seven clinical centers and was evaluated four times a year for as many as four years.
It will probably be helpful for you to take a peek at the background data form on which the data were collected. In order to run the SAS programs in this lesson, you'll need to save the background data set to a directory on your computer. To do so, right-click the link and select the "Save Link As" option. A Save dialog box will appear and allow you to save the data file to the location you choose on your computer.
Because there are 638 observations and 16 variables in the permanent background data set icdb.back, the data on just ten subjects and nine variables are selected when creating the temporary working background data set back. The following SAS program creates the subset:
OPTIONS PS = 58 LS = 80 NODATE NONUMBER;

LIBNAME icdb 'C:\Simon\icdb\data';

DATA back;
   set icdb.back;
   age = (v_date - b_date)/365.25;
   if subj in (110051, 110088, 210012, 220004, 230006,
               310083, 410012, 420037, 510027, 520017);
   keep subj v_date b_date age sex state country race relig;
   format age 4.1;
RUN;

PROC PRINT;
   title 'Output Dataset: BACK';
RUN;
Launch the SAS program. Then, edit the LIBNAME statement so that it reflects the location where you saved the background data set. Finally, run the program and review the output of the PRINT procedure to familiarize yourself with the structure and contents of the subset data set called back.
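If you want to confirm that the LIBNAME statement points to the right folder before creating the subset, one option is to ask SAS to describe the permanent data set. The following is only a minimal sketch; it assumes the same path used in the program above, which you should replace with your own location:

```sas
/* Path copied from the program above; edit it to match where you saved the data set */
LIBNAME icdb 'C:\Simon\icdb\data';

/* List the variables and attributes of the permanent data set icdb.back */
PROC CONTENTS DATA = icdb.back;
RUN;
```

If the library is assigned correctly, the output should report the 638 observations and 16 variables mentioned earlier. If SAS instead reports in the log that the data set does not exist, the path in the LIBNAME statement is the first thing to check.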
Note that the IF statement (a subsetting IF) tells SAS which ten subjects we want included in the back data set, and the KEEP statement tells SAS which nine variables we want included. We will learn more about the KEEP statement in Stat 481. You might also want to note that the FORMAT statement tells SAS to display age using the SAS-provided w.d format, here 4.1, so that an age appears as 44.7, say.
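To see these three statements at work without the ICDB data, you could try them on a small toy data set. This is just an illustrative sketch: the data set name toy, the variable extra, and all of the subject numbers and dates are made up and do not come from the background data set.

```sas
/* Hypothetical toy data; the subject numbers and dates are invented */
DATA toy;
   input subj b_date :mmddyy10. v_date :mmddyy10. extra;
   age = (v_date - b_date) / 365.25;   /* SAS dates are day counts, so this gives years */
   if subj in (1, 3);                  /* subsetting IF: only subjects 1 and 3 are kept */
   keep subj b_date v_date age;        /* extra is not written to the toy data set */
   format b_date v_date mmddyy10. age 4.1;
   DATALINES;
1 01/15/1950 06/30/1994 7
2 11/02/1961 03/12/1995 8
3 07/04/1948 09/01/1993 9
;
RUN;

PROC PRINT DATA = toy;
   title 'Toy example: subsetting IF, KEEP, and FORMAT';
RUN;
```

The printed data set should contain two observations and four variables, with age displayed to one decimal place. The stored value of age is not rounded; the 4.1 format affects only how it is displayed.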
We'll also need to work with a raw data file version of the subset data set. The following SAS code creates the ASCII raw data file, in column format, from the temporary back data set:
DATA _NULL_;
   set back;
   file 'C:\simon\icdb\data\back.dat';
   put subj 1-6 @8 b_date mmddyy8.
       sex 17 race 19 relig 21
       state 23-24 country 26-27
       @29 age 4.1 @34 v_date mmddyy8.;
RUN;
The special data set name _NULL_ tells SAS to execute the DATA step as if it were creating a new SAS data set, but without writing any observations or variables to an output data set. The PUT statement tells SAS to write the variables, in the formats and columns specified, to the file named in the FILE statement (back.dat). The specifications used in the PUT statement are similar to those used in the INPUT statement.
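One way to experiment with PUT specifications before writing a file is to leave out the FILE statement, in which case PUT writes to the SAS log. The sketch below uses made-up values (they are not taken from the ICDB data) and only three of the nine variables:

```sas
DATA _NULL_;
   /* Hypothetical values, not taken from the ICDB data */
   subj   = 999999;
   b_date = '15JAN1950'd;
   age    = 44.7;
   /* With no FILE statement, PUT writes this line to the SAS log */
   put subj 1-6 @8 b_date mmddyy8. @29 age 4.1;
RUN;
```

After running the step, look in the log for a single line with the subject number in columns 1-6, the birth date starting in column 8, and the age starting in column 29.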
Launch the SAS program. Then, edit the FILE statement so that it reflects the location where you would like the raw data file saved. Finally, run the program. Open the newly created back.dat file in an ASCII text editor, such as Notepad, to convince yourself that its structure and contents are similar to those of the back data set.
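Another way to check the raw data file is to read it back into SAS with an INPUT statement that mirrors the PUT statement. The following is a sketch under two assumptions: that sex, race, relig, state, and country are stored as numeric codes (a character variable would need a $ in its specification), and that the file sits at the path used above. The data set name check is made up for this illustration.

```sas
DATA check;
   infile 'C:\simon\icdb\data\back.dat';   /* edit to match your FILE statement */
   /* Same columns and informats as the PUT statement that wrote the file */
   input subj 1-6 @8 b_date mmddyy8.
         sex 17 race 19 relig 21
         state 23-24 country 26-27
         @29 age 4.1 @34 v_date mmddyy8.;
   format b_date v_date mmddyy8. age 4.1;
RUN;

PROC PRINT DATA = check;
   title 'Raw data file back.dat read back into SAS';
RUN;
```

Two caveats apply when comparing check with back. First, age was written with one decimal place, so the value read back is rounded rather than exact. Second, mmddyy8. stores only a two-digit year, and SAS decides the century when reading it according to the YEARCUTOFF= system option, so very early birth dates could come back in the wrong century.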