Working with subsets created from an existing SAS data set can make more efficient use of computer resources than working with the original, larger data set. Reading fewer observations means that fewer iterations of the DATA step must occur.
- The SET statement's FIRSTOBS= option tells SAS to begin reading the data from the input SAS data set at the observation number specified by FIRSTOBS=.
- The SET statement's OBS= option tells SAS to stop reading the data from the input SAS data set at the observation number specified by OBS=. That is, OBS= gives the number of the last observation to read, not the number of observations to read.
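Because OBS= names the last observation to read rather than a count, the size of the resulting subset follows from simple arithmetic. The sketch below assumes that FIRSTOBS= is no larger than OBS= and that OBS= does not exceed the number of observations in the input data set. The symbol n for the subset size is introduced here only for illustration. When FIRSTOBS= is omitted, it takes its default value of 1.

```latex
\begin{aligned}
n &= \mathrm{OBS} - \mathrm{FIRSTOBS} + 1 \\
\text{Example 14.1 (FIRSTOBS defaults to 1):}\quad n &= 25 - 1 + 1 = 25 \\
\text{Example 14.2 (FIRSTOBS}=7,\ \text{OBS}=20\text{):}\quad n &= 20 - 7 + 1 = 14
\end{aligned}
```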
We'll start by using the OBS= option to create the data set that we'll be working with throughout this lesson. The data set contains demographic (or "background") information collected on 638 subjects once enrolled in the National Institutes of Health's Interstitial Cystitis Data Base (ICDB) Study. Not surprisingly, the ICDB Study collected data on people who were diagnosed as having interstitial cystitis! The study was conducted primarily because interstitial cystitis is a poorly understood condition that causes severe bladder and pelvic pain, urinary frequency, and painful urination in the absence of any identifiable cause. Although the disease is more prevalent in women, it affects both men and women of all ages. (If you want to learn more about the ICDB Study, see one of the National Institutes of Health's websites, which gives a general description of the study along with the database documentation.)
Given that we'll use the ICDB Study's background data, it would probably be helpful for you to take a peek at the background data form on which the data were collected. In order to run the SAS programs in this lesson, you'll need to download and save the background data set in a folder on your computer.
Example 14.1
The DATA step in the following program uses the OBS= option to tell SAS to create a temporary data set called back by selecting the first 25 observations from the permanent background data set icdb.back:
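OPTIONS PS=58 LS=80 NODATE NONUMBER;
/* point the icdb libref at the folder where you saved the background data set */
LIBNAME icdb 'C:\yourdrivename\Stat481WC\02datastep\sasndata';
DATA back;
   /* read only observations 1 through 25 of icdb.back */
   set icdb.back (obs=25);
RUN;
PROC PRINT data=back;
   title 'A Subset of the Background Data Set';
RUN;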
Obs | subj | v_type | v_date | r_id | b_date | sex | state | country | race | ethnic | relig | mar_st | ed_level | emp_st | job_chng | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 110027 | 0 | 10/05/93 | 2068 | 07/05/62 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 3 | 1 | . | 2 |
2 | 110029 | 0 | 10/05/93 | 2068 | 09/07/26 | 2 | 5 | 1 | 4 | 0 | 2 | 1 | 5 | 8 | . | 2 |
3 | 110039 | 0 | 12/07/93 | 2068 | 07/24/24 | 2 | 22 | 1 | 4 | 0 | 3 | 1 | 3 | 8 | . | 2 |
4 | 110040 | 0 | 11/30/93 | 2068 | 10/20/67 | 2 | 32 | 1 | 4 | 0 | 7 | 1 | 5 | 1 | . | 2 |
5 | 110045 | 0 | 01/11/94 | 2068 | 04/18/25 | 1 | 36 | 1 | 4 | 0 | 3 | 1 | 1 | 8 | 0 | 2 |
6 | 110049 | 0 | 01/25/94 | 2068 | 10/05/23 | 2 | 37 | 1 | 4 | 0 | 1 | 1 | 5 | 8 | 0 | 2 |
7 | 110051 | 0 | 01/25/94 | 2068 | 12/02/42 | 2 | 42 | 1 | 4 | 0 | 3 | 1 | 3 | 1 | 0 | 2 |
8 | 110052 | 0 | 01/27/94 | 1808 | 01/04/25 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 4 | 8 | 0 | 2 |
9 | 110053 | 0 | 02/22/94 | 1808 | 03/15/22 | 2 | 5 | 1 | 4 | 1 | 3 | 1 | 1 | 8 | 0 | 1 |
10 | 110055 | 0 | 03/15/94 | 1808 | 03/31/41 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 3 | 1 | 0 | 2 |
11 | 110057 | 0 | 03/15/94 | 2068 | 07/10/44 | 2 | 5 | 1 | 4 | 0 | 3 | 1 | 4 | 2 | 0 | 2 |
12 | 110058 | 0 | 03/18/94 | 1808 | 09/09/50 | 2 | . | 13 | 4 | 1 | 0 | 1 | 3 | 1 | 0 | 1 |
13 | 110059 | 0 | 03/18/94 | 1808 | 07/25/34 | 2 | 13 | 1 | 4 | 0 | 1 | 1 | 3 | 8 | 0 | 1 |
14 | 110060 | 0 | 06/14/94 | 1808 | 05/29/36 | 2 | 13 | 1 | 4 | 0 | 3 | 1 | 3 | 1 | 0 | 2 |
15 | 110062 | 0 | 03/31/94 | 1808 | 04/21/36 | 2 | 3 | 1 | 4 | 0 | 1 | 1 | 4 | 5 | 0 | 2 |
16 | 110065 | 0 | 04/04/94 | 1808 | 10/12/52 | 2 | 5 | 1 | 4 | 0 | 3 | 1 | 4 | 4 | 1 | 1 |
17 | 110066 | 0 | 04/12/94 | 1808 | 08/28/62 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 4 | 6 | 0 | 2 |
18 | 110067 | 0 | 04/26/94 | 1808 | 02/22/72 | 2 | 5 | 1 | 4 | 0 | 1 | 6 | 4 | 2 | 0 | 2 |
19 | 110068 | 0 | 06/13/94 | 1808 | 09/10/55 | 2 | 25 | 1 | 4 | 0 | 11 | 1 | 3 | 7 | 0 | 2 |
20 | 110069 | 0 | 05/31/94 | 1808 | 08/17/38 | 2 | 32 | 1 | 4 | 0 | 0 | 1 | 3 | 1 | 0 | 2 |
21 | 110070 | 0 | 05/24/94 | 1808 | 12/12/41 | 2 | 30 | 1 | 4 | 0 | 1 | 1 | 3 | 2 | 0 | 2 |
22 | 110074 | 0 | 06/16/94 | 1808 | 10/16/63 | 2 | 47 | 1 | 4 | 0 | 1 | 6 | 4 | 6 | 1 | 1 |
23 | 110075 | 0 | 06/14/94 | 1808 | 11/04/60 | 2 | 47 | 1 | 4 | 0 | 1 | 6 | 4 | 1 | 0 | 1 |
24 | 110076 | 0 | 07/22/94 | 1808 | 01/09/64 | 2 | 5 | 1 | 4 | 0 | 3 | 1 | 4 | 9 | 0 | 2 |
25 | 110077 | 0 | 07/26/94 | 2068 | 07/24/40 | 2 | 5 | 1 | 4 | 0 | 3 | 2 | 4 | 2 | 0 | 2 |
The program is pretty straightforward. The main thing to keep in mind is that OBS= is a data set option, so it must be enclosed in parentheses immediately after the name of the data set to which it applies (here, icdb.back).
If you haven't already done so, download and save the background data set to a convenient location on your computer. Then launch the SAS program, edit the LIBNAME statement so that it reflects the location in which you saved the data set, run the program, and review the output from the PRINT procedure to familiarize yourself with the data set.
Example 14.2
The following program uses the SET statement's FIRSTOBS= and OBS= options to tell SAS to include fourteen observations (observations 7, 8, 9, ..., and 20) from the permanent icdb.back data set in the temporary back1 data set:
LIBNAME icdb 'C:\yourdrivename\Stat481WC\02datastep\sasndata';
DATA back1;
   /* start reading at observation 7 and stop after observation 20 */
   set icdb.back (FIRSTOBS=7 OBS=20);
RUN;
PROC PRINT data=back1;
title 'Output Dataset: BACK1';
RUN;
Obs | subj | v_type | v_date | r_id | b_date | sex | state | country | race | ethnic | relig | mar_st | ed_level | emp_st | job_chng | income |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | 110051 | 0 | 01/25/94 | 2068 | 12/02/42 | 2 | 42 | 1 | 4 | 0 | 3 | 1 | 3 | 1 | 0 | 2 |
2 | 110052 | 0 | 01/27/94 | 1808 | 01/04/25 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 4 | 8 | 0 | 2 |
3 | 110053 | 0 | 02/22/94 | 1808 | 03/15/22 | 2 | 5 | 1 | 4 | 1 | 3 | 1 | 1 | 8 | 0 | 1 |
4 | 110055 | 0 | 03/15/94 | 1808 | 03/31/41 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 3 | 1 | 0 | 2 |
5 | 110057 | 0 | 03/15/94 | 2068 | 07/10/44 | 2 | 5 | 1 | 4 | 0 | 3 | 1 | 4 | 2 | 0 | 2 |
6 | 110058 | 0 | 03/18/94 | 1808 | 09/09/50 | 2 | . | 13 | 4 | 1 | 0 | 1 | 3 | 1 | 0 | 1 |
7 | 110059 | 0 | 03/18/94 | 1808 | 07/25/34 | 2 | 13 | 1 | 4 | 0 | 1 | 1 | 3 | 8 | 0 | 1 |
8 | 110060 | 0 | 06/14/94 | 1808 | 05/29/36 | 2 | 13 | 1 | 4 | 0 | 3 | 1 | 3 | 1 | 0 | 2 |
9 | 110062 | 0 | 03/31/94 | 1808 | 04/21/36 | 2 | 3 | 1 | 4 | 0 | 1 | 1 | 4 | 5 | 0 | 2 |
10 | 110065 | 0 | 04/04/94 | 1808 | 10/12/52 | 2 | 5 | 1 | 4 | 0 | 3 | 1 | 4 | 4 | 1 | 1 |
11 | 110066 | 0 | 04/12/94 | 1808 | 08/28/62 | 2 | 5 | 1 | 4 | 0 | 0 | 1 | 4 | 6 | 0 | 2 |
12 | 110067 | 0 | 04/26/94 | 1808 | 02/22/72 | 2 | 5 | 1 | 4 | 0 | 1 | 6 | 4 | 2 | 0 | 2 |
13 | 110068 | 0 | 06/13/94 | 1808 | 09/10/55 | 2 | 25 | 1 | 4 | 0 | 11 | 1 | 3 | 7 | 0 | 2 |
14 | 110069 | 0 | 05/31/94 | 1808 | 08/17/38 | 2 | 32 | 1 | 4 | 0 | 0 | 1 | 3 | 1 | 0 | 2 |
Launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the background data set. Then run the SAS program and review the output from the PRINT procedure. Compare it with the output from the previous example to convince yourself that the temporary data set back1 does indeed contain fourteen observations (observations 7, 8, ..., 20 of the original background data set).