Because a DO loop executes statements iteratively, it provides an easy way to select a sample of observations from a large data set. Let's take a look at an example!
Example 18.12
The following program uses an iterative DO loop and the SET statement's POINT= option to select every 100th observation from the permanent data set called stat481.log11, which contains 8,624 observations:
OPTIONS LS = 72 PS = 34 NODATE NONUMBER;
LIBNAME stat481 'C:\yourdrivename\Stat481WC\06doloops\sasndata';
DATA sample;
DO i = 100 to 8600 by 100;
set stat481.log11 point = i;
output;
END;
stop;
RUN;
PROC PRINT data = sample NOOBS;
title 'Subset of Logged Observations for Hospital 11';
RUN;
SUBJ | V_TYPE | V_DATE | FORM_CD |
---|---|---|---|
110004 | 0 | 04/22/93 | prior |
110027 | 3 | 01/25/94 | med |
110027 | 36 | 08/27/96 | cmed |
110029 | 12 | 09/27/94 | purg |
110029 | 42 | 04/01/97 | sympts |
110039 | 18 | 06/06/95 | void |
110040 | 1 | 01/24/94 | void |
110040 | 39 | 02/18/97 | cmed |
110045 | 15 | 05/09/95 | symph |
110049 | 0 | 01/25/94 | sympts |
110049 | 30 | 07/23/96 | phytrt |
110051 | 12 | 12/13/94 | void |
110052 | 3 | 05/10/94 | void |
110052 | 55 | 05/06/97 | close |
110053 | 24 | 02/06/96 | cmed |
110055 | 6 | 08/30/94 | sympts |
110057 | 0 | 03/15/94 | preg |
110057 | 27 | 06/26/96 | symph |
110058 | 12 | 04/11/95 | med |
110059 | 0 | 03/18/94 | phs |
110059 | 24 | 03/19/96 | void |
110062 | 0 | 03/31/94 | preg |
110062 | 24 | 05/14/96 | purg |
110066 | 0 | 04/12/94 | purg |
110067 | 3 | 08/04/94 | purg |
110068 | 3 | 08/30/94 | void |
110070 | 3 | 08/30/94 | phytrt |
110074 | 0 | 06/16/94 | urod |
110075 | 15 | 10/31/95 | med |
110076 | 3 | 10/04/94 | void |
110077 | 21 | 05/10/96 | med |
110078 | 12 | 10/03/95 | diet |
110080 | 0 | 07/07/94 | sympts |
110080 | 24 | 06/25/96 | sympts |
110081 | 18 | 02/09/96 | void |
110082 | 12 | 08/22/95 | phytrt |
110083 | 0 | 02/10/95 | ucult |
110085 | 0 | 10/11/94 | phytrt |
110086 | 18 | 04/30/96 | diet |
110087 | 3 | 05/30/95 | phytrt |
110088 | 0 | 03/07/95 | excl2 |
110091 | 12 | 04/02/96 | void |
110092 | 6 | 09/19/95 | void |
110093 | 12 | 03/05/96 | med |
110094 | 9 | 03/26/96 | purg |
110095 | 6 | 12/05/95 | phytrt |
110096 | 6 | 12/19/95 | urn |
110097 | 21 | 03/18/97 | med |
110100 | 0 | 07/14/95 | def1 |
110100 | 21 | 04/22/97 | void |
110104 | 1 | 10/23/95 | symph |
110107 | 0 | 09/22/95 | urod |
110110 | 0 | 11/10/95 | prior |
110111 | 0 | 10/17/95 | prior |
110112 | 15 | 01/21/97 | phytrt |
110114 | 0 | 11/10/95 | diet |
110115 | 0 | 12/01/95 | preg |
110117 | 0 | 12/11/95 | void |
110118 | 12 | 01/21/97 | purg |
110120 | 0 | 01/09/96 | excl2 |
110121 | 9 | 09/03/96 | cmed |
110123 | 0 | 01/23/96 | back |
110124 | 0 | 02/05/96 | urn |
110125 | 9 | 12/10/96 | phytrt |
110127 | 1 | 03/27/96 | purg |
110128 | 6 | 09/17/96 | void |
110131 | 3 | 06/04/96 | med |
110134 | 0 | 04/15/96 | hem |
110135 | 1 | 05/16/96 | med |
110136 | 9 | 01/21/97 | void |
110138 | 6 | 12/03/96 | med |
110140 | 0 | 05/21/96 | void |
110142 | 0 | 06/04/96 | prior |
110144 | 0 | 06/07/96 | hmrpt |
110145 | 6 | 01/14/97 | void |
110147 | 3 | 09/17/96 | void |
110149 | 0 | 06/28/96 | urod |
110152 | 0 | 07/19/96 | incl |
110154 | 0 | 07/22/96 | void |
110155 | 6 | 01/28/97 | qul |
110158 | 0 | 08/26/96 | cmed |
110161 | 0 | 10/01/96 | prior |
110163 | 6 | 03/18/97 | diet |
110165 | 3 | 01/14/97 | cmed |
110167 | 0 | 11/19/96 | purg |
110171 | 0 | 01/21/97 | hem |
Let's work our way through the code. The DO statement tells SAS to start at 100, increase i by 100 each time, and end at 8600. That is, SAS will execute the DO loop when the index variable i equals 100, 200, 300, ..., 8600.
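To check the index values on their own, here is a minimal sketch (not part of the original program) that runs the same DO statement without reading any data and writes the iteration count to the log. The counter n is a hypothetical name introduced only for illustration.

```sas
DATA _NULL_;
  n = 0;
  DO i = 100 to 8600 by 100;
    n = n + 1;                         /* count each pass through the loop */
  END;
  PUT 'Iterations: ' n;                /* number of times the loop body ran */
  PUT 'Value of i after the loop: ' i; /* index value that ended the loop */
RUN;
```

The log should report 86 iterations. The index variable ends at 8700, the first value past the stop value of 8600, which is what ends the loop.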
Now the SET statement contains an option that we've not seen before, namely the POINT= option. The POINT= option tells SAS not to read the stat481.log11 data set sequentially, as is done by default, but rather to read directly the observation whose number is stored in the variable named in the POINT= option, here the index variable i. For example, when i = 100, SAS reads the 100th observation in the stat481.log11 data set. And when i = 3200, SAS reads the 3200th observation in the stat481.log11 data set. Because the POINT= variable is temporary, i is not written to the sample data set, which is why it does not appear in the output above.
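To see direct access in isolation, here is a minimal sketch that reads only the 3200th observation. The data set name one_obs is hypothetical, and the sketch assumes the stat481 library has already been assigned as in the program above. It also relies on the OUTPUT and STOP statements, which are explained in the paragraphs that follow.

```sas
DATA one_obs;
  i = 3200;                        /* POINT= needs a variable, not a constant */
  set stat481.log11 point = i;     /* read observation 3200 directly */
  output;                          /* write that observation to one_obs */
  stop;                            /* end the DATA step; no end-of-file is reached */
RUN;
```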
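The OUTPUT statement, of course, tells SAS to write to the output data set the observation that has been selected. If we instead placed the OUTPUT statement after the END statement, the resulting data set would contain only one observation, that is, the last observation read into the program data vector. And if we omitted the OUTPUT statement altogether, the resulting data set would contain no observations at all, because the STOP statement ends the DATA step before SAS performs its automatic output.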
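The effect of where the OUTPUT statement sits can be sketched by moving it outside the loop. This is an illustrative variation, not the original program, and the data set name last_only is hypothetical.

```sas
DATA last_only;
  DO i = 100 to 8600 by 100;
    set stat481.log11 point = i;   /* each read overwrites the previous one */
  END;
  output;                          /* runs once, after the loop has finished */
  stop;
RUN;
```

Under these assumptions, last_only should contain a single observation, the 8600th observation of stat481.log11, because that is the last one read into the program data vector before OUTPUT executes.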
The STOP statement, which is new to us, is necessary because we are using the POINT= option. As you know, the DATA step by default continues to read observations until it reaches the end-of-file marker in the input data. Because the POINT= option reads only specified observations, SAS cannot read an end-of-file marker as it would if the file were being read sequentially. The STOP statement tells SAS to stop processing the current DATA step immediately and to resume processing statements after the end of the current DATA step. It is the use of the STOP statement, therefore, that keeps us from sending SAS into the no man's land of continuous looping.
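As a variation, the upper bound of the loop need not be typed in by hand. The sketch below uses the SET statement's NOBS= option, which stores the number of observations in the input data set in a temporary variable at compile time, so the variable is available before the SET statement executes. The names sample2 and total are hypothetical, and the approach assumes an ordinary SAS data set for which the observation count is known.

```sas
DATA sample2;
  DO i = 100 to total by 100;                    /* total is set by NOBS= below */
    set stat481.log11 point = i nobs = total;    /* direct read of observation i */
    output;
  END;
  stop;                                          /* still required with POINT= */
RUN;
```

With 8,624 observations in stat481.log11, the loop should again visit observations 100, 200, ..., 8600, so sample2 should match the sample data set. Like the POINT= variable, the NOBS= variable is not written to the output data set.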
Now, download and save the stat481.log11 data set in a convenient location on your computer. Launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the data set. Then, run the program and review the output from the PRINT procedure to see the selected observations. You shouldn't be surprised to see that the sample data set contains 86 observations, as the iterative DO loop executes 8600 divided by 100, or 86, times. The log confirms the count:
PROC PRINT data = sample NOOBS;
NOTE: Writing HTML Body file: sashtml1.htm
title 'Subset of Logged Observations for Hospital 11';
RUN;
NOTE: There were 86 observations read from the data set WORK.SAMPLE.
NOTE: PROCEDURE PRINT used (Total process time):
real time 0.64 seconds
cpu time 0.29 seconds