18.1 - Constructing Do Loops

In this section, we'll explore the use of iterative DO loops, in which you tell SAS to execute a statement or a group of statements a certain number of times. Let's take a look at some examples!

Example 18.1

The following program uses a DO loop to tell SAS to determine what four times three (4 × 3) equals:

OPTIONS PS = 58 LS = 78 NODATE NONUMBER;
DATA multiply;
    answer = 0;
    do i = 1 to 4;
        answer + 3;
    end;
RUN;
PROC PRINT NOOBS;
    title 'Four Times Three Equals...';
RUN;

Four Times Three Equals...

answer    i
    12    5

Okay... admittedly, we could determine four times three in a much simpler way, but then we wouldn't have the pleasure of seeing how to accomplish it using an iterative DO loop! The key to understanding the DATA step here is to recall that multiplication is just repeated addition. That is, four times three (4 × 3) is the same as adding three together four times, that is, 3 + 3 + 3 + 3. That's all that the iterative DO loop in the DATA step is telling SAS to do: after initializing answer to 0, add 3 to answer, then add 3 to answer again, and again, and again. After SAS has added 3 to the answer variable four times, SAS exits the DO loop, and since that's the end of the DATA step, SAS writes the observation to the multiply data set and moves on to the PRINT procedure, which prints the result.

The other thing you might want to notice about the DATA step is that there is no input data set or input data file. We are generating data from scratch here, rather than from some input source. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly calculates four times three.

Ahhh, what about that i variable that shows up in our multiply data set? If you look at our DATA step again, you can see that it comes from the DO loop. It is what is called the index variable (or counter variable). Most often, you'll want to drop it from your output data set, but its presence here is educational. As you can see, its final value is 5. That's what allows SAS to exit the DO loop: SAS increments i at the end of each iteration and takes the actions inside the loop only as long as i is no greater than 4. Once i becomes 5, which is greater than 4, SAS jumps out of the loop and moves on to the next statements in the DATA step. Let's take a look at the general form of iterative DO loops.

General Form of Iterative Do Loops

To construct an iterative DO loop, you need to start with a DO statement, then include some action statements, and then end with an END statement. Here's what a simple iterative DO loop should look like:

DO index-variable = start TO stop BY increment;
    action statements;
END;

where:

  • DO, index-variable, start, TO, stop, and END are required in every iterative DO loop
  • index-variable, which stores the value of the current iteration of the DO loop, can be any valid SAS variable name. It is common, however, to use a single letter, with i and j being the most used.
  • start is the value of the index variable at which you want SAS to start the loop
  • stop is the value of the index variable at which you want SAS to stop the loop
  • increment is by how much you want SAS to change the index variable after each iteration. The most commonly used increment is 1. In fact, if you don't specify a BY clause, SAS uses the default increment of 1.

For example, this DO statement:

do jack = 1 to 5;

tells SAS to create an index variable called jack, start at 1, increment by 1, and end at 5, so that the values of jack from iteration to iteration are 1, 2, 3, 4, and 5. And, this DO statement:

do jill = 2 to 12 by 2;

tells SAS to create an index variable called jill, start at 2, increment by 2, and end at 12, so that the values of jill from iteration to iteration are 2, 4, 6, 8, 10, and 12.
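To see these index values for yourself, here is a minimal sketch (not part of the original lesson) that writes the value of jill to the SAS log on each iteration and once more after the loop ends. DATA _NULL_ is used here only so that no data set is created, and the text in the PUT statements is purely illustrative.

```sas
DATA _NULL_;
    do jill = 2 to 12 by 2;
        put 'Inside the loop: ' jill=;   /* one log line per iteration */
    end;
    put 'After the loop: ' jill=;        /* jill has been incremented past 12 */
RUN;
```

The log should show jill taking the values 2, 4, 6, 8, 10, and 12 inside the loop, and 14 after it, since SAS adds the increment one last time before finding that the index variable exceeds the stop value.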

Example 18.2: Explicit OUTPUT Statements

The following program uses an iterative DO loop to tell SAS to determine the multiples of 5 up to 100:

DATA multiply (drop = i);
    multiple = 0;
    do i = 1 to 20;
        multiple + 5;
        output;
    end;
RUN;
PROC PRINT NOOBS;
    title 'Multiples of 5 up to 100';
RUN;

Multiples of 5 up to 100

multiple

5

10

15

20

25

30

35

40

45

50

55

60

65

70

75

80

85

90

95

100

In this case, we are not interested in one particular multiplication, but rather in a series of multiplications, 1 × 5, 2 × 5, 3 × 5, ... That's where the OUTPUT statement comes into play. The previous example created just one observation because it relied on the automatic output at the end of the DATA step. Here, we override the automatic output by explicitly telling SAS to output the value of the multiple variable every time that SAS adds 5 to it. The DATA statement's DROP= option tells SAS not to bother to output the index variable i. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly generates multiples of 5.
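To see the difference the explicit OUTPUT statement makes, consider this sketch, which is the same DATA step with the OUTPUT statement removed. The data set name lastonly and the title text are made up for illustration.

```sas
DATA lastonly (drop = i);
    multiple = 0;
    do i = 1 to 20;
        multiple + 5;    /* no OUTPUT statement inside the loop */
    end;
RUN;                     /* automatic output writes one observation here */
PROC PRINT data = lastonly NOOBS;
    title 'Without an explicit OUTPUT statement';
RUN;
```

Because SAS falls back on the automatic output at the end of the DATA step, the data set should contain a single observation with multiple equal to 100.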

Example 18.3

The following SAS program uses an iterative DO loop to count backwards by 1:

DATA backwardsbyone;
    do i = 20 to 1 by -1;
        output;
    end;
RUN;
PROC PRINT data = backwardsbyone NOOBS;
    title 'Counting Backwards by 1';
RUN;

Counting Backwards by 1

i

20

19

18

17

16

15

14

13

12

11

10

9

8

7

6

5

4

3

2

1

As you can see in this DO statement, you can decrement a DO loop's index variable by specifying a negative value for the BY clause. Here, we tell SAS to start at 20 and decrease the index variable by 1 until it reaches 1. The OUTPUT statement tells SAS to output the value of the index variable i for each iteration of the DO loop. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly counts backward from 20 to 1.
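The negative increment need not be -1. As a small variation on the program above (the data set name backwardsbyfive is hypothetical), this sketch counts backward by 5:

```sas
DATA backwardsbyfive;
    do i = 20 to 1 by -5;
        output;          /* writes i = 20, 15, 10, 5 */
    end;
RUN;
PROC PRINT data = backwardsbyfive NOOBS;
    title 'Counting Backwards by 5';
RUN;
```

Note that the index variable never equals the stop value of 1 here. SAS leaves the loop as soon as the next value, 0, falls below the stop value.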

Specifying a Series of Items

Rather than specifying start, stop and increment values in a DO statement, you can tell SAS how many times to execute a DO loop by listing items in a series. In this case, the general form of the iterative DO loop looks like this:

DO index-variable = value1, value2, value3, ...;
	action statements;
END;

where the values can be character or numeric. When the DO loop executes, it executes once for each item in the series. The index variable equals the value of the current item. You must use commas to separate items in a series. To list items in a series, you must specify

  • either all numeric values:
       DO i = 1, 2, 3, 4, 5;
  • all character values, with each value enclosed in quotation marks:
       DO j = 'winter', 'spring', 'summer', 'fall';
  • or all variable names:
       DO k = first, second, third;

In this case, the index variable takes on the values of the specified variables. Note that the variable names are not enclosed in quotation marks, while quotation marks are required for character values.
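As a minimal sketch of the series form, the two DATA steps below loop over a list of character values and over a list of variable names. The data set names, the LENGTH statement, and the values assigned to first, second, and third are assumptions made for illustration and do not come from the lesson.

```sas
/* Character values: one observation per season */
DATA seasons;
    length season $ 6;           /* long enough for the longest value */
    do season = 'winter', 'spring', 'summer', 'fall';
        output;
    end;
RUN;

/* Variable names: k takes on the values stored in first, second, third */
DATA fromvars;
    first = 10;
    second = 20;
    third = 30;
    do k = first, second, third;
        output;
    end;
RUN;

PROC PRINT data = seasons NOOBS;
    title 'Series of character values';
RUN;
PROC PRINT data = fromvars NOOBS;
    title 'Series of variable names';
RUN;
```

The seasons data set should contain four observations, one per quoted value, while fromvars should contain three observations in which k equals 10, 20, and 30 in turn.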