Lesson 18: Generating Data With Do Loops

Overview

When programming, you can find yourself needing to tell SAS to execute the same statements over and over again. That's when a DO loop can come in and save your day. The actions of some DO loops are unconditional in that if you tell SAS to do something 20 times, SAS will do it 20 times regardless. We call those kinds of loops iterative DO loops. On the other hand, actions of some DO loops are conditional in that you tell SAS to do something until a particular condition is met or to do something while a particular condition is met. We call the former a DO UNTIL loop and the latter a DO WHILE loop. In this lesson, we'll explore the ins and outs of these three different kinds of loops, as well as take a look at lots of examples in which they are used. Then, in the next lesson, we'll use DO loops to help us process arrays.

Objectives

Upon completing this lesson, you should be able to do the following:

  • write an iterative DO loop to tell SAS to execute a statement or a set of statements a specified number of times
  • tell SAS to increase the index variable in an iterative DO loop by more than the default 1 unit
  • tell SAS to decrease, rather than increase, the index variable in an iterative DO loop
  • write nested iterative DO loops
  • use an iterative DO loop to process data that are read from a data set
  • write a DO UNTIL loop to tell SAS to execute a statement or a set of statements until a certain condition is met
  • explain that a DO UNTIL loop always executes at least once
  • write a DO WHILE loop to tell SAS to execute a statement or a set of statements while a certain condition is true
  • explain that if the expression in a DO WHILE loop is false the first time it is evaluated, then the DO loop doesn't even execute once
  • define the primary difference between the DO UNTIL and DO WHILE loops
  • write an iterative DO loop that executes the DO loop conditionally as well as unconditionally
  • use an iterative DO loop and the SET statement's POINT= option to select a (patterned) sample from a large data set

18.1 - Constructing Do Loops

In this section, we'll explore the use of iterative DO loops, in which you tell SAS to execute a statement or a group of statements a certain number of times. Let's take a look at some examples!

Example 18.1

The following program uses a DO loop to tell SAS to determine what four times three (4 × 3) equals:

OPTIONS PS = 58 LS = 78 NODATE NONUMBER;
DATA multiply;
    answer = 0;
    do i = 1 to 4;
        answer + 3;
    end;
RUN;
PROC PRINT NOOBS;
    title 'Four Times Three Equals...';
RUN;

Four Times Three Equals...

answer    i
    12    5

Okay... admittedly, we could accomplish our goal of determining four times three in a much simpler way, but then we wouldn't have the pleasure of seeing how we can accomplish it using an iterative DO loop! The key to understanding the DATA step here is to recall that multiplication is just repeated addition. That is, four times three (4 × 3) is the same as adding three together four times, that is, 3 + 3 + 3 + 3. That's all that the iterative DO loop in the DATA step is telling SAS to do. After having initialized answer to 0, add 3 to answer, then add 3 to answer again, and add 3 to answer again, and add 3 to answer again. After SAS has added 3 to the answer variable four times, SAS exits the DO loop, and since that's the end of the DATA step, SAS moves on to the next procedure and prints the result.

The other thing you might want to notice about the DATA step is that there is no input data set or input data file. We are generating data from scratch here, rather than from some input source. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly calculates four times three.

Ahhh, what about that i variable that shows up in our multiply data set? If you look at our DATA step again, you can see that it comes from the DO loop. It is what is called the index variable (or counter variable). Most often, you'll want to drop it from your output data set, but its presence here is educational. As you can see, its final value is 5. That's what allows SAS to exit the DO loop: we tell SAS to take the actions inside the loop only as long as i is no greater than 4. At the end of each iteration, SAS increases i by 1; once i becomes greater than 4, SAS jumps out of the loop and moves on to the next statements in the DATA step. Let's take a look at the general form of iterative DO loops.

General Form of Iterative Do Loops

To construct an iterative DO loop, you need to start with a DO statement, then include some action statements, and then end with an END statement. Here's what a simple iterative DO loop should look like:

DO index-variable = start TO stop BY increment;
    action statements;
END;

where:

  • DO, index-variable, start, TO, stop, and END are required in every iterative DO loop
  • index-variable, which stores the value of the current iteration of the DO loop, can be any valid SAS variable name. It is common, however, to use a single letter, with i and j being the most used.
  • start is the value of the index variable at which you want SAS to start the loop
  • stop is the value of the index variable at which you want SAS to stop the loop
  • increment is by how much you want SAS to change the index variable after each iteration. The most commonly used increment is 1. In fact, if you don't specify a BY clause, SAS uses the default increment of 1.

For example, this DO statement:

do jack = 1 to 5;

tells SAS to create an index variable called jack, start at 1, increment by 1, and end at 5, so that the values of jack from iteration to iteration are 1, 2, 3, 4, and 5. And, this DO statement:

do jill = 2 to 12 by 2;

tells SAS to create an index variable called jill, start at 2, increment by 2, and end at 12, so that the values of jill from iteration to iteration are 2, 4, 6, 8, 10, and 12.
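One way to check the values that an index variable takes on is to write them to the SAS log. The following sketch is illustrative only and is not one of the lesson's programs. It assumes a DATA _NULL_ step, so that no data set is created, and uses a PUT statement to display jill on each pass through the loop and once more after the loop ends:

```sas
DATA _NULL_;
    do jill = 2 to 12 by 2;
        /* Write the current value of the index variable to the log */
        put 'Inside the loop: ' jill=;
    end;
    /* After the loop, jill holds the first value beyond the stop value */
    put 'After the loop: ' jill=;
RUN;
```

The log should show jill taking on the values 2, 4, 6, 8, 10, and 12 inside the loop. The value displayed after the loop is 14, because SAS adds the increment one last time before it finds that the stop value has been exceeded.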

Example 18.2: Explicit OUTPUT Statements

The following program uses an iterative DO loop to tell SAS to determine the multiples of 5 up to 100:

DATA multiply (drop = i);
    multiple = 0;
    do i = 1 to 20;
        multiple + 5;
        output;
    end;
RUN;
PROC PRINT NOOBS;
    title 'Multiples of 5 up to 100';
RUN;

Multiples of 5 up to 100

multiple
       5
      10
      15
      20
      25
      30
      35
      40
      45
      50
      55
      60
      65
      70
      75
      80
      85
      90
      95
     100

In this case, we are not interested in one particular multiplication, but rather in a series of multiplications, 1 × 5, 2 × 5, 3 × 5, ... That's where the OUTPUT statement comes into play. The previous example created just one observation because it relied on the automatic output at the end of the DATA step. Here, we override the automatic output by explicitly telling SAS to output the value of the multiple variable every time that SAS adds 5 to it. The DATA statement's DROP= option tells SAS not to bother to output the index variable i. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly generates multiples of 5.

Example 18.3

The following SAS program uses an iterative DO loop to count backwards by 1:

DATA backwardsbyone;
    do i = 20 to 1 by -1;
        output;
    end;
RUN;
PROC PRINT data = backwardsbyone NOOBS;
    title 'Counting Backwards by 1';
RUN;

Counting Backwards by 1

 i
20
19
18
17
16
15
14
13
12
11
10
 9
 8
 7
 6
 5
 4
 3
 2
 1

As you can see in this DO statement, you can decrement a DO loop's index variable by specifying a negative value for the BY clause. Here, we tell SAS to start at 20 and decrease the index variable by 1, until it reaches 1. The OUTPUT statement tells SAS to output the value of the index variable i for each iteration of the DO loop. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that our code properly counts backward from 20 to 1.
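A negative increment does not have to land exactly on the stop value. As a hedged illustration (the data set name backwardsbythree and the increment of -3 are choices made for this sketch, not part of the lesson's programs), the following variation counts backward from 20 by 3:

```sas
DATA backwardsbythree;
    /* With a negative increment, the loop ends once i drops below the stop value */
    do i = 20 to 1 by -3;
        output;
    end;
RUN;
PROC PRINT data = backwardsbythree NOOBS;
    title 'Counting Backwards by 3';
RUN;
```

Here i takes on the values 20, 17, 14, 11, 8, 5, and 2. The next candidate value, -1, falls below the stop value of 1, so SAS leaves the loop without outputting it.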

Specifying a Series of Items

Rather than specifying start, stop and increment values in a DO statement, you can tell SAS how many times to execute a DO loop by listing items in a series. In this case, the general form of the iterative DO loop looks like this:

DO index-variable = value1, value2, value3, ...;
	action statements;
END;

where the values can be character or numeric. When the DO loop executes, it executes once for each item in the series. The index variable equals the value of the current item. You must use commas to separate items in a series. To list items in a series, you must specify

  • either all numeric values:
       DO i = 1, 2, 3, 4, 5;
  • all character values, with each value enclosed in quotation marks
       DO j = 'winter', 'spring', 'summer', 'fall';
  • or all variable names:
       DO k = first, second, third;

In this case, the index variable takes on the values of the specified variables. Note that the variable names are not enclosed in quotation marks, while quotation marks are required for character values.
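To see a series of items in action, here is a minimal sketch. The data set name seasons, the variable name season, and the explicit LENGTH statement are assumptions made for this illustration. Declaring the length up front is simply a cautious way to make sure that every value in the series fits in the character index variable:

```sas
DATA seasons;
    /* Declare the character index variable wide enough for the longest value */
    length season $ 6;
    do season = 'winter', 'spring', 'summer', 'fall';
        /* One observation is written for each item in the series */
        output;
    end;
RUN;
PROC PRINT data = seasons NOOBS;
    title 'Seasons Generated From a Series of Items';
RUN;
```

The DO loop executes four times, once per item, so the seasons data set should contain four observations.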


18.2 - Nesting Do Loops

One way to make iterative DO loops even more powerful is to place one DO loop inside of another. Putting a DO loop within another DO loop is called nesting. We'll take a look at a few examples here.

Example 18.4

Suppose you are interested in conducting an experiment with two factors, A and B. Suppose factor A is the amount of water, with levels 1, 2, 3, and 4, and factor B is the amount of sunlight, with levels 1, 2, 3, 4, and 5. Then, the following SAS code uses nested iterative DO loops to generate the 4 by 5 factorial design:

DATA design;
    DO i = 1 to 4;
        DO j = 1 to 5;
            output;
        END;
    END;
RUN;
PROC PRINT data = design;
    TITLE '4 by 5 Factorial Design';
RUN;

4 by 5 Factorial Design

Obs    i    j
  1    1    1
  2    1    2
  3    1    3
  4    1    4
  5    1    5
  6    2    1
  7    2    2
  8    2    3
  9    2    4
 10    2    5
 11    3    1
 12    3    2
 13    3    3
 14    3    4
 15    3    5
 16    4    1
 17    4    2
 18    4    3
 19    4    4
 20    4    5

First, launch and run the SAS program. Then, review the output from the PRINT procedure to see the contents of the design data set. By doing so, you can get a good feel for how the nested DO loops work. First, SAS sets the value of the index variable i to 1, then proceeds to the next step, which happens to be another iterative DO loop. While i is 1:

  • SAS sets the value of j to 1, and outputs the observation in which i = 1 and j = 1.
  • Then, SAS sets the value j to 2, and outputs the observation in which i = 1 and j = 2.
  • Then, SAS sets the value j to 3, and outputs the observation in which i = 1 and j = 3.
  • Then, SAS sets the value j to 4, and outputs the observation in which i = 1 and j = 4.
  • Then, SAS sets the value j to 5, and outputs the observation in which i = 1 and j = 5.
  • Then, SAS sets the value j to 6, jumps out of the inside DO loop, and proceeds to the next statement, which happens to be the end of the outside DO loop.

SAS then sets the value of the index variable i to 2, then proceeds through the inside DO loop again just as described above. This process continues until SAS sets the value of index variable i to 5, jumps out of the outside DO loop, and ends the DATA step.

Example 18.5

Back to our experiment with two factors, A and B. Suppose this time that factor A is the amount of water, with levels 10, 20, 30, and 40 liters, and factor B is the amount of sunlight, with levels 3, 6, 9, 12, and 15 hours. The following SAS code uses two DO loops with BY options to generate a more meaningful 4 by 5 factorial design that corresponds to the exact levels of the factors:

DATA design;
    DO i = 10 to 40 by 10;
        DO j = 3 to 15 by 3;
            output;
        END;
    END;
RUN;
PROC PRINT data = design;
    TITLE '4 by 5 Factorial Design';
RUN;

4 by 5 Factorial Design

Obs     i     j
  1    10     3
  2    10     6
  3    10     9
  4    10    12
  5    10    15
  6    20     3
  7    20     6
  8    20     9
  9    20    12
 10    20    15
 11    30     3
 12    30     6
 13    30     9
 14    30    12
 15    30    15
 16    40     3
 17    40     6
 18    40     9
 19    40    12
 20    40    15

First, launch and run the SAS program. Then, review the output from the PRINT procedure to see the contents of the design data set. By doing so, you can get a good feel for how the nested DO loops with BY options work. First, SAS sets the value of the index variable i to 10, then proceeds to the next step, which happens to be another iterative DO loop. While i is 10:

  • SAS sets the value of j to 3, and outputs the observation in which i = 10 and j = 3.
  • Then, SAS sets the value j to 6, and outputs the observation in which i = 10 and j = 6.
  • Then, SAS sets the value j to 9, and outputs the observation in which i = 10 and j = 9.
  • Then, SAS sets the value j to 12, and outputs the observation in which i = 10 and j = 12.
  • Then, SAS sets the value j to 15, and outputs the observation in which i = 10 and j = 15.
  • Then, SAS sets the value j to 18, jumps out of the inside DO loop, and proceeds to the next statement, which happens to be the end of the outside DO loop.

SAS then sets the value of the index variable i to 20, then proceeds through the inside DO loop again just as described above. This process continues until SAS sets the value of index variable i to 50, jumps out of the outside DO loop, and ends the DATA step.
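Because an index variable can be any valid SAS variable name, the same design can be generated with names that describe the factors. The following is only a sketch of that idea. The data set name design2 and the variable names water and sunlight are assumptions introduced for this illustration:

```sas
DATA design2;
    do water = 10 to 40 by 10;          /* factor A: liters of water */
        do sunlight = 3 to 15 by 3;     /* factor B: hours of sunlight */
            /* One observation per combination of water and sunlight */
            output;
        end;
    end;
RUN;
PROC PRINT data = design2;
    title '4 by 5 Factorial Design';
RUN;
```

The resulting data set should contain the same 20 combinations as before, with the columns labeled water and sunlight rather than i and j.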


18.3 - Iteratively Processing Data

So far, all of the examples that we've looked at have involved using DO loops to generate one or more observations from one iteration of the DATA step. Now, let's look at an example that involves reading a data set, and then using a DO loop to compute the value of a new variable.

Example 18.6

Every Monday morning, a credit union in Pennsylvania announces the interest rates for certificates of deposit (CDs) that it will honor for CDs opened during the business week. Suppose you want to determine how much each CD will earn at maturity with an initial investment of $5,000. The following program reads in the interest rates advertised one week in early 2009, and then uses a DO loop to calculate the value of each CD when it matures:

DATA cdinvest (drop = i);
    input Type $ 1-7 AnnualRate Months;
    Investment = 5000;
    do i = 1 to Months;
        Investment + (AnnualRate/12)*Investment;
    end;
    format Investment dollar8.2;
    DATALINES;
03Month  0.01980  3
06Month  0.02230  6
09Month  0.02230  9
12Month  0.02470 12
18Month  0.02470 18
24Month  0.02570 24
36Month  0.02720 36
48Month  0.02960 48
60Month  0.03445 60
;
RUN;
PROC PRINT data = cdinvest NOOBS;
    title 'Comparison of Different CD Rates';
RUN;

Comparison of Different CD Rates

Type       AnnualRate    Months    Investment
03Month       0.01980         3      $5024.79
06Month       0.02230         6      $5056.01
09Month       0.02230         9      $5084.25
12Month       0.02470        12      $5124.91
18Month       0.02470        18      $5188.53
24Month       0.02570        24      $5263.43
36Month       0.02720        36      $5424.61
48Month       0.02960        48      $5627.65
60Month       0.03445        60      $5938.41

Let's work our way through the code to see how SAS processes the first observation. As the INPUT statement suggests, each record in the instream data contains three pieces of information: the type of CD (Type), the annual interest rate (AnnualRate), and the time to maturity in months (Months). A new variable called Investment and the index variable i are created within the DATA step. Therefore, at the end of the compile phase, the program data vector looks like this:

_N_    _ERROR_    Type       AnnualRate    Months    Investment    i
1      0                     .             .         .             .

In the first iteration of the DATA step, the first observation is read from the instream data, the Investment variable is initialized to 5000, and the index variable i is set to 1. At the start of the DO loop, therefore, the program data vector looks like this:

_N_    _ERROR_    Type       AnnualRate    Months    Investment    i
1      0          03Month    0.0198        3         5000          1

The sum statement inside the DO loop tells SAS to take the current value of Investment, 5000, and add to it the amount of interest earned in one month. Because our input data set contains annual rates, we need to divide the annual rates by 12 to get monthly interest rates. The annual rate for the 3-month certificate is 0.0198, so that makes the monthly rate 0.0198 divided by 12, or 0.00165. Multiply that monthly rate, 0.00165, by the current value of Investment, 5000, and you get 8.25. So, after one month in a 3-month certificate, your 5000 dollars will have turned into 5008.25. Here's what the program data vector looks like with the updated Investment value:

_N_    _ERROR_    Type       AnnualRate    Months    Investment    i
1      0          03Month    0.0198        3         5008.25       1

Having reached the end of the DO loop, SAS returns to the top of the DO loop to determine if it needs to be executed again. Notice that the Months variable is used as the stop value in the DO loop. As a result, the DO loop executes the number of times specified by the current value of Months, which is 3. The index variable is increased to 2. Because it is not greater than 3, SAS processes the DO loop again. SAS multiplies the current value of Investment, 5008.25, by the monthly rate, 0.00165, to determine that the interest earned in the second month is 8.2636125. Therefore, after two months in a 3-month certificate, your 5000 dollars will have turned into 5008.25 + 8.2636125, or 5016.5136 dollars. Here's what the program data vector looks like at the end of the second iteration of the DO loop:

_N_    _ERROR_    Type       AnnualRate    Months    Investment    i
1      0          03Month    0.0198        3         5016.5136     2

Having reached the end of the DO loop, SAS returns to the top of the DO loop to determine if it needs to be executed again. The index variable is increased to 3. Because it is not greater than 3, SAS processes the DO loop again. SAS multiplies the current value of Investment, 5016.5136, by the monthly rate, 0.00165, to determine that the interest earned in the third month is 8.2772474. Therefore, after three months in a 3-month certificate, your 5000 dollars will have turned into 5016.5136 + 8.2772474, or 5024.7908 dollars. Here's what the program data vector looks like at the end of the third iteration of the DO loop:

_N_    _ERROR_    Type       AnnualRate    Months    Investment    i
1      0          03Month    0.0198        3         5024.7908     3

Having reached the end of the DO loop, SAS returns to the top of the DO loop to determine if it needs to be executed again. The index variable is increased to 4. Because it is greater than 3, SAS steps out of the DO loop and moves on to the next statement. Here's what the program data vector looks like now:

_N_    _ERROR_    Type       AnnualRate    Months    Investment    i
1      0          03Month    0.0198        3         5024.7908     4

The FORMAT statement is not an executable statement; SAS uses it during the compile phase to associate the DOLLAR8.2 format with the Investment variable. SAS has therefore reached the end of the DATA step, and so it writes the contents of the program data vector as the first observation in the cdinvest data set:

Type       AnnualRate    Months    Investment
03Month    0.0198        3         5024.7908

Because of the DROP= data set option, SAS does not write the value of the index variable i to the output data set. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that SAS created the first observation as claimed. Of course, the other observations are created just as the first one was created as described above.
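Adding one month's interest multiplies Investment by (1 + AnnualRate/12), so after Months iterations the loop's result should agree with the closed-form compound interest formula 5000 × (1 + AnnualRate/12)^Months. The following sketch, which is not part of the original lesson, checks that claim on three of the CDs. The data set name cdcheck and the variable name ClosedForm are assumptions made for this illustration:

```sas
DATA cdcheck (drop = i);
    input Type $ 1-7 AnnualRate Months;
    /* Month-by-month accumulation, as in Example 18.6 */
    Investment = 5000;
    do i = 1 to Months;
        Investment + (AnnualRate/12)*Investment;
    end;
    /* Closed form: principal times (1 + monthly rate) raised to the number of months */
    ClosedForm = 5000 * (1 + AnnualRate/12) ** Months;
    format Investment ClosedForm dollar8.2;
    DATALINES;
03Month  0.01980  3
12Month  0.02470 12
60Month  0.03445 60
;
RUN;
PROC PRINT data = cdcheck NOOBS;
    title 'DO Loop Versus Closed-Form Calculation';
RUN;
```

The Investment and ClosedForm columns should match to the cent. For the 3-month certificate, for example, 5000 × 1.00165³ is approximately 5024.79, the same value traced through the program data vector above.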


18.4 - Conditionally Executing Do Loops

As you now know, the iterative DO loop requires that you specify the number of iterations for the DO loop. However, there are times when you want to execute a DO loop until a condition is reached or while a condition exists, but you don't know how many iterations are needed. That's when the DO UNTIL loop and the DO WHILE loop can help save the day!

In this section, we'll first learn about the DO UNTIL and DO WHILE loops. Then, we'll look at another form of the iterative DO loop that combines features of both conditional and unconditional DO loops.

The DO UNTIL Loop

When you use a DO UNTIL loop, SAS executes the DO loop until the expression you've specified is true. Here's the general form of a DO UNTIL loop:

DO UNTIL (expression);
	action statements;
END;

where expression is any valid SAS expression enclosed in parentheses. The key thing to remember is that the expression is not evaluated until the bottom of the loop. Therefore, a DO UNTIL loop always executes at least once. As soon as the expression is determined to be true, the DO loop does not execute again.

Example 18.7

Suppose you want to know how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest. The following program uses a DO UNTIL loop to perform the calculation for us:

DATA investment;
    DO UNTIL (value >= 50000);
        value + 1200;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
 
PROC PRINT data = investment NOOBS;
	title 'Years until at least $50,000';
RUN;

Years until at least $50,000

   value    year
 1260.00       1
 2583.00       2
 3972.15       3
 5430.76       4
 6962.30       5
 8570.41       6
10258.93       7
12031.88       8
13893.47       9
15848.14      10
17900.55      11
20055.58      12
22318.36      13
24694.28      14
27188.99      15
29808.44      16
32558.86      17
35446.80      18
38479.14      19
41663.10      20
45006.26      21
48516.57      22
52202.40      23

Recall that the expression in the DO UNTIL statement is not evaluated until the bottom of the loop. Therefore, the DO UNTIL loop executes at least once. On the first iteration, the value variable is increased by 1200, or in this case, set to 1200. Then, the value variable is updated by calculating 1200 + 1200*0.05 to get 1260. Then, the year variable is increased by 1, or in this case, set to 1. The first observation, for which year = 1 and value = 1260, is then written to the output data set called investment. Having reached the bottom of the DO UNTIL loop, the expression (value >= 50000) is evaluated to determine if it is true. Since value is just 1260, the expression is not true, and so the DO UNTIL loop is executed once again. The process continues as described until SAS determines that value is at least 50000 and therefore stops executing the DO UNTIL loop.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that it would take 23 years to accumulate at least $50,000.

The DO WHILE Loop

When you use a DO WHILE loop, SAS executes the DO loop while the expression you've specified is true. Here's the general form of a DO WHILE loop:

DO WHILE (expression);
      action statements;
END;

where expression is any valid SAS expression enclosed in parentheses. An important difference between the DO UNTIL and DO WHILE statements is that the DO WHILE expression is evaluated at the top of the DO loop. If the expression is false the first time it is evaluated, then the DO loop doesn't even execute once.
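The difference in when the expression is evaluated can be seen side by side in a small sketch. This program is illustrative only. The variable name count and its starting value of 10 are assumptions chosen so that the UNTIL expression is already true, and the WHILE expression already false, before either loop begins:

```sas
DATA _NULL_;
    count = 10;
    /* The expression is already true, yet the body still runs once, */
    /* because DO UNTIL evaluates its expression at the bottom of the loop */
    DO UNTIL (count >= 5);
        put 'DO UNTIL body executed with ' count=;
    END;
    /* The expression is false at the top of the loop, so the body never runs */
    DO WHILE (count < 5);
        put 'DO WHILE body executed with ' count=;
    END;
RUN;
```

The log should contain exactly one line from the DO UNTIL loop and no lines from the DO WHILE loop.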

Example 18.8

The following program attempts to use a DO WHILE loop to accomplish the same goal as the program above, namely to determine how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest:

DATA investtwo;
    DO WHILE (value >= 50000);
        value + 1200;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
 
PROC PRINT data = investtwo NOOBS;
   title 'Years until at least $50,000';
RUN;

23   PROC PRINT data = investtwo NOOBS;
24      title 'Years until at least $50,000';
25   RUN;
NOTE: No observations in data set WORK.INVESTTWO.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.00 seconds
26   QUIT

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that ... OOPS! There is no output! The program does not do what we intended because, in a DO WHILE loop, the expression, in this case (value >= 50000), is evaluated at the top of the loop. Because value has not yet reached 50000 when the expression is first evaluated, the expression is false, and SAS never enters the DO WHILE loop. Therefore, the code proves to be ineffective. Review the log to convince yourself that the investtwo data set contains no observations because the DO WHILE loop was unable to execute.

Example 18.9

Now, the following program correctly uses a DO WHILE loop to determine how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest:

DATA investthree;
    value = 0;
    DO WHILE (value < 50000);
        value + 1200;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
 
PROC PRINT data = investthree NOOBS;
   title 'Years until at least $50,000';
RUN;

Years until at least $50,000

   value    year
 1260.00       1
 2583.00       2
 3972.15       3
 5430.76       4
 6962.30       5
 8570.41       6
10258.93       7
12031.88       8
13893.47       9
15848.14      10
17900.55      11
20055.58      12
22318.36      13
24694.28      14
27188.99      15
29808.44      16
32558.86      17
35446.80      18
38479.14      19
41663.10      20
45006.26      21
48516.57      22
52202.40      23

Note that there are just three differences between this program and the successful program in Example 18.7 that uses the DO UNTIL loop: i) the value variable is initialized to 0; ii) UNTIL has been changed to WHILE; and iii) the expression to be checked is now (value < 50000). Because value is set to 0 and is therefore less than 50000 at the outset, SAS can now enter the DO WHILE loop to perform our desired calculations.

The calculations proceed as before. First, the value variable is updated by calculating 0 + 1200, to get 1200. Then, the value variable is updated by calculating 1200 + 1200*0.05 to get 1260. Then, the year variable is increased by 1, or in this case, set to 1. The first observation, for which year = 1 and value = 1260, is then written to the output data set called investthree. SAS then returns to the top of the DO WHILE loop to determine if the expression (value < 50000) is true. Since value is just 1260, the expression is true, and so the DO WHILE loop executes once again. The process continues as described until SAS determines that value is at least 50000 and therefore stops executing the DO WHILE loop.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that this program also determines that it would take 23 years to accumulate at least $50,000.

Using Conditional Clauses in an Iterative DO Loop

You have now seen how the DO WHILE and DO UNTIL loops enable you to execute statements repeatedly, but conditionally so. You have also seen how the iterative DO loop enables you to execute statements a set number of times unconditionally. Now, we'll put the two together to create a form of the iterative DO loop that executes DO loops conditionally as well as unconditionally.

Example 18.10

Suppose again that you want to know how many years it would take to accumulate $50,000 if you deposit $1200 each year into an account that earns 5% interest. But this time, suppose you also want to limit the number of years that you invest to 15 years. The following program uses a conditional iterative DO loop to accumulate our investment until we reach 15 years or until the value of our investment reaches at least $50,000, whichever comes first:

DATA investfour (drop = i);
    DO i = 1 to 15 UNTIL (value >= 50000);
        value + 1200;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
PROC PRINT data = investfour NOOBS;
    title 'Value of Investment';
RUN;

Value of Investment

   value    year
 1260.00       1
 2583.00       2
 3972.15       3
 5430.76       4
 6962.30       5
 8570.41       6
10258.93       7
12031.88       8
13893.47       9
15848.14      10
17900.55      11
20055.58      12
22318.36      13
24694.28      14
27188.99      15

Note that there are just two differences between this program and the program in Example 18.7 that uses the DO UNTIL loop: i) the iteration i = 1 to 15 has been inserted into the DO UNTIL statement; and ii) because the index variable i is created for the DO loop, it is dropped before writing the contents from the program data vector to the output data set investfour.

Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that, in this case, 15 years comes first. That is, the portion of the DO statement that tells SAS to stop executing the DO loop is the iterative i = 1 to 15 part.

Example 18.11

Suppose this time that you want to know how many years it would take to accumulate $50,000 if you deposit $3600 each year into an account that earns 5% interest. Suppose again that you want to limit the number of years that you invest to 15 years. The following program uses a conditional iterative DO loop to accumulate our investment until we reach 15 years or until the value of our investment reaches at least $50,000, whichever comes first:

DATA investfive (drop = i);
    DO i = 1 to 15 UNTIL (value >= 50000);
        value + 3600;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
PROC PRINT data = investfive NOOBS;
    title 'Value of Investment';
RUN;

Value of Investment

   value    year
 3780.00       1
 7749.00       2
11916.45       3
16292.27       4
20886.89       5
25711.23       6
30776.79       7
36095.63       8
41680.41       9
47544.43      10
53701.66      11

There is just one difference between this program and the previous program: the amount deposited each year has been changed from 1200 to 3600. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that, this time, the $50,000 comes first. That is, the portion of the DO statement that tells SAS to stop executing the DO loop is the conditional (value >= 50000) part.

The two examples of using conditional clauses with an iterative DO loop that we looked at involved using a DO UNTIL loop. We alternatively could have used a DO WHILE loop. The main thing to keep in mind is that, as before, the UNTIL expression is evaluated at the bottom of the DO loop, so the DO loop always executes at least once. The WHILE expression is evaluated before the execution of the DO loop. So, if the condition is not true, the DO loop never executes.
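As a sketch of the WHILE alternative, the program from Example 18.11 might be rewritten as follows. The data set name investsix is an assumption made for this illustration. Note that the expression is reversed, from (value >= 50000) to (value < 50000), because a WHILE clause states the condition for continuing rather than the condition for stopping:

```sas
DATA investsix (drop = i);
    /* The WHILE expression is checked at the top of each iteration */
    DO i = 1 to 15 WHILE (value < 50000);
        value + 3600;
        value + value * 0.05;
        year + 1;
        OUTPUT;
    END;
RUN;
PROC PRINT data = investsix NOOBS;
    title 'Value of Investment';
RUN;
```

Under these assumptions, the program should produce the same 11 observations as Example 18.11: once value reaches 53701.66 in year 11, the expression is false at the top of the next iteration, and SAS leaves the loop.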


18.5 - Creating Samples

Because a DO loop executes statements iteratively, it provides an easy way to select a sample of observations from a large data set. Let's take a look at an example!

Example 18.12

The following program uses an iterative DO loop and the SET statement's POINT= option to select every 100th observation from the permanent data set called stat481.log11 which contains 8,624 observations:

OPTIONS LS = 72 PS = 34 NODATE NONUMBER;
LIBNAME stat481 'C:\yourdrivename\Stat481WC\06doloops\sasndata';
 
DATA sample;
    DO i = 100 to 8600 by 100;
        set stat481.log11 point = i;
        output;
    END;
    stop;
RUN;
PROC PRINT data = sample NOOBS;
    title 'Subset of Logged Observations for Hospital 11';
RUN;

Subset of Logged Observations for Hospital 11
SUBJ      V_TYPE    V_DATE      FORM_CD
110004         0    04/22/93    prior
110027         3    01/25/94    med
110027        36    08/27/96    cmed
110029        12    09/27/94    purg
110029        42    04/01/97    sympts
110039        18    06/06/95    void
110040         1    01/24/94    void
110040        39    02/18/97    cmed
110045        15    05/09/95    symph
110049         0    01/25/94    sympts
110049        30    07/23/96    phytrt
110051        12    12/13/94    void
110052         3    05/10/94    void
110052        55    05/06/97    close
110053        24    02/06/96    cmed
110055         6    08/30/94    sympts
110057         0    03/15/94    preg
110057        27    06/26/96    symph
110058        12    04/11/95    med
110059         0    03/18/94    phs
110059        24    03/19/96    void
110062         0    03/31/94    preg
110062        24    05/14/96    purg
110066         0    04/12/94    purg
110067         3    08/04/94    purg
110068         3    08/30/94    void
110070         3    08/30/94    phytrt
110074         0    06/16/94    urod
110075        15    10/31/95    med
110076         3    10/04/94    void
110077        21    05/10/96    med
110078        12    10/03/95    diet
110080         0    07/07/94    sympts
110080        24    06/25/96    sympts
110081        18    02/09/96    void
110082        12    08/22/95    phytrt
110083         0    02/10/95    ucult
110085         0    10/11/94    phytrt
110086        18    04/30/96    diet
110087         3    05/30/95    phytrt
110088         0    03/07/95    excl2
110091        12    04/02/96    void
110092         6    09/19/95    void
110093        12    03/05/96    med
110094         9    03/26/96    purg
110095         6    12/05/95    phytrt
110096         6    12/19/95    urn
110097        21    03/18/97    med
110100         0    07/14/95    def1
110100        21    04/22/97    void
110104         1    10/23/95    symph
110107         0    09/22/95    urod
110110         0    11/10/95    prior
110111         0    10/17/95    prior
110112        15    01/21/97    phytrt
110114         0    11/10/95    diet
110115         0    12/01/95    preg
110117         0    12/11/95    void
110118        12    01/21/97    purg
110120         0    01/09/96    excl2
110121         9    09/03/96    cmed
110123         0    01/23/96    back
110124         0    02/05/96    urn
110125         9    12/10/96    phytrt
110127         1    03/27/96    purg
110128         6    09/17/96    void
110131         3    06/04/96    med
110134         0    04/15/96    hem
110135         1    05/16/96    med
110136         9    01/21/97    void
110138         6    12/03/96    med
110140         0    05/21/96    void
110142         0    06/04/96    prior
110144         0    06/07/96    hmrpt
110145         6    01/14/97    void
110147         3    09/17/96    void
110149         0    06/28/96    urod
110152         0    07/19/96    incl
110154         0    07/22/96    void
110155         6    01/28/97    qul
110158         0    08/26/96    cmed
110161         0    10/01/96    prior
110163         6    03/18/97    diet
110165         3    01/14/97    cmed
110167         0    11/19/96    purg
110171         0    01/21/97    hem

Let's work our way through the code. The DO statement tells SAS to start at 100, increase i by 100 each time, and end at 8600. That is, SAS will execute the DO loop when the index variable i equals 100, 200, 300, ..., 8600.
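If you'd like to convince yourself of how many times that DO statement iterates without touching any data, a small sketch along the following lines can help. It is not one of the lesson's examples, and the variable name count is made up for illustration. The DATA _NULL_ step creates no data set and simply writes the results to the log:

```sas
DATA _NULL_;
    count = 0;
    DO i = 100 to 8600 by 100;
        count = count + 1;    /* one pass for each value of i */
    END;
    /* after the last pass, i is incremented once more to 8700, which
       exceeds the stop value and ends the loop */
    PUT count= i=;            /* writes count=86 i=8700 to the log */
RUN;
```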

Now the SET statement contains an option that we've not seen before, namely the POINT= option. The POINT= option tells SAS not to read the stat481.log11 data set sequentially as is done by default, but rather to read the observation number specified by the POINT= option directly from the data set. For example, when i = 100, and therefore POINT = 100, SAS reads the 100th observation in the stat481.log11 data set. And when i = 3200, and therefore POINT = 3200, SAS reads the 3200th observation in the stat481.log11 data set.

The OUTPUT statement, of course, tells SAS to write the selected observation to the output data set. If we instead placed the OUTPUT statement outside the DO loop, between the END and STOP statements, the resulting data set would contain only one observation, namely the last observation read into the program data vector.

The STOP statement, which is new to us, is necessary because we are using the POINT= option. As you know, the DATA step by default continues to read observations until it reaches the end-of-file marker in the input data. Because the POINT= option reads only specified observations, SAS cannot read an end-of-file marker as it would if the file were being read sequentially. The STOP statement tells SAS to stop processing the current DATA step immediately and to resume processing statements after the end of the current DATA step. It is the use of the STOP statement, therefore, that keeps us from sending SAS into the no man's land of continuous looping.
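One possible variation, offered as a sketch rather than as part of Example 18.12, avoids hard-coding the stop value of 8600. The SET statement's NOBS= option stores the number of observations in the input data set in a temporary variable, and SAS assigns that value when the DATA step is compiled, so the DO statement can use it even though it appears before the SET statement. The names sample2 and total are made up for illustration, and the sketch assumes the LIBNAME statement from the program above is still in effect:

```sas
DATA sample2;
    /* total is filled in by NOBS= at compile time; like the POINT=
       variable i, it is not written to the output data set */
    DO i = 100 to total by 100;
        set stat481.log11 point = i nobs = total;
        output;
    END;
    stop;    /* still required because of the POINT= option */
RUN;
```

Because stat481.log11 contains 8,624 observations, the last value of i that does not exceed total is 8600, so this version should select the same 86 observations.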

Now, download and save the stat481.log11 data set in a convenient location on your computer. Launch the SAS program, and edit the LIBNAME statement so that it reflects the location in which you saved the data set. Then, run the program and review the output from the PRINT procedure to see the selected observations. You shouldn't be surprised to see that the sample data set contains 86 observations:

PROC PRINT data = sample NOOBS;
NOTE: Writing HTML Body file: sashtml1.htm
       title 'Subset of Logged Observations for Hospital 11';
RUN;
NOTE: There were 86 observations read from the data set WORK.SAMPLE.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.64 seconds
      cpu time            0.29 seconds

as the iterative DO loop executes 8600 divided by 100, or 86 times.

Note! It is important to emphasize that the method we illustrated here for selecting a sample from a large data set has nothing random about it. That is, we selected a patterned sample, not a random sample, from a large data set. That's why this section is called Creating Samples, not Creating Random Samples. We'll learn how to select a random sample from a large data set in Stat 482.

18.6 - Summary

In this lesson, we explored four different kinds of loops: the iterative DO loop, the DO UNTIL loop, the DO WHILE loop, and an iterative DO loop with a conditional clause. We looked at many different applications of DO loops as well.

The homework for this lesson will give you more practice with DO loops.

