34.2 - Random Sampling with Replacement

In the previous section, all of the samples that we selected were without replacement. That is, once an observation was selected from the data set, it could not be selected again. Now, we'll investigate how to take random samples with replacement. That is, selecting an observation once does not prevent it from being selected again.

Example 34.7

The following code illustrates how to use the DATA step to select an exact-sized random sample with replacement. Specifically, the program uses the ranuni function in conjunction with the POINT= option of the SET statement to tell SAS to randomly sample exactly 15 of the 50 observations from the permanent SAS data set mailing:

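DATA sample4A;
    /* Pick a random observation number between 1 and n */
    choose = int(ranuni(58)*n) + 1;
    /* Read that observation directly; NOBS= supplies n */
    set stat482.mailing point=choose nobs=n;
    /* Count the selections and stop once 15 have been written */
    i + 1;
    if i > 15 then stop;
RUN;

PROC PRINT data=sample4A;
    title1 'Sample4A: Exact-Sized Unrestricted Random Sample';
    title2 'Selects units with equal probabilities & with replacement';
RUN;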

Sample4A: Exact-Sized Unrestricted Random Sample
Selects units with equal probabilities & with replacement

Obs  Num  Name             Street                  City           State   i
  1   24  Mark Mendel      256 Fraser Street       State College  PA      1
  2   14  William Edwards  79 Oak Lane             Bellefonte     PA      2
  3   10  Laura Mills      704 Hill Street         Bellefonte     PA      3
  4    3  Jim Jefferson    10101 Allegheny Street  Bellefonte     PA      4
  5   45  Ann Draper       72 Lake Road            Port Matilda   PA      5
  6   11  Linda Bentlager  1010 Tricia Lane        Bellefonte     PA      6
  7   47  Barb Wyse        21 Cleveland Drive      Port Matilda   PA      7
  8   29  Joe White        678 S. Allen Street     State College  PA      8
  9   32  George Ball      888 Park Avenue         State College  PA      9
 10   31  Robert Williams  156 Straford Drive      State College  PA     10
 11   49  Tim Winters      95 Dove Street          Port Matilda   PA     11
 12   42  Casey Spears     123 Main Street         Port Matilda   PA     12
 13   47  Barb Wyse        21 Cleveland Drive      Port Matilda   PA     13
 14   32  George Ball      888 Park Avenue         State College  PA     14
 15   48  Coach Pierce     74 Main Street          Port Matilda   PA     15

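Launch and run the SAS program. Then, review the resulting output to convince yourself that the code did indeed select a sample of 15 observations from the mailing data set. Note that observations 47 (Barb Wyse) and 32 (George Ball) each appear twice in the output, which can happen only because we sampled with replacement.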
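
One quick way to spot repeat selections is to count how often each value of Num appears in the sample. The following is only a minimal sketch, assuming that the sample4A data set created above is still available in your WORK library; the title text is made up for illustration.

```sas
PROC FREQ data=sample4A order=freq;
    /* One row per distinct Num; a frequency above 1 marks a repeat selection */
    tables Num / nocum nopercent;
    title 'Number of times each observation was selected';
RUN;
```

Because ORDER=FREQ lists the most frequent values first, any observation selected more than once should appear at the top of the table.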

The key to understanding how this code works is to understand what the expression:

choose = int(ranuni(58)*n) + 1

accomplishes. As you know, ranuni(58) tells SAS to use an initial seed of 58 to generate a uniform random number between 0 and 1. For the sake of example, suppose SAS generates the number 0.99. Then, the value of choose becomes 50 as calculated here:

choose = int(0.99*50) + 1 = int(49.5) + 1 = 49 + 1 = 50

And, if SAS generates the number 0.01, the value of choose becomes 1 as calculated here:

choose = int(0.01*50) + 1 = int(0.5) + 1 = 0 + 1 = 1

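In this way, you can see how the expression always generates a positive integer between 1 and n, the number of observations in your data set: because ranuni returns a number strictly between 0 and 1, int(ranuni(58)*n) is an integer between 0 and n - 1, and adding 1 shifts that range to 1, 2, 3, ..., n. All we need to do then is to tell SAS to generate such a random integer over and over again until we reach our desired sample size.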
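
If you'd like to check this arithmetic yourself, the following small sketch evaluates the same expression for a few hand-picked values. Here, n = 50 and the variable u are assumptions made for illustration: u stands in for whatever number ranuni might return, and nothing is read from the mailing data set.

```sas
DATA _null_;
    n = 50;                      /* assumed number of observations, as in mailing */
    do u = 0.01, 0.5, 0.99;      /* hand-picked stand-ins for ranuni output */
        choose = int(u*n) + 1;   /* same expression as in the program above */
        put u= choose=;          /* write each pair to the SAS log */
    end;
RUN;
```

The log should show choose taking the values 1, 26, and 50, respectively.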

Here's a summary of the approach:

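  • Use the NOBS= option of the SET statement to determine n, the number of observations in the original data set. SAS assigns this value when it compiles the DATA step, which is why n is already available to the choose= assignment statement even though that statement appears before the SET statement.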
  • Use the above choose= assignment statement to generate a random integer between 1 and n. (Note that the choose= assignment statement must be placed before the SET statement. If it is not, SAS would not know which observation to read first.)
  • Use the POINT= option of the SET statement to select the choose'th observation from the original data set. The POINT= option tells SAS to read the SAS data set using direct access by observation number. In general, with the POINT= option, you name a temporary variable (here, choose) whose value is the number of the observation you want the SET statement to read.
  • Perform the above two steps repeatedly, keeping count of the number of observations selected. The sum statement i + 1 takes care of the counting for us: SAS initializes i to 0 before the first iteration of the DATA step, retains its value from one iteration to the next, and increases it by 1 each time the statement executes. As a result, i equals 1 on the first iteration, 2 on the second, and so on.
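  • Once you've selected the number of observations desired (15, here), tell SAS to STOP. Note that when using the POINT= option, you must use a STOP statement to tell SAS when to stop processing the DATA step. Because SAS reads the data set by direct access, it never encounters an end-of-file marker, and so the DATA step would otherwise loop indefinitely.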
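
The same steps can also be written with an explicit DO loop, an idiom you may come across in other SAS programs. The following is only a sketch of that alternative, not the approach used in this lesson's program; the output data set name sample4A_loop is made up for illustration.

```sas
DATA sample4A_loop;
    do i = 1 to 15;                                /* one pass per selection */
        choose = int(ranuni(58)*n) + 1;            /* random integer from 1 to n */
        set stat482.mailing point=choose nobs=n;   /* read that observation directly */
        output;                                    /* write it to the sample */
    end;
    stop;                                          /* still required with POINT= */
RUN;
```

Here, the loop index i does the counting, so the sum statement is no longer needed. The OUTPUT statement writes each selected observation explicitly, and the STOP statement ends the DATA step once the loop has finished.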

That's all there is to it! Again, you might want to change the seed (the 58) and the sample size (the 15) a few times to see how it affects the sample.

Example 34.8

The following code illustrates an alternative way of selecting an exact-sized random sample with replacement. Specifically, the program uses the SURVEYSELECT procedure to tell SAS to randomly sample exactly 15 of the 50 observations from the permanent SAS data set mailing:

PROC SURVEYSELECT data = stat482.mailing
                  out = sample4B
                  method = URS      /* unrestricted random sampling: with replacement */
                  seed = 12345
                  sampsize = 15;
    title;                          /* clear any titles left over from earlier steps */
RUN;

PROC PRINT data = sample4B;
    title1 'Sample4B: Exact-Sized Unrestricted Random Sample';
    title2 'Selects units with equal probabilities & with replacement';
    title3 '(using PROC SURVEYSELECT)';
RUN;

Selects units with equal probabilities & with replacement
(using PROC SURVEYSELECT)

Obs  Num  Name             Street                  City           State  NumberHits
  1   10  Laura Mills      704 Hill Street         Bellefonte     PA              1
  2   14  William Edwards  79 Oak Lane             Bellefonte     PA              1
  3   15  Harold Harvey    480 Main Street         Bellefonte     PA              1
  4   42  Casey Spears     123 Main Street         Port Matilda   PA              1
  5   45  Ann Draper       72 Lake Road            Port Matilda   PA              1
  6   48  Coach Pierce     74 Main Street          Port Matilda   PA              1
  7   17  Rigna Patel      101 Beaver Avenue       State College  PA              2
  8   20  Kristin Jones    120 Stratford Drive     State College  PA              1
  9   29  Joe White        678 S. Allen Street     State College  PA              1
 10   30  Daniel Peterson  328 Waupelani Drive     State College  PA              2
 11   31  Robert Williams  156 Straford Drive      State College  PA              1
 12   34  Mike Dahlberg    1201 No. Atherton       State College  PA              1
 13   37  Scott Henderson  245 W. Beaver Avenue    State College  PA              1

Launch and run the SAS program. Then, review the resulting output to convince yourself that the code did indeed select a sample of 15 observations from the mailing data set. The output data set contains only 13 rows because, by default, SURVEYSELECT writes each selected observation just once and records in the NumberHits variable how many times it was selected. Here, two observations (17 and 30) were each selected twice, so the NumberHits values sum to 15. Note that the only difference between this code and the previous SURVEYSELECT code is that the method = URS option here replaces the method = SRS option there. Here, URS tells SAS to use the unrestricted random sampling method to select observations, that is, with equal probability and with replacement. (Oh, yeah, I guess the specified seed differs from the previous code, too, but that's no matter.)

Again, you might want to change the seed (seed) and sample size (sampsize) a few times to see how it affects the sample.
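
If you would rather have one row for each selection, as in the DATA step approach of Example 34.7, the SURVEYSELECT procedure offers an OUTHITS option. The following is only a sketch under that assumption; the output data set name sample4C is made up for illustration.

```sas
PROC SURVEYSELECT data = stat482.mailing
                  out = sample4C
                  method = URS
                  seed = 12345
                  sampsize = 15
                  outhits;          /* write a separate row for every hit */
RUN;

PROC PRINT data = sample4C;
    title 'Sample4C: one row per selection (OUTHITS)';
RUN;
```

With OUTHITS, an observation that is selected twice should appear as two rows in the output data set, so the number of rows matches the requested sample size.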
