In the previous section, every sample that we selected was drawn without replacement. That is, once an observation was selected from the data set, it could not be selected again. Now, we'll investigate how to take random samples with replacement. That is, selecting an observation once does not prevent it from being selected again.
Example 34.7
The following code illustrates how to use the DATA step to select an exact-sized random sample with replacement. Specifically, the program uses the ranuni function in conjunction with the POINT= option of the SET statement to tell SAS to randomly sample exactly 15 of the 50 observations from the permanent SAS data set mailing:
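DATA sample4A;
   choose = int(ranuni(58)*n) + 1;            /* random integer from 1 to n */
   set stat482.mailing point=choose nobs=n;   /* read observation number choose */
   i + 1;                                     /* sum statement: count the selections */
   if i > 15 then stop;                       /* stop before a 16th observation is written */
RUN;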
PROC PRINT data=sample4A;
title1 'Sample4A: Exact-Sized Unrestricted Random Sample';
title2 'Selects units with equal probabilities & with replacement';
RUN;
Obs | Num | Name | Street | City | State | i |
---|---|---|---|---|---|---|
1 | 24 | Mark Mendel | 256 Fraser Street | State College | PA | 1 |
2 | 14 | William Edwards | 79 Oak Lane | Bellefonte | PA | 2 |
3 | 10 | Laura Mills | 704 Hill Street | Bellefonte | PA | 3 |
4 | 3 | Jim Jefferson | 10101 Allegheny Street | Bellefonte | PA | 4 |
5 | 45 | Ann Draper | 72 Lake Road | Port Matilda | PA | 5 |
6 | 11 | Linda Bentlager | 1010 Tricia Lane | Bellefonte | PA | 6 |
7 | 47 | Barb Wyse | 21 Cleveland Drive | Port Matilda | PA | 7 |
8 | 29 | Joe White | 678 S. Allen Street | State College | PA | 8 |
9 | 32 | George Ball | 888 Park Avenue | State College | PA | 9 |
10 | 31 | Robert Williams | 156 Straford Drive | State College | PA | 10 |
11 | 49 | Tim Winters | 95 Dove Street | Port Matilda | PA | 11 |
12 | 42 | Casey Spears | 123 Main Street | Port Matilda | PA | 12 |
13 | 47 | Barb Wyse | 21 Cleveland Drive | Port Matilda | PA | 13 |
14 | 32 | George Ball | 888 Park Avenue | State College | PA | 14 |
15 | 48 | Coach Pierce | 74 Main Street | Port Matilda | PA | 15 |
Launch and run the SAS program. Then, review the resulting output to convince yourself that the code did indeed select a sample of 15 observations from the mailing data set.
The key to understanding how this code works is to understand what the expression:
choose = int(ranuni(58)*n) + 1
accomplishes. As you know, ranuni(58) tells SAS to use an initial seed of 58 to generate a uniform random number between 0 and 1. For the sake of example, suppose SAS generates the number 0.99. Then, the value of choose becomes 50 as calculated here:
choose = int(0.99*50) + 1 = int(49.5) + 1 = 49 + 1 = 50
And, if SAS generates the number 0.01, the value of choose becomes 1 as calculated here:
choose = int(0.01*50) + 1 = int(0.5) + 1 = 0 + 1 = 1
In this way, you can see how the expression always generates a positive integer 1, 2, 3, ..., up to n, the number of observations in your data set. All we need to do then is to tell SAS to generate such a random integer over and over again until we reach our desired sample size.
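One way to check this claim empirically is to generate many values of choose and look at their range. The following is only a sketch: it hard-codes n = 50 rather than taking it from the NOBS= option, and the data set name check and the loop count of 10000 are hypothetical choices made for illustration.

```sas
DATA check;
   n = 50;                                /* assumed number of observations */
   do j = 1 to 10000;                     /* hypothetical number of draws */
      choose = int(ranuni(58)*n) + 1;     /* same expression as in the example */
      output;                             /* keep every generated value */
   end;
RUN;

PROC MEANS data=check min max;
   var choose;                            /* smallest and largest value drawn */
RUN;
```

Because ranuni returns a value strictly between 0 and 1, int(ranuni(58)*n) can be no smaller than 0 and no larger than n - 1, so the minimum and maximum reported for choose should fall between 1 and 50.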
Here's a summary of the approach:
- Use the NOBS= option of the SET statement to determine n, the number of observations in the original data set.
- Use the above choose= assignment statement to generate a random integer between 1 and n. (Note that the choose= assignment statement must be placed before the SET statement. If it were not, SAS would not know which observation to read first.)
- Use the POINT= option of the SET statement to select the choose'th observation from the original data set. The POINT= option tells SAS to read the SAS data set using direct access by observation number. In general, with the POINT= option, you name a temporary variable (here, choose) whose value is the number of the observation you want the SET statement to read.
- Perform the above two steps repeatedly, keeping count of the number of observations selected. The sum statement i + 1 takes care of the counting for us: SAS initializes i to 0 before the first iteration of the DATA step, retains its value from one iteration to the next, and increases it by 1 each time the statement executes.
- Once you've selected the number of observations desired (15, here), tell SAS to STOP. Note that when using the POINT= option, you must use a STOP statement to tell SAS when to stop processing the DATA step.
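The steps above can also be written with an explicit DO loop, which keeps the counting and the stopping rule in one place. This is an alternative sketch, not the program used in the example; the output data set name sample4A_loop is hypothetical.

```sas
DATA sample4A_loop;
   do i = 1 to 15;                              /* desired sample size */
      choose = int(ranuni(58)*n) + 1;           /* random integer from 1 to n */
      set stat482.mailing point=choose nobs=n;  /* direct access to that observation */
      output;                                   /* write the selected observation */
   end;
   stop;                                        /* required with POINT=: end the DATA step */
RUN;
```

Here the OUTPUT statement is explicit because the STOP statement ends the step before SAS reaches its usual automatic output at the bottom of the DATA step.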
That's all there is to it! Again, you might want to change the seed (the 58) and the sample size (the 15) a few times to see how it affects the sample.
Example 34.8
The following code illustrates an alternative way of selecting an exact-sized random sample with replacement. Specifically, the program uses the SURVEYSELECT procedure to tell SAS to randomly sample exactly 15 of the 50 observations from the permanent SAS data set mailing:
PROC SURVEYSELECT data = stat482.mailing
out = sample4B
method = URS
seed = 12345
sampsize = 15;
title;
RUN;
PROC PRINT data = sample4B;
title1 'Sample4B: Exact-Sized Unrestricted Random Sample';
title2 'Selects units with equal probabilities & with replacement';
title3 '(using PROC SURVEYSELECT)';
RUN;
Obs | Num | Name | Street | City | State | NumberHits |
---|---|---|---|---|---|---|
1 | 10 | Laura Mills | 704 Hill Street | Bellefonte | PA | 1 |
2 | 14 | William Edwards | 79 Oak Lane | Bellefonte | PA | 1 |
3 | 15 | Harold Harvey | 480 Main Street | Bellefonte | PA | 1 |
4 | 42 | Casey Spears | 123 Main Street | Port Matilda | PA | 1 |
5 | 45 | Ann Draper | 72 Lake Road | Port Matilda | PA | 1 |
6 | 48 | Coach Pierce | 74 Main Street | Port Matilda | PA | 1 |
7 | 17 | Rigna Patel | 101 Beaver Avenue | State College | PA | 2 |
8 | 20 | Kristin Jones | 120 Stratford Drive | State College | PA | 1 |
9 | 29 | Joe White | 678 S. Allen Street | State College | PA | 1 |
10 | 30 | Daniel Peterson | 328 Waupelani Drive | State College | PA | 2 |
11 | 31 | Robert Williams | 156 Straford Drive | State College | PA | 1 |
12 | 34 | Mike Dahlberg | 1201 No. Atherton | State College | PA | 1 |
13 | 37 | Scott Henderson | 245 W. Beaver Avenue | State College | PA | 1 |
Launch and run the SAS program. Then, review the resulting output to convince yourself that the code did indeed select a sample of 15 observations from the mailing data set. The output contains only 13 rows because SURVEYSELECT writes each selected observation once and records in the NumberHits variable how many times it was selected. Two observations (Num 17 and Num 30) were each selected twice, so the NumberHits values sum to 15. Note that the only difference between this code and the previous SURVEYSELECT code is that the method = URS option here replaces the method = SRS option there. Here, URS tells SAS to use the unrestricted random sampling method to select observations, that is, with equal probability and with replacement. (Oh, yeah, I guess the specified seed differs from the previous code, too, but that's no matter.)
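If one row per selection is wanted instead of a NumberHits count, the OUTHITS option of PROC SURVEYSELECT can be added. The following is a minimal sketch under that assumption; the output data set name sample4B_hits is hypothetical.

```sas
PROC SURVEYSELECT data = stat482.mailing
                  out = sample4B_hits     /* hypothetical output data set name */
                  method = URS            /* equal probability, with replacement */
                  seed = 12345
                  sampsize = 15
                  outhits;                /* one output row per selection */
RUN;

PROC PRINT data = sample4B_hits;
RUN;
```

With OUTHITS, an observation selected more than once appears that many times in the output data set, so the printed sample should have 15 rows.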
Again, you might want to change the seed (seed) and sample size (sampsize) a few times to see how it affects the sample.