14.3 - The WHERE= option

The WHERE= option allows one to select only those observations from a SAS data set that meet a certain condition. Just as is true for the KEEP= and DROP= options, the WHERE= option can be attached to the SET statement or the DATA statement. If the WHERE= option is attached to the SET statement, SAS selects the observations that meet the condition as it reads in the data. If the WHERE= option is attached to the DATA statement, SAS selects the observations as it writes the data from the program data vector to the output data set.

Example 14.10 Section

The following program illustrates the use of the WHERE= option to select observations from a SAS data set that meet a certain condition. Because the WHERE= option is attached to the DATA statement, the selection process takes place as SAS writes the data from the program data vector to the output data set:

LIBNAME icdb 'C:\yourdrivename\Stat481WC\02datastep\sasndata';
DATA temple (where = (int(subj/10000)=23)) 
		okla (where = (int(subj/10000)=31));
	set icdb.back;
	drop r_id race ethnic relig mar_st ed_level 
		emp_st job_chng income;
RUN;

PROC PRINT data=temple;
	title 'Output Dataset: TEMPLE';
RUN;

PROC PRINT data=okla;
	title 'Output Dataset: OKLA';
RUN;

Output Dataset: TEMPLE

Obs

subj

v_type

v_date

b_date

sex

state

country

1

230003

0

07/09/93

10/25/47

2

22

1

2

230004

0

01/04/94

08/15/23

2

38

1

3

230005

0

01/06/94

05/25/49

2

10

1

4

230006

0

01/06/94

04/24/49

2

21

1

5

230008

0

10/03/96

08/09/60

2

38

1

6

230009

0

10/31/96

11/13/50

1

38

1

 


Output Dataset: OKLA

Obs

subj

v_type

v_date

b_date

sex

state

country

1

310020

0

06/18/93

08/09/43

2

43

1

2

310032

0

08/03/93

04/02/34

2

13

1

3

310037

0

11/08/93

04/25/39

2

36

1

4

310041

0

10/27/93

11/01/26

2

43

1

5

310049

0

02/04/94

10/03/31

2

25

1

6

310055

0

07/12/94

12/11/56

2

36

1

7

310056

0

04/01/94

11/16/61

2

36

1

8

310059

0

06/13/94

08/22/34

2

43

1

9

310065

0

06/30/94

07/27/56

2

36

1

10

310069

0

09/23/94

07/14/42

2

36

1

11

310072

0

09/08/94

12/20/46

2

43

1

12

310073

0

09/26/94

02/27/33

2

36

1

13

310074

0

11/14/94

12/11/68

2

5

1

14

310080

0

12/16/94

03/14/57

2

36

1

15

310082

0

01/17/95

09/15/45

2

4

1

16

310083

0

01/20/95

05/13/54

1

.

17

17

310084

0

01/20/95

09/29/60

2

36

1

18

310085

0

09/01/95

08/08/66

2

16

1

19

310087

0

09/27/95

02/27/29

2

36

1

20

310088

0

09/14/95

07/05/53

2

27

1

21

310089

0

09/14/95

04/07/53

2

36

1

22

310090

0

01/31/96

03/28/52

2

35

1

23

310091

0

01/16/96

10/07/61

2

36

1

24

310092

0

01/05/96

07/12/39

1

36

1

25

310094

0

12/15/95

07/25/60

2

36

1

26

310096

0

12/06/95

05/23/27

2

36

1

27

310097

0

12/18/95

10/13/23

1

36

1

28

310100

0

02/23/96

06/03/27

2

36

1

29

310102

0

02/12/96

05/06/18

2

1

1

30

310103

0

02/12/96

09/16/34

2

4

1

31

310104

0

02/13/96

02/11/44

2

35

1

32

310106

0

02/15/96

11/24/56

2

36

1

33

310107

0

03/05/96

11/10/39

2

13

1

34

310108

0

02/21/96

12/29/45

2

.

4

35

310109

0

05/08/96

04/15/61

2

36

1

36

310110

0

05/14/96

12/23/46

2

5

1

37

310111

0

03/21/96

01/10/62

2

36

1

38

310112

0

03/25/96

07/08/44

2

36

1

39

310113

0

05/22/96

01/12/47

1

36

1

40

310114

0

05/23/96

02/04/38

1

36

1

41

310115

0

05/28/96

05/29/69

2

36

1

42

310117

0

06/06/96

06/15/47

2

5

1

43

310120

0

09/18/96

10/20/69

2

5

1

44

310121

0

07/19/96

10/23/39

2

43

1

45

310122

0

07/12/96

07/15/47

1

43

1

46

310123

0

07/23/96

06/13/45

2

14

1

47

310124

0

07/31/96

04/20/51

2

36

1

48

310125

0

08/13/96

04/13/31

2

4

1

49

310126

0

09/18/96

02/15/68

2

36

1

50

310128

0

10/30/96

10/26/20

2

36

1

51

310130

0

10/16/96

06/16/59

2

31

1

52

310132

0

11/04/96

06/15/70

2

43

1

Well, it's not really that simple. This program also illustrates the use of the WHERE= option to divide a large data set up into smaller data sets, based on the values of a certain variable. As you can see, there are actually two data set names — temple and okla — appearing in the DATA statement. That tells SAS that we want to create two data sets within this one DATA step. One data set, temple, should contain only those subjects enrolled at Temple University, while the other data set, okla, should contain only those subjects enrolled at the University of Oklahoma. Accomplishing that requires a bit of understanding about the subject number used in the ICDB Study, as well as the functionality of the INT function:

  • The variable subj contains a six-digit subject number. The first two digits of the subject number indicate the location where the subject was enrolled. If the first two digits of the subject id number are 23, then the subject was enrolled at Temple University. If the first two digits of the subject id number are 31, the subject was enrolled at the University of Oklahoma.
  • You might recall that the INT function — that is, the integer function— returns the integer part of the expression contained within parentheses. So, if the subject number is 230007, then int(subj/10000) = int(23.0007) = 23, the location id for Temple University.

Pay particular attention to the syntax of the WHERE= option ... it can trip you up if you aren't careful. The entire WHERE= option must be contained in parentheses immediately following the data set to which you want the condition to apply. The condition itself is also placed in parentheses. So, in general, the syntax, when applied to a DATA statement, should look like this:

DATA dsname (WHERE = (condition));

where dsname is the data set name and condition is the condition that you want SAS to evaluate when selecting the observations.

Now, before running the program, change the directory in the LIBNAME statement so that it reflects the location in which you have saved the background data set. After you've edited the LIBNAME statement, run  the SAS program. Review the output to convince yourself that the temporary data set temple contains the observations in the ICDB background data set that correspond to subjects enrolled at Temple University, while the temporary data set okla contains the observations that correspond to subjects enrolled at the University of Oklahoma. Also note that the DROP statement applies to both of the output data sets. The excluded variables are in neither temple nor okla.

A few more comments. When you use the WHERE= option, the condition applies only to the data set which immediately precedes it. Alternatively, you can use a WHERE statement, in which the condition applies to all of the input data sets. (In the next two lessons, we'll learn more about having more than one input data set.) Also, beware that the WHERE= option cannot be used at the same time as the FIRSTOBS= or OBS= options.

Example 14.11 Section

The following program illustrates the efficient use of the WHERE= option in the SET statement. Because the WHERE= option appears in the SET statement, the selection process takes place as SAS reads in the observations from the icdb.back data set:

LIBNAME icdb 'C:\yourdrivename\Stat481WC\02datastep\sasndata';
DATA temple2;
	set icdb.back (where = (int(subj/10000)=23));
	drop r_id race ethnic relig mar_st ed_level emp_st job_chng income;
RUN;

PROC PRINT data = temple2;
	title 'Output Dataset: TEMPLE2';
RUN;

Output Dataset: TEMPLE2

Obs

subj

v_type

v_date

b_date

sex

state

country

1

230003

0

07/09/93

10/25/47

2

22

1

2

230004

0

01/04/94

08/15/23

2

38

1

3

230005

0

01/06/94

05/25/49

2

10

1

4

230006

0

01/06/94

04/24/49

2

21

1

5

230008

0

10/03/96

08/09/60

2

38

1

6

230009

0

10/31/96

11/13/50

1

38

1

Before running the program, change the directory in the LIBNAME statement so that it reflects the location in which you have saved the background data set. After you've edited the LIBNAME statement, run  the SAS program. Review the output from the PRINT procedure. Convince yourself that the temple2 data set contains the observations in the ICDB background data set that correspond to subjects enrolled at Temple University.