32.3 - Using the WHERE Clause

As you know, the WHERE statement or option in the DATA step or other procedures is very useful in selecting observations from a data set based on some criteria. In PROC SQL, the WHERE clause in the SELECT statement can also be used to subset data based on specified conditions. Any valid SAS expression can be put inside the WHERE clause, including functions, comparison or logical operators, and even some special operators. Making good use of it can increase programming efficiency and save computing resources greatly. As always, we will work through this subject with examples.

Example 32.9 Section

The following example uses the WHERE clause to select employees who work at a police department and have the job title of sergeant:

PROC SQL;
	select Name,
		Department,
		Employee_annual_salary 
	from stat482.salary
	where Department='POLICE' AND Position_title='SERGEANT';
QUIT;

Name Department Employee Annual Salary
PENDARVIS POLICE 103590
ODUM POLICE 106920
DANIELS POLICE 106920
TATE JR POLICE 103590
GOLDSMITH POLICE 110370
FRANKO POLICE 106920
RADDATZ POLICE 110370
FLEMING POLICE 100440
CASEY POLICE 110370
MC COY POLICE 110370
JACOBS POLICE 106920

To run this program, please download the SAS data set salary.sas7bdat and save it to your computer.

Reading through the program, you must have known that it selects the name, department, and annual salary information from salary data for police sergeants. Note that the columns in the WHERE clause do not have to be specified in the SELECT clause, (such as Position title), which is used in the WHERE clause but not in the SELECT clause. However, for the sake of the results checking, I would suggest keeping these columns in the query until verified.

Launch and run the SAS program, and review the output to convince yourself that the records have been selected as described.

We saw two types of operators used in the above program, the comparison (=) and the logical (and). Besides these common ones, another type that could be very useful in your programming is called a conditional operator. You may know some of them already, like IN, CONTAINS, and MISSING. You can find the complete list of operators in the SAS documentation. Next, let’s look at a couple of examples of this using BETWEEN AND and LIKE.

BETWEEN value-1 AND value-2

Both value-1 and value-2 are end values. So you can use the BETWEEN AND operator to specify a range of values, such as from one date to another, or from lower limit to upper limit. The smaller value does not have to be the first.

Example 32.10 Section

The following program uses the operator, BETWEEN AND, to select observations from salary data whose annual salary is between \65,000 and \70,000, and also works in the Fire department:

PROC SQL;
	select Name,
		Department,
		Employee_annual_salary label='Salary' format=DOLLAR12.2
	from stat482.salary
	where  Employee_annual_salary between 65000 and 70000
		and Department='FIRE';
QUIT;

Name Department Salary
KELLY FIRE $65,946.00
KOCHANEY FIRE $65,946.00

Launch and run the SAS program, and review the query output to convince yourself that the SAS yield the result as expected.

Column LIKE ‘pattern’

With the LIKE operator, you have to specify a column name and the pattern to be matched. Regarding the pattern, first, it is case-sensitive and has to be enclosed in quotation marks; secondly, it may contain a special character, either an underscore(_) and/or a percent sign(%). The underscore character stands for any single character and the percent sign for any sequence of zero or more characters. For example, assume that you are working with a table containing these values for a column.

  • Cathy
  • Kathy
  • Kathie
  • Katherine

Now using different patterns, the selection results are different:

Patterns  Results
Kath_ Kathy
Kath__ Kathie
Kath% Kathy, Kathie, Katherine
_ath% All of the names above

Example 32.11 Section

The following program shows the use of the LIKE operator in a WHERE clause to select name, department, position title and annual salary information for people whose name starts with R and the third letter is B:

PROC SQL;
	select Name,
		Department,
		Position_title,
		Employee_annual_salary label='Salary' format=DOLLAR12.2
	from stat482.salary
	where  Name like 'R_B%';
QUIT;

Name Department Position Title Salary
ROBINSON WATER MGMNT OPERATING ENGINEER-GROUP C $93,745.60
RABANALES FINANCE AUDITOR II $87,912.00
ROBERTS FIRE PARAMEDIC I/C $79,404.00
ROBINSON WATER MGMNT CONSTRUCTION LABORER $79,040.00

Launch and run the SAS program, and review the query output to convince yourself that the SAS behaves as described.

Another point worthy of being made here is the CALCULATED keywords. In the last section, you learned that we can perform calculations in the SELECT statement and assign an alias to that new column. However, because SAS processes the WHERE clause prior to the SELECT clause, you will run into a problem if the calculated column is used in a WHERE clause as a condition. Therefore, the keyword CALCULATED has to be inserted into the WHERE clause along with the alias to inform SAS that the value is calculated within the query. This point will be illustrated by the following programs.

Example 32.12 Section

The following program attempts to calculate the bonus for every employee, then select ones who have more than $2,000 as a bonus:

PROC SQL;
	select Name,
		Department,
		Employee_annual_salary label='Salary' format=DOLLAR12.2,
		Employee_annual_salary * 0.02 as Bonus
	from stat482.salary
	where  Bonus >2000 ;
QUIT;

Launch and run the SAS program. You may want to see what’s going wrong yourself. In the log window, SAS delivered an error message that the column Bonus cannot be found.

ERROR: The Following columns were not found in the contributing tables: Bonus.

That’s because SAS processes the WHERE clause before the SELECT clause. To make it right, add CALCULATED in the WHERE clause as shown below.

PROC SQL;
	select Name,
		Department,
		Employee_annual_salary label='Salary' format=DOLLAR12.2,
		Employee_annual_salary * 0.02 as Bonus format=DOLLAR10.2
	from stat482.salary
	where  CALCULATED Bonus >2000 ;
QUIT;

Name Department Salary Bonus
PENDARVIS POLICE $103,590.00 $2,071.80
SAWYER CITY COUNCIL $117,333.00 $2,346.66
ODUM POLICE $106,920.00 $2,138.40
PLANTZ GENERAL SERVICES $102,060.00 $2,041.20
FORD FIRE $100,440.00 $2,008.80
DANIELS POLICE $106,920.00 $2,138.40
MALONEY FIRE $127,566.00 $2,551.32
MOLLOY COMMUNITY DEVELOPMENT $102,060.00 $2,041.20
NIEGO FIRE $143,682.00 $2,873.64
PRICE FIRE $103,590.00 $2,071.80
TATE JR POLICE $103,590.00 $2,071.80
JIMENEZ PROCUREMENT $113,448.00 $2,268.96
CEBALLOS WATER MGMNT $113,448.00 $2,268.96
FERMAN FIRE $131,466.00 $2,629.32
SHUM TRANSPORTN $104,736.00 $2,094.72
FUNK FIRE $101,688.00 $2,033.76
WRZESINSKI FIRE $105,918.00 $2,118.36
GOLDSMITH POLICE $110,370.00 $2,207.40
FRANKO POLICE $106,920.00 $2,138.40
RADDATZ POLICE $110,370.00 $2,207.40
MC NABB FIRE $105,918.00 $2,118.36
FLEMING POLICE $100,440.00 $2,008.80
CASEY POLICE $110,370.00 $2,207.40
MACELLAIO JR WATER MGMNT $101,920.00 $2,038.40
DARLING LAW $149,160.00 $2,983.20
NASH FINANCE $103,740.00 $2,074.80
MC COY POLICE $110,370.00 $2,207.40
HOLDER WATER MGMNT $108,534.40 $2,170.69
TAYLOR FIRE $108,462.00 $2,169.24
PERFETTI POLICE $112,068.00 $2,241.36
JACOBS POLICE $106,920.00 $2,138.40
HENRY AVIATION $108,534.40 $2,170.69
IRELAND FIRE $113,400.00 $2,268.00

Now it’s working! Make the same change to your program. Check the output to make sure that SAS processes the data properly.

An alternative to using the keyword CALCULATED is to repeat the calculation expression in the WHERE clause. In the preceding program, the WHERE clause can be rewritten as:

where Employee_annual_salary *0.02 >2000;

But note that this is not an efficient way to do this because SAS has to do the calculation twice.