As you know, the WHERE statement or option in DATA step or other procedures is very useful in selecting observations from a data set based on some criteria. In PROC SQL, the WHERE clause in the SELECT statement can also be used to subset data based on specified conditions. Any valid SAS expression can be put inside WHERE clause, including functions, comparison or logical operators and even some special operators. Making the good use of it can increase programming efficiency and save computing resources greatly. As always, we will work through this subject with examples.
Example 32.9 Section
The following example uses the WHERE clause to select employees who work at a police department and have the job title as sergeant:
PROC SQL; select Name, Department, Employee_annual_salary from stat482.salary where Department='POLICE' AND Position_title='SERGEANT'; QUIT;
To run this program, please download the SAS data set salary.sas7bdat and save it to your computer.
Reading through the program, you must have known that it selects the name, department and annual salary information from salary data for police sergeants. Note that the columns in the WHERE clause do not have to be specified in the SELECT clause, (such as Position title), which is used in the WHERE clause but not in the SELECT clause. However, for the sake of the results checking, I would suggest to keep these columns in the query until verified.
Launch and run the SAS program, and review the output to convince yourself that the records have been selected as described.
We saw two types of operators used in the above program, the comparison (=) and the logical (and). Besides these common ones, another type that could be very useful in your programming is called a conditional operator. You may know some of them already, like IN, CONTAINS and MISSING. You can find the complete list of operators in the SAS documentation. Next, let’s look at a couple of examples on this using BETWEEN AND and LIKE.
BETWEEN value-1 AND value-2
Both value-1 and value-2 are end values. So you can use the BETWEEN AND operator to specify a range of values, such as from one date to another, or from lower limit to upper limit. The smaller value does not have to be the first.
Example 32.10 Section
The following program uses the operator, BETWEEN AND, to select observations from salary data whose annual salary is between \65,000 and \70,000, and also works in Fire department:
PROC SQL; select Name, Department, Employee_annual_salary label='Salary' format=DOLLAR12.2 from stat482.salary where Employee_annual_salary between 65000 and 70000 and Department='FIRE'; QUIT;
Launch and run the SAS program, and review the query output to convince yourself that the SAS yield the result as expected.
Column LIKE ‘pattern’
With the LIKE operator, you have to specify a column name and the pattern to be matched. Regarding the pattern, first it is case-sensitive and has to be enclosed in quotation marks; secondly, it may contain a special character, either an underscore(_) and/or percent sign(%). The underscore character stands for any single character and the percent sign for any sequence of zero or more characters. For example, assume that you are working with a table containing these values for a column.
Now using different patterns, the selection results are different:
|Kath%||Kathy, Kathie, Katherine|
|_ath%||All of the names above|
Example 32.11 Section
The following program shows the use of the LIKE operator in a WHERE clause to select name, department, position title and annual salary information for people whose name starts with R and the third letter is B:
PROC SQL; select Name, Department, Position_title, Employee_annual_salary label='Salary' format=DOLLAR12.2 from stat482.salary where Name like 'R_B%'; QUIT;
Launch and run the SAS program, and review the query output to convince yourself that the SAS behaves as described.
Another point worthy of being made here is the CALCULATED keywords. In the last section you learned that we can perform calculations in SELECT statement and assign an alias to that new column. However, because SAS processes the WHERE clause prior to the SELECT clause, you will run into a problem if the calculated column is used in a WHERE clause as condition. Therefore, the keyword CALCULATED has to be inserted into the WHERE clause along with the alias to inform SAS that the value is calculated within the query. This point will be illustrated by the following programs.
Example 32.12 Section
The following program attempts to calculate the bonus for every employee, then select ones who has more than $2,000 as bonus:
PROC SQL; select Name, Department, Employee_annual_salary label='Salary' format=DOLLAR12.2, Employee_annual_salary * 0.02 as Bonus from stat482.salary where Bonus >2000 ; QUIT;
Launch and run the SAS program. You may want to see what’s going wrong yourself. In the log window, SAS delivered an error message that the column Bonus cannot be found.
ERROR: The Following columns were not found in the contributing tables: Bonus.
That’s because SAS processes the WHERE clause before the SELECT clause. To make it right, add CALCULATED in the WHERE clause as shown below.
PROC SQL; select Name, Department, Employee_annual_salary label='Salary' format=DOLLAR12.2, Employee_annual_salary * 0.02 as Bonus format=DOLLAR10.2 from stat482.salary where CALCULATED Bonus >2000 ; QUIT;
Now it’s working! Make the same change to your program. Check the output to make sure that SAS processes the data properly.
An alternative to using the keyword CALCULATED is to repeat the calculation expression in the WHERE clause. In the preceding program, the WHERE clause can be rewritten as:
where Employee_annual_salary *0.02 >2000;
But note that this is not an efficient way to do this because SAS has to do the calculation twice.