32.6 - Using the HAVING Clause32.6 - Using the HAVING Clause
Previously we learned how to use the GROUP BY clause to group and summarize data. Sometimes, we want to select certain groups from the result. That’s when the HAVING clause comes into play.
This section we are going to use the salary.sas7bdat data for all examples. Download this SAS data file onto your own computer and change the
libname to be the directory where you save the data.
The following program calculates the average salary for each department, then select three departments as needed in the query output:
PROC SQL; select Department, avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2 from stat482.salary group by Department having Department in ('LAW','FINANCE','FIRE') order by Avg_salary; QUIT;
Let’s review the program first. The code selects the column Department and uses the summary function AVG() to compute the average salaries. Since the GROUP BY clause also is also present in the SELECT statement, the averages are for each department. The user is only interested in three departments, Law, Finance and Fire. So we use the HAVING clause to select only these three to be output. Finally, we ask SAS to sort the data by average salaries. This program contains every clause we have learned so far except the WHERE clause, which we will address later.
Launch and run the SAS program and review the output to make sure you understand the output.
You may wonder if WHERE can do the same thing as HAVING does in the above program. You can try replacing Having with WHERE clause as following. You will get identical output as before.
PROC SQL; select Department, avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2 from stat482.salary where Department in ('LAW','FINANCE','FIRE') group by Department order by Avg_salary; QUIT;
However, let’s not assume that WHERE and HAVING are the same based on this. There are some big differences between them. Generally speaking, HAVING has control on grouped data during output; WHERE controls input data row by row. Let’s see more examples about these two commands.
The following program calculates the average salary for each department and choose ones having more than $70,000:
PROC SQL; select Department, avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2 from stat482.salary group by Department having Avg_salary > 70000 order by Avg_salary; QUIT;
Only a small change has been made to this program. The condition in the HAVING clause changed the department average salary more than $70,000. So, the expression used in the HAVING statement is a summary function. And, the data is sorted by average values.
Launch and run the SAS program and review the output. As we expect, all departments having more than $70,000 average salary are listed as the query result.
Next, let’s try using WHERE to perform the same task.
PROC SQL; select Department, avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2 from stat482.salary where calculated Avg_salary > 70000 group by Department order by Avg_salary; QUIT;
You must remember that to use the computed result in the WHERE clause, the keyword “CALCULATED” should be inserted. Oops! SAS gives us an error message like this:
ERROR: Summary functions are restricted to the SELECT and HAVING clauses only.
This example illustrates a big difference between HAVING and WHERE. The summary functions can be used in a HAVING clause but not in a WHERE clause, because HAVING works on grouped data, but WHERE evaluates existing or calculated data row by row.
Based on our current experiences with these two clauses, you might prefer to use HAVING since it can be used for both situations. However, don’t rush to this conclusion either. You will find out more in the next example.
The following two SAS program are similar. The only difference is that the first program uses a WHERE clause and the second program uses a HAVING clause. They try to accomplish the same task: count how many employees at each position inside Police Department:
PROC SQL; select Position_Title, count(*) as Employees from stat482.salary where Department='POLICE' group by Position_Title; QUIT; PROC SQL; select Position_Title, count(*) as Employees from stat482.salary group by Position_Title having Department='POLICE'; QUIT;
Now, Launch and run both programs. The output on the top is from the program using WHERE clause; the output on the bottom is the partial output from the program using HAVING clause.
You might be surprised to see how different these two results are. One would expect a result like the output on the top. But the output on the bottom has so many more rows, and even some numbers do not match! Let’s review the code to understand what happened. There are two columns in the SELECT clause, Position_Title and a summary function, count(*), which counts total number of rows for each position group since we specify Position_Title in the GROUP BY clause. Unlike the programs in the previous example, the expression used inside WHERE and HAVING references another column, Department, which is not in the SELECT clause. Therefore, SAS handles them differently in the two programs.
The first program uses the WHERE clause. Since SAS processes the WHERE clause before SELECT and on a row-by-row basis, the records from Police department are selected from the data first. Then SAS counts the number of employees under each position title inside the department. For example, there is only one person who is a “CLINICAL THERAPIST III” in the Police Department. So the count is 1. We obtained the desired output.
On the other hand, the second program uses the HAVING clause. It is equivalent to the following program but without Department column in the output:
PROC SQL; select Position Title, Department, count(*) as employees from stat482.salary group by Position_Title having Department='POLICE'; QUIT;
In this program, SAS counts employee numbers on each position across all departments because of GROUP BY clause. For example, there is each one person titled “CLINICAL THERAPIST III” in POLICE department and HEALTH department. So the total count on this position is 2. Since there is an extra column in SELECT clause besides the summary function and a GROUP BY column, all rows are in the output with counts on each job position. For instance, under position title “CLINICAL THERAPIST III”, both records have 2 as value of “Employees”. At last, SAS evaluates the condition (Department=POLICE) in HAVING clause to select rows for the output. That’s why you see Employees=2 for position title “CLINICAL THERAPIST III” in the output from the second query.
We have seen two examples that show the differences between HAVING and WHERE so far. Since SAS handles them so differently, when it comes to WHERE or HAVING, pick one that fits your needs the best.
Last but not the least, let’s check out one more cool feature of HAVING clause.
The following program selects the departments whose average salary is lower than the overall salary level:
PROC SQL; select Department, avg(Employee_annual_salary) as Avg_salary format=DOLLAR12.2 from stat482.salary group by Department having Avg_salary < (select avg(Employee_annual_salary) from stat482.salary) order by Avg_salary; QUIT;
Going through this program, you may not find anything unusual until HAVING clause. Inside the clause it’s not a standard expression as before, but a query:
(select avg(Employee_annual_salary) from stat482.salary)
Such kind of query is called subquery, inner query or nested query. You can use this query-expression in a HAVING or WHERE clause. The subquery used in this example is to calculate the overall average salary. The result is compared with average salaries of each department. Then SAS evaluates the condition “Less than” in HAVING clause to select departments who have less average salaries to output.
Launch and run the SAS program, and review the query result. Convince yourself that the departments’ information has been selected as described.