32.2 - Using the SELECT Clause32.2 - Using the SELECT Clause
In the previous section, we learned the basics about PROC SQL. Next, we will investigate more details about the SELECT statement and how to use it to retrieve data, create new columns and what options are available for data manipulation.
The following SAS program creates a new temporary table with all columns retrieved from permanent file traffic.sas7bdat:
PROC SQL; CREATE TABLE traffic as select * from stat482.traffic; QUIT; PROC CONTENTS data=traffic VARNUM; RUN; PROC CONTENTS data=stat482.traffic VARNUM; RUN;
First, you need to download the permanent SAS data file traffic to your own computer. Revise the
libname statement as needed. Then run the program.
One thing you need to know about this program is the shortcut, noted with an asterisk (*) after SELECT. The asterisk refers to all columns in the original table. So, this code is to select all columns in the permanent file into the temporary file, traffic.
To check the data, you may use the other procedures we learned in previous lessons, such as the PRINT procedure. In the above program, PROC CONTENTS has been used to check the variable attributes in the original and the new table. As we mentioned in the previous section, the variables chosen from other table(s) keep the same attributes.
Besides selecting original columns, the SELECT clause can also be used to create new columns, just as we used assignment statements in DATA step to create new variables.
The following program is to create new columns with the SELECT statement:
PROC SQL; select id, count_location, scan(count_location,-1,' ') as orientation, street, passing_vehicle_volume * 0.5 as weekends_traffic_volume from traffic; QUIT;
As you can see, this code uses the traffic table we created previously. Using the SELECT statement you can create new columns that contain either characters or numbers. With valid statements within the SELECT clause, you can use any expression for new columns. And, the new columns can be named by using the keyword AS followed by the names you would like to use. (Column names also follow the rules for SAS names.) In the above code, the first new column is created by a character function
scan(), which substring is the orientation information from the existing column, count_location. The name for this new column is
orientation after AS. (It may make no sense, just for the use of example.) The second new column is a math expression that estimates the traffic volume during weekends by multiplying daily vehicle volume by 0.5. Its alias is
Launch and run the SAS program, and review the output to convince yourself that SAS does indeed create two new columns as you expect. But you should note that new columns only exist during the query, unless you created a table out of it.
While observing the data in traffic, you may notice that some data are not formatted as you want. Fortunately, SAS provides many options in SELECT statement so you can enhance the appearance of the query output.
The following program adds the format to dates, labels columns and add titles to the output:
PROC SQL; TITLE "Traffic volume in Area SS"; TITLE2 "During weekdays and weekends"; select id, Date_of_count label='Date of Count' format=mmddyy10., count_location label='Location', street, passing_vehicle_volume label='Daily Volume' format=comma6., passing_vehicle_volume * 0.5 as weekends_traffic_volume label='Weekends Volume' format=comma6. from traffic; QUIT;
Launch and run the SAS program and then review the resulting output to convince yourself that the data has been formatted and labeled as you expect. Except for titles, you can also add a footnote to the output using footnote statement. But unlike using title and footnote statements with other SAS steps, both statements have to be placed either before the PROC SQL statement, or between the PROC SQL statement and SELECT statement.
One more thing we will talk about in this section is the CASE operator, which just follows the SELECT clause to create new columns conditionally. You must remember that this applies only to IF-THEN-ELSE statements that are available in DATA step. In PROC SQL, the CASE operator can perform the equivalent functions. First, let’s look at the syntax for the CASE construct.
CASE WHEN when-condition THEN result-expression <… WHEN when-condition THEN result-expression> <ELSE result-expression> END AS < column name>
As in IF-THEN statements, you can add as many WHEN conditions as you want. The conditions can be any valid SAS expression, including calculations, functions, and logical operators. It works as IF-THEN statements, too. If the conditions have been met, SAS will carry out the corresponding actions following the keyword THEN. If the WHEN condition is false, then PROC SQL executes the ELSE expression. You can create a new column and name it with AS keywords after END. The ELSE and AS keywords are optional. But it’s good practice to keep original columns while creating new ones.
The following SAS program uses CASE operator to assign different salary raise plans for each salary range:
PROC SQL ; select Name, Department, employee_annual_salary label='salary' format=dollar12.2, 'next year raise:', case when employee_annual_salary=. then . when employee_annual_salary < 85000 then 0.05 when 85000 <= employee_annual_salary < 125000 then 0.03 when employee_annual_salary >=125000 then 0.01 else 0 end as raise format=percent8. from stat482.salary; QUIT;
You already know format and label options from the previous explanations. There are a couple of new things in this example, however. First, you can insert a character(or numeric) constant as a new column in the table. Here a character string “next year raise” has been added between salary and raise. Raise is also a new column which has been created by the CASE operator based on the current annual salary of each person.
Download the SAS data set salary.sas7bdat on your computer and revise the
libname statement to reflect the directory where you save the file. Then launch and run the program. Review the query result to convince yourself that the raise values have been assigned correctly.
The CASE operator has two forms of syntax. In fact, if you use only one column for WHEN condition(s), this column’s name can be put after CASE and before WHEN. So you don’t have to repeat the column’s name in each WHEN condition. Below is the syntax for this form:
CASE <column-name> when-condition THEN result-expression <… WHEN when-condition THEN result-expression> <ELSE result-expression> END AS < column name>
The following program uses the simpler form of CASE construct to decide compensation (Yes or N/A) based on departments:
PROC SQL outobs=20; select Name, Department, employee_annual_salary label='salary' format=dollar12.2, case department when 'POLICE' then 'Yes' when 'FIRE' then 'Yes' else 'N/A' end as Compensation from stat482.salary; QUIT;
The above code uses the same data set as the previous example, salary. It assigns the different compensation plans based on which department people work for and creates a new column, Compensation, for the result. This time, the column name Department has been put outside the WHEN conditions and into CASE operator. So we don’t need coding like “WHEN department=’POLICE’” any more.
Another feature is the option you can use in the PROC SQL statement, OUTOBS=n. It can be used to limit the number of rows displayed in the output. So in this case, we would expect the table in the output window shows the first 20 rows of the data. And such a warning message will be delivered in the log file.
WARNING: Statement terminated early due to OUTBOS=20 option.
Note that OUTOBS= will also affect tables that are created by the CREATE TABLE statement.
Launch and run the program. Then check the query result to make sure the records have been processed as expected. Note that you have to be cautious with this simpler form. For instance, if you move Employee_annual_salary out of the WHEN conditions in the program of Example7, SAS will report an error and not execute!