32.1 - Proc SQL Basics32.1 - Proc SQL Basics
PROC SQL is a procedure that SAS developed for the implementation of Structured Query Language. You can use this procedure to modify, retrieve and report data in tables and views (created on tables). Just as with other SAS procedures, PROC SQL also has basic syntax structures. It takes the following general form:
PROC SQL; SELECT column-1<,…column-n> FROM table-1|view-1<,…table-n|view-n> <WHERE expression> <GROUP BY column-1<,…column-n>> <HAVING expression> <ORDER BY column-1<,…column-n>>; QUIT;
First of all, you may see differences in terminology between SQL and other SAS steps. For example, the data file is called data set in other SAS steps, but table in SQL. Correspondingly, records are called observations in the previous lessons, but rows in SQL tables; and we call a field of data set as variable, but column in this lesson.
|Other SAS steps||SQL Procedure|
Another thing that needs your attention is that, unlike other SAS procedures, there could be one or a few SELECT statements inside PROC SQL. One SELECT statement is called a query, which is composed of many clauses, like SELECT, FROM, WHERE, GROUP BY, HAVING and ORDER BY. The order of these clauses is important. They must appear in the order as shown above.
We will use the whole lesson to work our way through all these keywords in PROC SQL. Let’s start with the most basic one.
The following SAS SQL code is just query that retrieves data from a single table:
libname stat482 "X:\MG\stat480~483\proc sql"; PROC SQL; select ID, SATM, SATV from stat482.survey; QUIT;
To run the program above, you will need to save the SAS data file (survey.sas7bdat) to your computer first. Right-click the link and select the “Save link as…”. A save dialog box will appear and allow you to save the file to the location you choose on your computer. Edit LIBNAME statement to reflect the directory in which you saved the survey data set. Then run the program and check the output.
The SQL procedure in this code represents the most basic form of the procedure. Like other SAS procedures, you need to run PROC SQL at the beginning to invoke it. Inside the procedure, there is only one statement starting with SELECT, which chooses the columns you want. You can list as many columns as needed, separated by commas. Another clause is FROM, which is used to specify the table(s). PROC SQL follows the same protocol of SAS file names. Here we used a two-level name to reference the permanent file. Just as you read the code, this program is used to select three columns (student id, SAT Math score and SAS Verbal score) from the table.
If you use the SAS list output as default, the results of a query will be displayed in the SAS output window. For SAS9.2 and above, the default output will be html. You can also change the open ODS destinations to pdf, rtf or others for the output.
The following SAS program changes the ODS destination from listing to pdf. So the output will be saved under the predefined directory as pdf file.
ods listing close; ods pdf file="X:\MG\stat480~483\proc sql\sat_scores.pdf"; PROC SQL; select ID, SATM, SATV from stat482.survey; QUIT; ods pdf close; ods listing;
The PROC SQL part in the above program is the same as the previous example. The added code is to close the current listing output and open the pdf destination and to save the pdf output generated by SQL procedure that follows to the specified file name. The end part of ODS statements tell SAS to close the pdf destination and open listing again.
In some cases, you may want to create a new table from a query. To do so, you need the CREATE TABLE statement with keyword AS and along with other clauses used in PROC SQL, such as SELECT, or FROM. Below is the basic syntax of CREATE TABLE statement from a query result.
CREATE TABLE table-name AS SELECT column-1<,…column-n> FROM table-1|view-1<,…table-n|view-n> <optional query clauses>;
The following SAS program uses CREATE TABLE statement to create a new table named SAT_scores, which contains student id, SAT math scores and verbal scores.
PROC SQL; CREATE TABLE SAT_Scores as select ID, SATM, SATV from stat482.survey; QUIT;
Launch and run the SAS program. You may notice that there is no output displayed in the SAS output window or any open ODS destination. That’s because the CREATE TABLE statement suppresses the printed output of the query. However, if you check the SAS log window, it shows a message that indicates that the table has been created, and the number of rows and columns in the table.
In this example, table SAT_scores has 226 rows and 3 columns. And the new table’s columns have the same attributes (type, length, format, label) as those of the selected source columns.
From the previous three examples, you now have some idea about what PROC SQL is like to work with. Let’s summarize what makes it so unique from other SAS procedures.
- Unlike other SAS procedures which contain many statements, the SQL procedure may consist of one or more than one SELECT statement. Each SELECT statement contains several clauses, like SELECT, FROM, WHERE, ORDER BY. But the SELECT and FROM clause are essential and indispensable. Other clauses are optional. All clauses have to be written in the order as listed in the syntax. For each one SELECT statement, only one semicolon is needed at the end of the statement.
- No RUN statement is required for PROC SQL to execute. SQL procedure will continue to run after you submit the program. To end it, you have to submit another PROC step, a DATA step, or a QUIT statement.