1.2 - A SAS Program

Since SAS is a programming language, let's start by looking at a simple SAS program, such as this one:

/****************
This program reads in a set of grades for six students, and prints out their student numbers and genders
******************/

OPTIONS NODATE LS=78;
DATA grade;
    InPuT subject gender $
        exam1 exam2 hwgrade $;
    DATALINES;
    10 M 80 84 A
     7 . 85 89 A
     4 F 90 .  B
    20 M 82 85 B
    25 F 94 94 A
    14 F 88 84 C
    ;
    RUN;

PROC PRINT data=grade;
    var subject gender; * print student ID and gender;
run;

Don't worry just yet about understanding the "code" for this program. Just trust that this program reads in and prints out a set of grades for six students. The lines between the DATA statement and the first RUN statement tell SAS to read in the grades. And, the lines between the PROC PRINT statement and the second RUN statement tell SAS to print out the student number and gender of each of the six students.

As is true for any other programming language, a SAS program is a series of instructions written in the SAS language that are executed in order. That is, just as you read the words on this page, SAS reads and executes programs from top to bottom and from left to right. And, just as I must adhere to language and grammar rules that allow you to understand what I am saying, you must adhere to a certain set of rules known as "syntax" in order for SAS to be able to read and run your programs properly.

Basic SAS Program Requirements

Here is the basic set of requirements that every SAS program must follow. As you read through them, you might want to refer back to the above program to see that each of the rules is indeed followed.

Rules for SAS Statements. The basic requirements for SAS statements are:

  • All SAS statements (except those containing data) must end with a semicolon (;). ("DATA grade;" is an example of a SAS statement. "DATALINES;" is another.)
  • SAS statements typically begin with a SAS keyword. (Examples in the above program include OPTIONS, DATA, INPUT, DATALINES, RUN, PROC, and VAR.)
  • SAS programs can be freely formatted:
    • Any number of SAS statements can appear on a single line provided they are separated by a semicolon. (The second to last line is such an example.)
    • A SAS statement can be continued from one line to the next as long as no word is split. (The statement beginning with "InPuT ..." is such an example.)
    • SAS statements can begin in any column.
  • SAS statements are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (The statement beginning with "InPuT ..." is such an example.)
  • The words in SAS statements are separated by blanks or special characters (e.g. =, +, or *).
  • Comments may (and should!) be used to annotate your program. Two methods are:
    • A delimited comment begins with a forward slash-asterisk (/*) and ends with an asterisk-forward slash (*/). All text within the delimiters is ignored by SAS. (The first three lines of the program constitute one such comment.)
    • An alternative comment begins with an asterisk (*) and ends with a semicolon (;). All text between the asterisk (*) and the semicolon (;) is ignored by SAS. (The second statement in the second to the last line constitutes such a comment.)
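
To make the free-format and case rules concrete, here is a minimal sketch. The data set name demo and the variable x are made up for illustration. The three DATA steps below are equivalent as far as SAS is concerned; only the layout and capitalization differ.

```sas
/* Free-format sketch: each DATA step below creates the same
   one-row data set (demo and x are hypothetical names). */

data demo; x = 1; run;          * several statements on one line;

DATA demo;
    x = 1;
RUN;

dAtA demo;
        x
        = 1;                    * one statement continued over lines;
rUn;
```

Although SAS accepts all three, the second layout (one statement per line, indented inside the step) is generally the easiest to read.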

Rules for SAS names. SAS names are used for SAS data set names, variable names, and other such items. An example of a data set name appearing in the above program is grade. Two examples of variable names are subject and exam2. Note that each of the names appearing in the program adheres to the following rules:

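  • Data set names and variable names must contain between 1 and 32 characters. (Some other kinds of SAS names, such as librefs, are limited to 8 characters.)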
  • The first character appearing in a name must be a letter (A, B, ...Z, a, b, ... z) or an underscore (_). Subsequent characters must be letters, numbers, or underscores. That is, no other characters, such as $, %, or & are permitted. Blanks also cannot appear in SAS names.
  • SAS names are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (SAS is only case sensitive within quotation marks.)
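
The naming rules above can be sketched as follows. The names _scores2024 and Exam_1 are hypothetical and chosen only to exercise the rules. The invalid names appear inside a comment, since SAS would reject them as names.

```sas
data _scores2024;               /* valid: starts with an underscore   */
    Exam_1 = 80;                /* valid: letters, digit, underscore  */
    exam_1 = EXAM_1 + 5;        /* same variable: names ignore case   */
    /* Invalid as names, shown only as comments:
         2exam     starts with a digit
         exam-1    contains a hyphen
         hw grade  contains a blank                                   */
run;

proc print data=_SCORES2024;    /* same data set, different case      */
run;
```

Because Exam_1, exam_1, and EXAM_1 all refer to one variable, the resulting data set has a single column.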

PROC steps and DATA steps. The DATA step and PROC step — that's PROC for "procedure" — are the basic building blocks of any SAS program:

  • Any portion of a SAS program that begins with a DATA statement and ends with a RUN statement, another DATA statement, or a PROC statement is called a DATA step.
  • Any portion of a SAS program that begins with a PROC statement and ends with a RUN statement, a DATA statement, or another PROC statement is called a PROC step.

In general, DATA steps are used to manage data. For example, DATA steps are used to read data into a SAS data set, to modify data values, to check for and correct data errors, and to subset or merge data sets. PROC steps, on the other hand, are pre-written routines that allow us to analyze the data contained in a SAS data set. For example, PROC steps are used to calculate descriptive statistics, to generate summary reports, and to create summary graphs and charts.
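
As a rough illustration of this division of labor, the sketch below assumes the grade data set created by the program above already exists. The SET statement, the MEAN function, and PROC MEANS are not used elsewhere in this section, and the names grade2 and avg_exam are made up for this example.

```sas
/* DATA step: manage the data by deriving a new variable */
DATA grade2;
    SET grade;                       * read the existing data set;
    avg_exam = MEAN(exam1, exam2);   * mean of the nonmissing exam scores;
RUN;

/* PROC step: analyze the data with descriptive statistics */
PROC MEANS DATA=grade2;
    VAR exam1 exam2 avg_exam;
RUN;
```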

In the above program, all of the statements appearing between the "DATA grade;" statement and the first "RUN;" statement make up the one and only one DATA step appearing in the program. And, all of the statements appearing between the "PROC PRINT data = grade;" statement and the second "RUN;" statement make up the one and only one PROC step appearing in the program.

Note! SAS will also execute a DATA step, and most PROC steps, when it encounters the next DATA or PROC statement, even if no RUN statement is present. It is good programming practice, however, to close every DATA step and PROC step with a RUN statement. Also, note that DATA steps and PROC steps must be written as distinct units in your SAS code; that is, you cannot nest a PROC step inside a DATA step, or vice versa.
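
A small sketch of an implicit step boundary, using a hypothetical data set named small, might look like this:

```sas
DATA small;
    x = 1;
/* No RUN here: the PROC statement below marks the end of the DATA step. */
PROC PRINT DATA=small;
RUN;   /* explicit boundary, so the PROC step does not wait for a later step */
```

The program works, but the missing RUN makes it harder to see where the DATA step ends, which is why ending each step with its own RUN statement is the safer habit.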