Lesson 1: Getting Started in SAS
Let's get started! This lesson is an overview of what the SAS statistical software application is all about. The purpose here is to get you acquainted with how the application works and is organized so that you can begin writing programs to manage and analyze the data that you have collected.
Keep your eye out for important syntax rules, that is, what the SAS application expects or needs from the user in order to perform the tasks that you want it to do.
Upon completing this lesson, you should be able to:
- know the basic ("syntax") rules for all SAS statements
- know the basic ("syntax") rules for all SAS names
- know the basic structure of a DATA step and a PROC step
- identify the type of information contained in the descriptor portion of a SAS data set
- identify the observations and variables contained in a SAS data set
- know the six possible attributes of a variable contained in a SAS data set
- distinguish between numeric and character variables
- identify the five different SAS windows and explain their function
- execute (or "run") a program in SAS
- know the guidelines for good formatting and commenting of computer programs (that you are expected to follow throughout this course)
- use the SAS Editor Window to modify a simple SAS program
1.1 - What is the SAS System?
SAS is a fourth-generation programming language (or 4GL). According to Wikipedia, a fourth-generation programming language is "a programming language designed with a specific purpose in mind such as the development of commercial business software." All 4GLs are designed to reduce programming effort and minimize the time and cost it takes to develop software. That's just one of the benefits you will gain by learning SAS!
What all this really means is that the SAS System is an integrated system of modular software products. It enables you to:
- enter, retrieve and manage your data easily
- create slick reports and pretty pictures
- analyze your data statistically and mathematically
- plan, forecast and make decisions concerning your business
- manage your projects and perform research on how you conduct your operations
- improve the quality of your processes, as well as
- develop entirely new software applications.
In addition, you can also use SAS for many large-scale functions, such as data warehousing, data mining, human resources management, decision support, and financial management.
Originally, the acronym "SAS" stood for "Statistical Analysis System." Because the system has now grown to be so diverse and can do so much more than perform statistical analyses, the SAS Institute now no longer treats "SAS" as an acronym, but rather the software's name.
1.2 - A SAS Program
Since SAS is a programming language, let's start by looking at a simple SAS program, such as this one:
/********************
 This program reads in a set of grades for six
 students, and prints out their student numbers
 and genders
 ******************/
OPTIONS NODATE LS=78;
TITLE "Example: getting started with SAS";
DATA grade;
  InPuT subject gender $
        exam1 exam2 hwgrade $;
  DATALINES;
10 M 80 84 A
7 . 85 89 A
4 F 90 . B
20 M 82 85 B
25 F 94 94 A
14 F 88 84 C
;
RUN;
PROC PRINT data=grade;
  var subject gender;    * print student ID and gender;
run;
Don't worry just yet about understanding the "code" for this program. Just trust that this program reads in and prints out a set of grades for six students. The lines between the DATA statement and the first RUN statement tell SAS to read in the grades. And, the lines between the PROC PRINT statement and the second RUN statement tell SAS to print out the student number and gender of each of the six students.
As is true for any other programming language, a SAS program is a series of instructions written in the SAS language that are executed in order. That is, just as you read the words on this page, SAS reads and executes programs from top to bottom and from left to right. And, just as I must adhere to language and grammar rules that allow you to understand what I am saying, you must adhere to a certain set of rules known as "syntax" in order for SAS to be able to read and run your programs properly.
Basic SAS Program Requirements
Here is the basic set of requirements every SAS program must follow. As you read through them, you might want to refer back to the above program to see that each of the rules is indeed followed.
Rules for SAS Statements. The basic requirements for SAS statements are:
- All SAS statements (except those containing data) must end with a semicolon (;). ("DATA grade;" is an example of a SAS statement. "DATALINES;" is another.)
- SAS statements typically begin with a SAS keyword. (Examples in the above program include OPTIONS, TITLE, DATA, INPUT, DATALINES, RUN, PROC, and VAR.)
- SAS programs can be freely formatted:
  - Any number of SAS statements can appear on a single line provided they are separated by semicolons. (The second to last line is such an example.)
  - A SAS statement can be continued from one line to the next as long as no word is split. (The statement beginning with "InPuT ..." is such an example.)
  - SAS statements can begin in any column.
- SAS statements are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (The statement beginning with "InPuT ..." is such an example.)
- The words in SAS statements are separated by blanks or special characters (e.g. =, +, or *).
- Comments may (and should!) be used to annotate your program. Two methods are:
  - A delimited comment begins with a forward slash-asterisk (/*) and ends with an asterisk-forward slash (*/). All text within the delimiters is ignored by SAS. (The first five lines of the program constitute one such comment.)
  - An alternative comment begins with an asterisk (*) and ends with a semicolon (;). All text between the asterisk (*) and the semicolon (;) is ignored by SAS. (The second statement in the second to the last line constitutes such a comment.)
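The statement rules above can be tried out with a short sketch like the following. The data set name demo and the variables x and y are made up purely for illustration; they do not appear elsewhere in this lesson.

```sas
/* Delimited comment: everything between the slash-asterisk
   pairs is ignored by SAS. */
* Statement-style comment that ends at the next semicolon;

data demo; x = 1; y = 2; run;   * several statements on one line;

PROC print DATA=demo;           * keywords may be typed in any case;
   VaR x
       y;                       * one statement continued over two lines;
run;
```

Although SAS accepts this layout, it is deliberately messy; the formatting guidelines later in this lesson describe a tidier style.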
Rules for SAS names. SAS names are used for SAS data set names, variable names, and other such items. An example of a data set name appearing in the above program is grade. Two examples of variable names are subject and exam2. Note that each of the names appearing in the program adheres to the following rules:
- All names must contain between 1 and 32 characters.
- The first character appearing in a name must be a letter (A, B, ...Z, a, b, ... z) or an underscore (_). Subsequent characters must be letters, numbers, or underscores. That is, no other characters, such as $, %, or & are permitted. Blanks also cannot appear in SAS names.
- SAS names are not case sensitive, that is, they can be entered in lowercase, uppercase, or a mixture of the two. (SAS is only case sensitive within quotation marks.)
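As a rough illustration of the naming rules, the sketch below uses a few valid names and lists some invalid ones in a comment. All of the names here (exam_scores_2008, _id, Exam1, total) are hypothetical and chosen only to exercise the rules.

```sas
DATA exam_scores_2008;     * valid: letters, digits, and underscores only;
   _id   = 10;             * valid: begins with an underscore;
   Exam1 = 80;             * valid: digits are allowed after the first character;
   total = exam1 + 4;      * exam1 and Exam1 refer to the same variable;
RUN;

/* The following would NOT be valid names under the rules above:
     1exam     (begins with a digit)
     exam$1    (contains a $)
     hw grade  (contains a blank)                                 */
```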
PROC steps and DATA steps. The DATA step and PROC step — that's PROC for "procedure" — are the basic building blocks of any SAS program:
- Any portion of a SAS program that begins with a DATA statement and ends with a RUN statement, another DATA statement, or a PROC statement is called a DATA step.
- Any portion of a SAS program that begins with a PROC statement and ends with a RUN statement, a DATA statement, or another PROC statement is called a PROC step.
In general, DATA steps are used to manage data. For example, DATA steps are used to read data into a SAS data set, to modify data values, to check for and correct data errors, and to subset or merge data sets. PROC steps, on the other hand, are pre-written routines that allow us to analyze the data contained in a SAS data set. For example, PROC steps are used to calculate descriptive statistics, to generate summary reports, and to create summary graphs and charts.
In the above program, all of the statements appearing between the "DATA grade;" statement and the first "RUN;" statement make up the one and only one DATA step appearing in the program. And, all of the statements appearing between the "PROC PRINT data = grade;" statement and the second "RUN;" statement make up the one and only one PROC step appearing in the program.
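To see the step boundaries in another setting, here is a minimal sketch with one DATA step followed by one PROC step. The data set heights and its two data lines are made up for illustration.

```sas
DATA heights;                    * the DATA step begins here;
   input name $ height;
   DATALINES;
Susie 65
Dennis 70
;
RUN;                             * the DATA step ends here;

PROC MEANS data=heights;         * the PROC step begins here;
   var height;
RUN;                             * the PROC step ends here;
```

If the first RUN statement were omitted, the PROC MEANS statement would still mark the end of the DATA step, as described in the rules above.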
1.3 - SAS Data Sets
In order to be able to analyze our data, we need to be able to read it into a data set that our SAS software understands. A SAS data set is a file containing two parts: a descriptor portion and a data portion.
The descriptor portion of a SAS data set contains the vital statistics of the data set, such as the name of the data set, the date and time that the data set was created, the number of observations and the number of variables. The following table shows one part of the descriptor portion of a data set called work.grade:
| Data Set Name | WORK.GRADE | Observations | 6 |
| Created | Monday, August 18, 2008 07:25:39 PM | Observation Length | 40 |
| Last Modified | Monday, August 18, 2008 07:25:39 PM | Deleted Observations | 0 |
| Data Set Type | | Sorted | NO |
| Encoding | wlatin1 Western (Windows) | | |
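One way to display the descriptor portion of a data set yourself is the CONTENTS procedure. The sketch below first builds a small stand-in for work.grade (only two made-up data lines, so the reported number of observations will differ from the table above) and then asks SAS to describe it.

```sas
DATA grade;                      * small stand-in data set for illustration;
   input subject gender $ exam1;
   DATALINES;
10 M 80
4 F 90
;
RUN;

PROC CONTENTS data=work.grade;   * prints the descriptor portion;
RUN;
```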
The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table, such as this:
In this example, the number 53 is a data value, the name Susie is a data value, and so on. Just as is true for data sets in other statistical packages, a SAS data set is comprised of variables and observations. The variables (or columns) are collections of data values that describe a particular characteristic of the thing being measured. A SAS data set can store thousands of variables. Our data set here contains just four variables — the id, name, height, and weight of the person being measured. The observations (or rows) are collections of data values that typically relate to one particular object (such as a person named Susie). The values 53, Susie, 65", and 120 constitute a single observation in the above data set. A SAS data set can store any number of observations. Our data set here contains just five observations.
1.4 - SAS Variables
One thing you might have noticed about our data set is that there are two different types of variables: three numeric variables (ID, height, and weight) and one character variable (name). The type of variable is just one of six variable attributes that SAS stores in the descriptor portion of every SAS data set. The six attributes that SAS stores are:
- the variable's name
- the variable's type
- the variable's length
- the variable's format (if any)
- the variable's informat (if any)
- and the variable's label (if any)
As suggested by the presence of the ("if any") phrase in the above list, an informat, format, and label do not exist for every variable. A name, type, and length, on the other hand, do exist for every variable. The following is a partial listing of what might be the attribute information in the descriptor portion of our SAS data set:
Alphabetic List of Variables and Attributes
Let's investigate each of the six attributes briefly.
Variable names. There's not much more to say about a variable's name that hasn't already been said in the rules for SAS names in Section 1.2. That is, as long as your variable names conform to SAS naming conventions, you're good to go. In case you haven't memorized it yet, variable names must be between 1 and 32 characters long, must begin with either an uppercase letter, lowercase letter or an underscore (_), and thereafter can contain any combination of numbers, letters, or underscores.
Variable types. As mentioned earlier, a variable is either identified as being character or numeric. Character variables, such as name, can contain any character that you can make with your keyboard (letters, numbers, !@#$%^&( )_+, ... you get the idea). Numeric variables, on the other hand, such as id, height, and weight, can contain only numeric values, namely the digits 0 through 9, a positive sign (+), a negative sign (-), a decimal point (.), and the capital letter E for scientific notation.
Another thing you might have noticed about our data set above is that some information in the data set is missing — the name of the person whose ID is 55 is missing, and the weight of Dennis is missing. That's okay — SAS can handle missing data. What SAS displays when a value is missing depends on the variable's type. As suggested by the above data set, SAS displays a blank space for a missing character value and a period (.) for a missing numeric value.
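A small sketch can show how missing values are entered and displayed. The data set name people, Dennis's ID, and the weight of person 55 are made-up values for illustration; with simple list input, a period in the data lines stands for a missing value of either type.

```sas
DATA people;
   input id name $ weight;
   DATALINES;
53 Susie 120
55 . 130
56 Dennis .
;
RUN;

PROC PRINT data=people;    * name prints as blank, weight as a period;
RUN;
```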
Variable length. A variable's length tells us how many bytes are used to store the variable in your computer's memory. Character variables can be up to 32,767 bytes long. In our data set, the variable name has a length of 7 characters and therefore uses 7 bytes of storage. All numeric variables have a default length of 8. Numeric values (no matter how many digits they contain) are stored as floating-point numbers in 8 bytes of storage, unless you specify a different length. It shouldn't be surprising therefore that the length of each of the numeric variables in our data set is 8.
Variable formats and informats. We'll learn about these two attributes in greater detail later in the course. For now, just know that a variable's format tells SAS how you'd like your variable's values displayed in reports. For example, you might want to tell SAS to display the value 5391 as $5391.00 or maybe 5,391 instead. Whereas formats tell SAS how to write a variable's values, informats tell SAS how to read data values having a special form into standard SAS values.
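As a preview, and only as a sketch, the following program reads the text 5,391 with an informat and then displays the stored value with a format. The data set name money and the variable amount are hypothetical; COMMA6. and DOLLAR10.2 are standard SAS informat and format names, and the widths were chosen just for this example.

```sas
DATA money;
   input amount comma6.;        * informat: read 5,391 as the number 5391;
   DATALINES;
5,391
;
RUN;

PROC PRINT data=money;
   format amount dollar10.2;    * format: display the value with a dollar sign and cents;
RUN;
```

Note that the value stored in the data set is the plain number 5391; the format only changes how it is displayed.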
Variable labels. If you want, you can give your variables descriptive labels up to 256 characters long. By default, many of the standard reports in SAS identify variables by their names. You can instead tell SAS to display more descriptive information about the variable by assigning a label to the variable. In our data set above, the variables height and weight were given the more descriptive labels of "Height (inches)" and "Weight (pounds)", respectively.
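The attributes described above can be set explicitly in a DATA step. The following is a minimal sketch, assuming a one-line stand-in for our data set (the name people is hypothetical); the 5.1 format is an arbitrary choice for illustration, and no informat is assigned here.

```sas
DATA people;
   length name $ 7;                      * character variable stored in 7 bytes;
   input id name $ height weight;
   format height weight 5.1;             * display one decimal place;
   label height = 'Height (inches)'
         weight = 'Weight (pounds)';
   DATALINES;
53 Susie 65 120
;
RUN;

PROC CONTENTS data=people;               * lists each variable with its attributes;
RUN;
```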
That's about all you need to know about SAS variables for now! Let's now delve into how to interact with the SAS Windowing Environment, so we can first write and then run some SAS programs.
1.5 - SAS Windowing Environment
Okay, so you want to write and run a SAS program in the SAS System. How do you go about doing that? The first thing you need to do is to open your SAS software by selecting it in your programs list, or by double-clicking on the icon on your desktop, which might look something like:
When your SAS software opens, you should immediately see something like this (that is, without the red labels, which have been added here merely for explanatory reasons):
See if you can identify on your screen the following five windows: the Editor Window, the Log Window, the Output Window, the Explorer Window, and the Results Window. You may have to click on the appropriate tabs along the bottom in order to activate each of the five windows. Note that a window is activated when its top blue bar appears brighter, and the window is deactivated when its top blue bar appears dimmer.
Editor Window. The Editor Window is where you type in your SAS programs. It allows you to perform standard editing tasks, such as entering, editing and submitting programs. In the Editor Window, you can also open previously saved SAS programs, as well as save new SAS programs. After you have typed in your SAS program, and are satisfied that it meets all of the syntax requirements discussed above, you can "run" (or "execute") your program by clicking on the "running man" icon. For Windows operating systems, the default editor is called the Enhanced Editor, because it gives you a nudge, through the use of differently colored text, if your program contains a syntax error. The Enhanced Editor also allows you to collapse and expand the various steps in your program.
Log Window. The Log Window displays messages about your SAS session and any programs that you submit. You should always plan on checking this window after running a program. Even though your program may appear to have run correctly, critical errors may still have occurred when reading or manipulating the data. SAS uses the following color-coded system to assist you in reading the log:
- the DATA and PROC steps that appear in your program are printed in black
- notes that SAS wants to report to you are printed in blue
- warnings that SAS wants to draw to your attention are printed in green
- and errors that cause SAS to abort running your program are printed in red
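To see the log's color coding in action, you might try a sketch like the one below, which contains a deliberate mistake: the VAR statement names a variable (subjct) that does not exist in the data set. The two data lines are made up. When run, the log should report notes for the DATA step and an error along the lines of "Variable SUBJCT not found" for the PROC step.

```sas
DATA grade;
   input subject gender $;
   DATALINES;
10 M
4 F
;
RUN;

PROC PRINT data=grade;
   var subjct;           * deliberately misspelled variable name;
RUN;
```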
Output Window. The Output Window is where the printable results from your program appear. It is positioned behind the Log and Editor Windows until there is output to display, when it automatically opens or moves to the front of your display. Examples of output that your programs might create include data listings, table summaries, charts, and character-based plots and graphs. If you review the Output Window after running a program and some output that you expected is missing, then double-check the Log Window to see if you had any programming errors that prevented SAS from executing your commands. Note, though, that not all SAS programs create output in the Output Window. If you create HTML output, for example, it can be viewed in the internal SAS browser called the Results Viewer Window. And, if you create a graph, it will appear in a separate Graph Window.
Note that if your SAS installation sends output to the Results Viewer Window by default, you will want to turn this feature off. SAS 9.1 and 9.2 default to the Output Window, while SAS 9.3 and 9.4 default to the Results Viewer Window. Make this change under Preferences (found in the Tools menu, under Options). In the Preferences Window, select the Results tab. You will want to uncheck Create HTML and, instead, select Create Listing. This is important, because when SAS produces HTML it takes control of how the output looks out of your hands. One of the goals of the course is to be able to control the way your output is presented, and if SAS is creating HTML results, you will not be able to do this.
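As an alternative to the menus, the same switch can usually be made from within a program using ODS statements. This is only a sketch; it affects the current SAS session, whereas the Preferences setting described above persists between sessions.

```sas
ods html close;     * stop sending results to the HTML destination;
ods listing;        * send results to the Output (listing) window;
```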
Explorer Window. The Explorer Window allows you to easily view and manage your SAS files, which are stored in SAS data libraries. We'll learn more about data libraries later. For now, it suffices to know that a library name is just a nickname for the actual location — that is, a folder on your computer — of your SAS files. The Explorer Window can be used to create new SAS libraries and files, to open SAS files, and to move, copy and delete SAS files.
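As a brief preview of libraries, the sketch below assigns the nickname stat480 to a folder and saves a small data set there. The folder path is hypothetical and must already exist on your computer; substitute the location of your own course folder.

```sas
LIBNAME stat480 'C:\stat480\data';   * hypothetical folder, change to your own;

DATA stat480.grade;                  * saved in that folder, not in WORK;
   input subject gender $;
   DATALINES;
10 M
4 F
;
RUN;
```

After running this, the stat480 library and the grade data set should appear in the Explorer Window.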
Results Window. (Note that the Results Window is not the same as the Results Viewer Window described previously.) The Results Window serves much like a table of contents for your Output Window. That is, it itemizes each section of your Output Window in outline form so you can easily jump from one piece of output to another. The Results Window is empty until you submit a SAS program that creates output. Then, it moves to the front of your display.
So now you've seen a basic SAS program as well as have been introduced to how to run a program in the SAS System. Let's try it out!
The following SAS program illustrates a simple example of list input.
Copy the code below:
/********************
 This program reads in a set of grades for six
 students, and prints out their student numbers
 and genders.
 *****************/
OPTIONS NODATE LS=78;
TITLE "Example: getting started with SAS";
DATA grade;
  InPuT subject gender $
        exam1 exam2 hwgrade $;
  DATALINES;
10 M 80 84 A
7 . 85 89 A
4 F 90 . B
20 M 82 85 B
25 F 94 94 A
14 F 88 84 C
;
RUN;
PROC PRINT data=grade;
  var subject gender;    *print student ID and gender;
RUN;
At this point, don't worry too much about the details of what each statement in the program means. Copy the code into the SAS Editor Window. Then:
- Run the program by clicking on the "running man" icon.
- After the program has run, note that as promised the Output Window moves to the front of the other open SAS windows.
- Click on the Log Window tab on the bottom of your screen so that the window becomes accessible to you. Review the contents of the window to see the messages that SAS displays. You might want to maximize the window to do so.
- After you've reviewed the Log Window, you can clear it if you'd like by selecting the Edit menu along the top of your screen, and then selecting Clear All. In general, when you select Clear All, SAS will clear all of the contents of whichever window is currently active.
- Now, click on the Editor Window tab on the bottom of your screen so that the window becomes accessible to you again. Use your mouse to click on the minus sign that appears just before the PROC PRINT statement. Then, click on the plus sign. In general, clicking on a minus sign collapses the relevant program module, while clicking on a plus sign expands it.
- Now, use your mouse to select the three lines of code that begin with PROC PRINT and end with RUN. If you click on the "running man" icon now, SAS will run only the selected code. This can be very helpful to you when you are trying to write and debug just a portion of your program.
- Now, to save the SAS program, select the File menu along the top of your screen, and then select Save As .... Proceed to save the file in a convenient location just as you would any other Windows file. (Note that SAS saves a SAS program with a ".sas" extension.) You can open the SAS program again by selecting the File menu and then Open Program ...
That should be enough to get you started in using the SAS windowing environment. You will be writing a number of programs throughout the semester, and you'll no doubt get even more familiar with it. For that reason, you will want to make yourself a course folder now in which you can save your SAS programs and data sets.
1.6 - Guidelines for Formatting and Commenting SAS Programs
Regardless of the programming language used, there is a basic set of good programming practices to which any good programmer will adhere. Two good programming practices concern the formatting and commenting of your programs. Therefore, let's close this lesson by reviewing some guidelines for formatting and commenting your SAS programs. Throughout this course (and beyond!), you should plan on adhering to the following guidelines:
As mentioned earlier, comment statements allow you to document your program without affecting processing. A delimited comment, which begins with a forward slash-asterisk (/*) and ends with an asterisk-forward slash (*/), is useful for creating large blocks of comments. All text within the delimiters is ignored. An alternative type of comment begins with an asterisk (*) and ends with a semicolon (;). Examples:
/********************
 This SAS program blah, blah, blah, ..........
 *********************/
*Convert Fahrenheit to Celsius;
Although SAS allows for free-formatted code, a good SAS program will be well organized.
Every SAS program should start with a main block of comments, emphasized by asterisks. The block of comments should include the filename, by whom the program is written, the date on which the program was written, and text that clearly describes the main purpose, input and output of the program. Example:
/******************
 Filename: /home/lsimon/stat597c/sas/temp.sas
 Written by: Laura J. Simon
 Date: January 9, 1996
 This program calculates the average number of days
 that the temperature falls below freezing in
 State College, PA
 Input: C:\data\temps.dat
 Output: average number of days below freezing by
         month stored in C:\data\temps.ssd
 ******************/
Every critical DATA step or PROC step should be preceded by a block of comments, emphasized by asterisks, which describes the primary purpose of the step. The block of comments should also include any critical information, such as variable names, input and output of the block of code.
Comments that pertain to a single line of code are useful, e.g. for describing what an expression is calculating, describing a new variable and how it is calculated, why the data set is subset on a particular set of values, and so on. Example:
temp_f = 1.8*temp_c+32; *Convert Celsius to Fahrenheit;
At least one line should separate any PROC or DATA steps within your SAS program.
To help offset blocks of code, it is useful to capitalize PROC PROCNAME, DATA, and RUN. Examples:
PROC PRINT data=stat480.temps;
RUN;

DATA temps;
  set stat480.temps;
  if month in ('April','May','June');
RUN;
Any code contained within a DATA step or PROC step should be indented at least two spaces to improve legibility. Keywords within the DATA steps or PROC steps, such as title, variable, table, should be easily identified. Example:
PROC PRINT data=stat480.temps;
  title 'Temperatures in April, May, June';
  var month loc time_hr time_min am_pm temp_c temp_f;
RUN;
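Putting the guidelines together, a complete program might look something like the sketch below. The filename, the data set temps, and the three data lines are all made up for illustration; only the conversion formula is taken from the example earlier in this section.

```sas
/*******************************************************
 Filename:    temps_example.sas   (hypothetical name)
 Written by:  (your name)
 Date:        (date written)

 This program converts a few made-up Celsius readings
 to Fahrenheit and prints the April readings.

 Input:   three in-stream data lines (made up)
 Output:  listing of month, temp_c, and temp_f
*******************************************************/

/*******************************************************
 Read the readings and compute temp_f from temp_c
*******************************************************/
DATA temps;
   input month $ temp_c;
   temp_f = 1.8*temp_c + 32;     *Convert Celsius to Fahrenheit;
   DATALINES;
April 10
May 18
June 24
;
RUN;

/*******************************************************
 Print only the April readings
*******************************************************/
PROC PRINT data=temps;
   title 'Temperatures in April';
   where month = 'April';
   var month temp_c temp_f;
RUN;
```

Notice the header block, the comment block before each step, the blank line between steps, the capitalized DATA, PROC PRINT, and RUN keywords, and the indented statements within each step.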
Up and coming!
In this lesson, we just basically got you up and running on the SAS System. In the next lesson, we'll spend time learning how to read your data into the SAS data sets that the SAS System understands.
1.7 - Summary
In this lesson, among other things, we learned:
- the basic ("syntax") rules for all SAS statements
- the basic ("syntax") rules for all SAS names
- the basic structure of a DATA step and a PROC step
- the type of information contained in the descriptor portion of a SAS data set
- how to identify the observations and variables contained in a SAS data set
- about the six possible attributes of a variable contained in a SAS data set
- how to distinguish between numeric and character variables
- about the five different SAS windows and their function
- how to execute (or "run") a program in SAS
- the guidelines for good formatting and commenting of computer programs
- how to use the SAS Editor Window to modify a simple SAS program
Now, let's put what we learned to use by completing the homework problems. See your homework assignment for this lesson.