7.6 - Good Programming Practices
Half the battle of writing programs that work is to adhere to good programming practices. In this section, we highlight four programming practices that can make your job of writing programs that work that much easier. The four cardinal practices are:
- Write programs that are easy to read.
- Test each part of your program before proceeding to the next part.
- Test your programs with small data sets.
- Test your programs with representative data.
Trust me ... these practices really can make you a more efficient SAS programmer.
Example 7.12: Write Programs That are Easy to Read
You may well recall me harping about writing programs that are easy to read and well commented. A primary argument for writing programs that are easy to read is that doing so makes it easier to prevent and/or find errors.
The following example program is needlessly challenging to read:
data example1; input a b c; d=a + b-c;e=a-c;f=c+b;g=c-b;datalines;
1 2 3
4 5 6
7 8 9
;
run; proc means; var a; run; proc print; run;
After reviewing the program to appreciate how needlessly awkward it is to read, you can go ahead and launch and run the SAS program.
Although you can write SAS statements in almost any format, a neat and consistent layout enhances readability and helps you understand the purpose of the program. In general, it's a good idea to:
- Put only one SAS statement on a line.
- Begin DATA and PROC steps in the first column of the program editor.
- Indent statements within your DATA steps and PROC steps.
- Include a RUN statement after every DATA step or PROC step.
- Begin RUN statements in the first column of the program editor.
- Comment your programs judiciously — that is, don't make too few comments so that it is difficult for you or others to know what your programs are doing, and don't make too many comments so that it is difficult to read your programs.
You might want to entertain yourself by following these guidelines while editing the above program just to make it more readable.
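For comparison, here is one possible cleaned-up version of the program, offered as a sketch rather than the definitive layout. It follows the guidelines above: one statement per line, indented step bodies, and RUN statements in the first column. The comments and the explicit DATA= options are additions made here for illustration. The original program relies on SAS using the most recently created data set by default.

```sas
/* Read three values per line and derive four new variables */
DATA example1;
    INPUT a b c;
    d = a + b - c;
    e = a - c;
    f = c + b;
    g = c - b;
    DATALINES;
1 2 3
4 5 6
7 8 9
;
RUN;

/* Summarize the variable a */
PROC MEANS DATA = example1;
    VAR a;
RUN;

/* Print the data set to check the derived variables */
PROC PRINT DATA = example1;
RUN;
```

The statements are the same as in the hard-to-read version. Only the layout and the comments differ, which is the whole point: a typo in one of the assignment statements is much easier to spot when each one sits on its own line.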
Example 7.13: Test Each Part of Your Program
You can increase your programming efficiency tremendously by making sure each part of your program is working before moving on to write the next part. The simplest way to illustrate how wrong you can go is to imagine that you've just spent the last two weeks performing a complex statistical analysis on a data set only to discover later that the data set contains errors. If you had only used a simple PRINT procedure first to print and review the data set, you would have saved yourself lots of useless work. The reason why I cite this particular example is that it has probably happened at least once to every statistician working out there. This particular statistician is trying to save you from going down the same path!
The following program may appear to work just fine, as SAS does indeed produce an answer. If you look carefully at both the input and output, though, you'll see that the answer is not what we should expect:
DATA example2;
    INPUT a b c;
    DATALINES;
112 234 345
115 367
190 110 111
;
RUN;

PROC MEANS;
    var c;
RUN;
We'll learn later in this course that the MEANS procedure allows us to calculate the mean, as well as other summary statistics, of a numeric variable. In this program, the MEANS procedure tells SAS to calculate the mean of the numeric variable c.
Go ahead and launch and run the SAS program. Then, in reviewing the output, compare the answer we obtain for the mean of the variable c (267.5) with the answer that we should have obtained (228, from 345 plus 111 all divided by 2).
If you then look at the log window, you might be able to figure out what went awry. The log contains a note saying that SAS went to a new line when the INPUT statement reached past the end of a line. In the next lesson, we'll be investigating errors, warnings, and notes such as this one, which SAS prints in the log window as a means to communicate to you that your program might not be doing what you expect.
In this example, SAS couldn't find the value of c in the second line of data, so as reported in the log window, SAS went to the next line of data to find it. If you print the example2 data set by adding a PRINT procedure after the DATA step, you'll see what I mean:
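A minimal sketch of that check, assuming the data lines are laid out as described above (three values on the first and third lines, only two on the second). Naming the data set explicitly with DATA = example2 is a choice made here for clarity. The original program does not do so.

```sas
DATA example2;
    INPUT a b c;
    DATALINES;
112 234 345
115 367
190 110 111
;
RUN;

/* Print the data set before analyzing it.                     */
/* Expect only two observations: the second one picks up       */
/* c = 190 from the third data line, and the rest of that line */
/* (110 111) is never read.                                    */
PROC PRINT DATA = example2;
RUN;
```

Seeing two observations where three lines of data were entered is the clue that something went wrong in the DATA step, well before any summary statistics are computed.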
Then, you might want to add a missing value (.) placeholder to the second line of data (after the 367), and re-run the program to see that it now works as it should.
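The corrected program might look something like the following sketch. The only change to the data is the period placed after 367, which tells SAS that the value of c is missing for that observation. The PRINT step and the explicit DATA = options are additions made here for illustration.

```sas
DATA example2;
    INPUT a b c;
    DATALINES;
112 234 345
115 367 .
190 110 111
;
RUN;

/* Confirm that all three observations were read, with c missing in the second */
PROC PRINT DATA = example2;
RUN;

/* MEANS skips missing values, so the mean of c is based on 345 and 111 */
PROC MEANS DATA = example2;
    VAR c;
RUN;
```

With the placeholder in place, the mean of c should come out to the 228 that we expected.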
While testing your programs, you might find the PUT statement to be particularly useful. The following program reads the tree data into the trees data set and calculates the volume of each tree as in Example 7.3. Here, though, a few PUT statements have been added to help the programmer verify that the program is doing what she expects:
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
    if volume = . then do;
        PUT ' ';
        PUT 'DATA ERROR!!! ';
        PUT ' ';
        PUT ' ';
    end;
    else if volume lt 20 then PUT 'Small tree ' _N_= volume=;
    else if volume ge 20 then PUT 'Large tree ' _N_= volume=;
    DATALINES;
oak, black       222 1O5 112
hemlock, eastern 149 138 52
ash, white       258 80 70
cherry, black    187 91 75
maple, red       210 99 74
elm, american    229 127 104
;
RUN;

PROC PRINT data = trees;
RUN;
The PUT statement writes messages in the log window. If you launch and run the SAS program and take a look at the log window, you should see that a portion of the log window contains messages created as a result of executing the PUT statements: