7.6 - Good Programming Practices

Half the battle of writing programs that work is adhering to good programming practices. In this section, we highlight four programming practices that can make writing working programs that much easier. The four cardinal practices are:

  1. Write programs that are easy to read.
  2. Test each part of your program before proceeding to the next part.
  3. Test your programs with small data sets.
  4. Test your programs with representative data.

Trust me ... these practices really can make you a more efficient SAS programmer.

Example 7.12: Write Programs That Are Easy to Read

You may well recall me harping about writing programs that are easy to read and well commented. A primary argument for writing programs that are easy to read is that doing so makes it easier to prevent and/or find errors.

The following example program is needlessly challenging to read:

data example1; input a b c; d=a
+ b-c;e=a-c;f=c+b;g=c-b;datalines;
1 2 3
4 5 6
7 8 9
;
run; proc means; var a; run; proc print; run;

After reviewing the program to appreciate how needlessly awkward it is to read, you can go ahead and run the SAS program.

Although you can write SAS statements in almost any format, a neat and consistent layout enhances readability and helps you understand the purpose of the program. In general, it's a good idea to:

  • Put only one SAS statement on a line.
  • Begin DATA and PROC statements in the first column of the program editor.
  • Indent statements within your DATA steps and PROC steps.
  • Include a RUN statement after every DATA step or PROC step.
  • Begin RUN statements in the first column of the program editor.
  • Comment your programs judiciously: too few comments make it difficult for you or others to know what your programs are doing, and too many comments make the programs themselves difficult to read.

You might want to entertain yourself by following these guidelines while editing the above program just to make it more readable.
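
For comparison, here is one possible reformatting of the program above that follows these guidelines. Treat it as a sketch rather than the only right answer: the comments and the explicit DATA= options are additions that do not appear in the original program, and other layouts would be equally reasonable.

```sas
/* Read three numeric variables and derive four new ones from them */
DATA example1;
    INPUT a b c;
    d = a + b - c;
    e = a - c;
    f = c + b;
    g = c - b;
    DATALINES;
1 2 3
4 5 6
7 8 9
;
RUN;

/* Summarize the variable a */
PROC MEANS DATA = example1;
    VAR a;
RUN;

/* Print the whole data set so that it can be reviewed */
PROC PRINT DATA = example1;
RUN;
```

The statements are the same as in the hard-to-read version; only the layout, the comments, and the DATA= options differ. Naming the data set on each procedure is optional here, since SAS otherwise uses the most recently created data set, but it makes each step easier to read on its own.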

Example 7.13: Test Each Part of Your Program

You can increase your programming efficiency tremendously by making sure each part of your program is working before moving on to write the next part. The simplest way to illustrate how wrong you can go is to imagine that you've just spent the last two weeks performing a complex statistical analysis on a data set only to discover later that the data set contains errors. If you had only used a simple PRINT procedure first to print and review the data set, you would have saved yourself lots of useless work. The reason why I cite this particular example is that it has probably happened at least once to every statistician working out there. This particular statistician is trying to save you from going down the same path!

The following program may appear to work just fine as SAS does indeed produce an answer. If you look carefully at both the input and output, though, you'll see that the answer is not what we should expect:

DATA example2;
   INPUT a b c;
   DATALINES;
112 234 345
115 367
190 110 111
;
RUN;
PROC MEANS;
  var c;
RUN;

The MEANS Procedure
Analysis variable: c

N          Mean       Std Dev       Minimum       Maximum
2   267.5000000   109.6015511   190.0000000   345.0000000

We'll learn later in this course that the MEANS procedure allows us to calculate the mean, as well as other summary statistics, of a numeric variable. In this program, the MEANS procedure tells SAS to calculate the mean of the numeric variable c.

Go ahead and run the SAS program. Then, in reviewing the output, compare the answer we obtained for the mean of the variable c (267.5) with the answer that we should have obtained (228, from 345 plus 111, all divided by 2). Then, if you look at the log window:

  DATA example2;
     INPUT a b c;
     DATALINES;
NOTE: SAS went to a new line when INPUT statement reached past the end
      of a line.
NOTE: The data set WORK.EXAMPLE2 has 2 observations and 3 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.01 seconds
  ;
  RUN;
  PROC MEANS;
    var c;
  RUN;
NOTE: There were 2 observations read from the data set WORK.EXAMPLE2.
NOTE: PROCEDURE MEANS used (Total process time):
      real time           0.03 seconds
      cpu time            0.01 seconds

you might be able to figure out what went awry. In the next lesson, we'll investigate the errors, warnings, and notes that SAS prints in the log window, such as this note about SAS going to a new line, to tell you that your program might not be doing what you expect.

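In this example, SAS couldn't find a value for c in the second line of data, so, as reported in the log window, it went to the next line of data to find one. It read the 190 at the start of the third line as the value of c and never used the rest of that line, which is why the data set has only two observations. If you print the example2 data set by adding a PRINT procedure after the DATA step, you'll see what I mean: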

Obs    a    b    c
  1  112  234  345
  2  115  367  190

Then, you might want to add a missing value placeholder (.) to the second line of data (after the 367) and re-run the program to see that it now works as it should.
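
The fix just described might look like the following sketch. The PROC PRINT step and the explicit DATA= options are additions for checking the data before analyzing it; they are not part of the original program.

```sas
DATA example2;
    INPUT a b c;
    DATALINES;
112 234 345
115 367 .
190 110 111
;
RUN;

/* Review the data set before analyzing it */
PROC PRINT DATA = example2;
RUN;

/* c is missing in the second observation, so only two values are averaged */
PROC MEANS DATA = example2;
    VAR c;
RUN;
```

With the placeholder in place, each line of data supplies all three values, so the data set should contain three observations and the MEANS procedure should report the expected mean of 228 for c.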

Example 7.14

While testing your programs, you might find the PUT statement to be particularly useful. The following program reads the tree data into the trees data set and calculates the volume of each tree as in Example 7.3. Here, though, a few PUT statements have been added to help the programmer verify that the program is doing what she expects:

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
    if volume = . then do;
        PUT ' ';
        PUT 'DATA ERROR!!! ';
        PUT ' ';
        PUT ' ';
    end;
    else if volume lt 20 then PUT 'Small tree ' _N_= volume=;
    else if volume ge 20 then PUT 'Large tree ' _N_= volume=;
    DATALINES;
oak, black        222 1O5 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;
PROC PRINT data = trees;
RUN;

Obs  type              circ_in  hght_ft  crown_ft   volume
  1  oak, black            222        .       112        .
  2  hemlock, eastern      149      138        52  15.9305
  3  ash, white            258       80        70  27.6890
  4  cherry, black         187       91        75  16.5464
  5  maple, red            210       99        74  22.7014
  6  elm, american         229      127       104  34.6300

The PUT statement writes messages in the log window. If you run the SAS program and take a look at the log window, you should see that a portion of the log window contains messages created as a result of executing the PUT statements:

  DATA trees;
      input type $ 1-16 circ_in hght_ft crown_ft;
      volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
      if volume = . then do;
           PUT ' ';
           PUT 'DATA ERROR!!! ';
           PUT ' ';
           PUT ' ';
      end;
      else if volume lt 20 then PUT 'Small tree ' _N_= volume=;
      else if volume ge 20 then PUT 'Large tree ' _N_= volume=;
      DATALINES;
NOTE: Invalid data for hght_ft in line 187 23-25.
DATA ERROR!!!
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6-
           oak, black        222 1O5 112
type=oak, black circ_in=222 hght_ft=. crown_ft=112 volume=. _ERROR_=1
_N_=1
Small tree _N_=2 volume=15.930518479
Large tree _N_=3 volume=27.689026464
Small tree _N_=4 volume=16.546376146
Large tree _N_=5 volume=22.70137023
Large tree _N_=6 volume=34.630038398
NOTE: Missing values were generated as a result of performing an
      operation on missing values.
      Each place is given by: (Number of times) at (Line):(Column).
      1 at 177:20
NOTE: The data set WORK.TREES has 6 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds
  ;
  RUN;
  PROC PRINT data = trees;
  RUN;
NOTE: There were 6 observations read from the data set WORK.TREES.
NOTE: PROCEDURE PRINT used (Total process time):
      real time           0.00 seconds
      cpu time            0.01 seconds
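
To experiment with the PUT statement in isolation, a minimal sketch such as the following may help. It is not part of the original example: the variables x and y are made up for illustration, and DATA _NULL_ is used so that the step runs without creating a data set.

```sas
/* Hypothetical scratch step for trying out PUT; x and y are illustrative only */
DATA _NULL_;
    x = 3;
    y = x**2;
    /* Writing a variable name followed by = prints both its name and its value */
    PUT 'Checking values: ' _N_= x= y=;
RUN;
```

As with the Small tree and Large tree messages above, the quoted text is written to the log first, followed by each variable in name=value form, so the log should contain a line reading Checking values: _N_=1 x=3 y=9.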