Lesson 8: Writing Programs That Work - Part II

Overview

We've spent some time learning how SAS processes programs. Now, we'll turn our attention to the various messages that SAS displays in the log window with the intention of improving our ability to find errors in our SAS programs. Specifically, we'll investigate the log messages SAS might display:

  • when we have a missing semicolon
  • when we have used an invalid option, invalid name, or invalid statement
  • when we have a missing quotation mark
  • when our input data file contains invalid data
  • when we have misspelled a variable name
  • when our input statement has reached past the end of line
  • when our calculations generate missing values

Our intention again is to make us all more efficient SAS programmers!

Objectives

Upon completing this lesson, you should be able to do the following:

  • look at the LOG file every time you execute a SAS program to see what errors, warnings, and notes your program caused SAS to display
  • recognize the types of error messages SAS might display in the log file when a semicolon is missing
  • appreciate the value of using the DATASTMTCHK system option
  • recognize the possible causes of receiving an error that you've used an invalid option, invalid name, or invalid statement
  • recognize the possible causes of receiving a warning informing you that a TITLE statement is ambiguous
  • recognize the possible causes of receiving a note informing you that SAS encountered invalid data
  • recognize the possible causes of receiving a note informing you that a variable is uninitialized or not found
  • recognize the possible causes of receiving a note informing you that the INPUT statement reached past the end of the line
  • recognize the possible causes of receiving a note informing you that SAS generated missing values as a result of performing an operation on missing values

8.1 - Missing Semicolons

This section illustrates the kinds of messages you might see in the log window when you've forgotten to put a semicolon at the end of a SAS statement. My advice: when your SAS program won't run, and you have ERROR messages appearing in your log window, check to see if you are missing a semicolon somewhere in the statements immediately preceding the location where you are getting the error message.

Example 8.1

The following program is missing a semicolon on the comment statement just before the DATA statement:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

*Read in the trees data set
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

When you omit a semicolon, SAS reads the statement that lacks the semicolon, plus the following statement, as one long statement. In this case, DATA trees becomes a part of the comment statement, so SAS thinks that the program doesn't contain a DATA statement. As a result, SAS underlines the INPUT keyword and complains that it doesn't appear in a valid part of a SAS program. Launch and run the SAS program, and review the contents of the log window to see the message that submitting such a program causes.

Example 8.2

This example shows the same program as above, but now the semicolon is missing at the end of the DATA statement:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

*Read in the trees data set;
DATA trees
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

In this case, the INPUT statement becomes a part of the DATA statement. Therefore, SAS expects to see valid data set names or options, not a dollar sign ($) or column numbers. Launch and run the SAS program, and review the contents of the log window to see the message that submitting such a program causes. You should see that SAS reports that it found a syntax error, and then goes on to tell you what it expected to find in the DATA statement.

Example 8.3

The next example shows the same program as above, in which the semicolon is missing at the end of the DATA statement. However, now the DATASTMTCHK system option has been added to the code:

OPTIONS DATASTMTCHK = ALLKEYWORDS;

*Read in the trees data set;
DATA trees
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

The DATASTMTCHK system option controls what names you can use for SAS data sets in a DATA statement. By default, the option is set so that you cannot use the words MERGE, RETAIN, SET, or UPDATE as a SAS data set name. You can instead make all SAS keywords invalid SAS data set names by setting the DATASTMTCHK option to ALLKEYWORDS, as is done in the above program. Launch and run the SAS program, and review the log window to see the message SAS displays when the DATASTMTCHK option is invoked.
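If you want to check or reset the option during a session, a minimal sketch follows. It assumes that the default setting described above goes by the name COREKEYWORDS and that PROC OPTIONS with the OPTION= argument is available in your SAS installation:

```sas
/* Write the current DATASTMTCHK setting to the log */
PROC OPTIONS OPTION = DATASTMTCHK;
RUN;

/* Return to the default, which only protects MERGE, RETAIN, SET, and UPDATE */
OPTIONS DATASTMTCHK = COREKEYWORDS;
```

Because system options stay in effect for the rest of the SAS session, resetting the option after you finish experimenting keeps later programs from being affected.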

We've looked at three examples now in which a semicolon is missing. The moral of the story is that you can usually find the statement that lacks the semicolon by working backwards in your program, starting with the keywords that are underlined in the error message. Once you've found the location where the semicolon is missing, simply add a semicolon, resubmit your corrected program, and check the SAS log again to make sure that there are no other errors.
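For comparison, here is a sketch of the same trees program with a semicolon closing both the comment statement and the DATA statement. Nothing else has been changed from the examples above:

```sas
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

*Read in the trees data set;
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;
```

After submitting it, the log should contain notes but no ERROR messages for this DATA step.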


8.2 - Invalid Options, Names, or Statements

This section illustrates the kinds of messages you might see in the log window when you've used an invalid option, invalid name, or invalid statement.

Example 8.4

The following example illustrates the "Error: Invalid option" message SAS displays in the log window when you attempt to use an option that is invalid:

DATA trees (ROP = crown_ft);
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

We'll learn down the road that we can use the DATA statement's DROP= option when we want SAS to drop a variable from the program data vector before writing observations to the output data set. In the above program, we attempt to drop the variable crown_ft from the output trees data set. As you can see, though, we forgot to type the starting D. SAS doesn't know that though ... it thinks we want to use the nonexistent ROP= option. Launch and run the SAS program, and review the log window to see the message SAS displays in this situation.
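The intended program, with the option name spelled correctly, might look like the following sketch. Only the missing D has been restored:

```sas
/* DROP= removes crown_ft from the output data set */
DATA trees (DROP = crown_ft);
    input type $ 1-16 circ_in hght_ft crown_ft;
    DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;
```

The INPUT statement still reads crown_ft. The variable is dropped only when SAS writes each observation to the trees data set.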

Example 8.5

The following example illustrates the "Error: Syntax error" message SAS displays in the log window when your input statement is incorrect:

DATA trees;
    input *type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

Of course, what is really going on here is that we are trying to use an invalid variable name. Recall that variable names must begin with a letter or an underscore. Trying to specify a variable name *type thus causes SAS to hiccup. SAS does not recognize *type as a valid variable name, and therefore provides a list of items SAS would expect to follow the INPUT keyword. Launch and run the SAS program, and review the log window to see the message SAS displays in this situation.

Example 8.6

The following example illustrates the "Error: Statement is not valid or it is used out of proper order" message SAS displays in the log window when you attempt to use a statement that is not valid:

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT;
     set type circ_in hght_ft;
RUN;

Of course, instead of using the SET statement to tell SAS what variables we would like displayed, we should be using the VAR statement. Launch and run the SAS program, and review the log window to see the message SAS displays in this situation.
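A corrected version of the program, sketched here with the VAR statement in place of the SET statement, would be:

```sas
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

/* VAR lists the variables to print, in the order given */
PROC PRINT;
     var type circ_in hght_ft;
RUN;
```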

Whether your program contains an invalid option, invalid name, or invalid statement, simply use the log messages to locate the problem and make the appropriate correction to your program. After you resubmit your corrected program, check the SAS log again to make sure there are no other errors.


8.3 - Missing Quotation Marks

This section illustrates the kinds of messages you might see in the log window when you have unbalanced quotation marks.

Example 8.7

As the syntax coloring in the SAS editor should make fairly obvious, the following program is missing a closing quotation mark in the PRINT procedure's first TITLE statement:

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT;
     var type circ_in hght_ft;
	 title 'Some trees in Kentucky
	 title2 'Division of Forestry';
RUN;

If you launch and run the SAS program, you should see that no output is generated, and in the top blue bar of the SAS window, you should notice that the PRINT procedure continues to run.

That's because when SAS encountered the unbalanced quotation marks in the first TITLE statement, it continued to look for the closing quotation mark and in the process swallowed up the PRINT procedure's closing RUN statement.

Unfortunately, simply adding a quotation mark and resubmitting your program typically does not solve the problem. SAS still considers the quotation marks to be unbalanced. (You might want to convince yourself that this is indeed true by adding the quotation mark to the end of the first TITLE statement and resubmitting the program.)

Instead, you must first cancel the errant program before you correct and resubmit the program. These are the specific steps that you need to take in this situation:

  1. Anywhere in your program, type an asterisk followed by a quotation mark, a semicolon, and a RUN statement with a closing semicolon:

    *'; RUN;

  2. Use your cursor to select just the line of code that you typed. Then, submit just that line of code by clicking on the running man icon.

  3. Delete the line of code that you typed in step 1.

  4. Insert the missing quotation mark in the appropriate place in the program.

  5. Submit the corrected program.

You should go ahead and follow these steps for our program to convince yourself that the steps work. Then, when all is said and done, you should look at the log window to see the two warning messages that SAS displays about the "quoted string has more than 262 characters" and "the TITLE statement is ambiguous due to invalid options or unquoted text."
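For reference, the corrected program that you would submit in the final step might look like the following sketch, in which the first TITLE statement now has its closing quotation mark and semicolon:

```sas
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT;
     var type circ_in hght_ft;
     title 'Some trees in Kentucky';
     title2 'Division of Forestry';
RUN;
```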

Whenever SAS behaves in the way described in this section, the first thing you should suspect is unbalanced quotation marks. Then, you'll want to make sure you resolve the problem as soon as it occurs. Otherwise, it is likely that any subsequent programs that you submit in the current SAS session will generate errors.


8.4 - Invalid Data

This section illustrates the kinds of messages you might see in the log window when data in your input raw data file are inconsistent with your INPUT statement.

Example 8.8

The following program contains invalid values for the hght_ft variable in the first and third records of the instream data:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 1O5 112
hemlock, eastern  149 138  52
ash, white        258  8O  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT data = trees;
    title 'Trees in Kentucky';
RUN;

First, note that the zeros in the "105" and "80" values in the first and third records are a little rounder than the zero that appears, say, in the "70" in the third record. That's because the "105" and "80" actually contain the letter O rather than the number 0. That means that SAS is going to have a problem reading them in, because it is expecting all of the data values for the hght_ft variable to be numeric.

Then, launch and run the SAS program, and review the log window to see the Invalid data notes that SAS displays in this situation. Let's focus for a minute on just one of the messages, the one for the first record. It consists of four lines.

The first line tells us where the problem occurred. Specifically, it states that SAS got stuck on the hght_ft variable in columns 23 to 25 of line 220 of the raw data file that SAS was trying to read. In this case, of course, we are reading instream data, so line 220 reflects the line the record sits on within your SAS session. In fact, it is most likely that the line number you obtained differs from 220. In any case, it certainly helps to know that the problem occurs in columns 23 to 25, where SAS is expecting to read in a data value for the numeric variable hght_ft.

The second line is a type of ruler with columns as increments. The number 1 marks the tenth column, the number 2 the twentieth column, and so on. Below the ruler, in the third line, SAS displays the actual line of raw data so that we can identify the culprit ourselves. Beginning in column 23, we see that the value "1O5" is the problem. Once you know where to look, the error is typically obvious, as is the case here.

In the fourth and final line, SAS displays the values of each variable for that observation as SAS read it. You can see that SAS set hght_ft to missing and the automatic variable _ERROR_ to 1, denoting that an error occurred while reading in the record. If you review the output from the PRINT procedure:

Trees in Kentucky

Obs    type                circ_in    hght_ft    crown_ft
  1    oak, black              222          .         112
  2    hemlock, eastern        149        138          52
  3    ash, white              258          .          70
  4    cherry, black           187         91          75
  5    maple, red              210         99          74
  6    elm, american           229        127         104

you can see where the hght_ft variable is set to missing in the first and third observations, but not in the others. In general, invalid data notes affect only the observations in which the errors are found. Now, of course, it is possible that all of your observations might have data values that are inconsistent with your INPUT statement. That might happen:

  • if you forgot to use a dollar sign ($) to tell SAS that a variable is a character variable
  • if your INPUT statement contains incorrect column numbers for one of the variables so that SAS tries to read blanks as numeric values
  • if you use special characters, such as a tab, in numeric data
  • if you use the wrong informat such as mmddyy8. instead of ddmmyy8.

Other reasons why your log window might contain invalid data notes include:

  • if you are using list input to read two periods in a row with no space in between
  • if your data contains invalid dates, such as February 30, and you are trying to read them in with a date informat
  • if you are using list input to read data containing missing values, but no placeholders, causing SAS to read the next available data value

That's about it for invalid data. In short, the moral of the story is if your log window contains an invalid data note (or two or three or ...), suspect that one of the above things is awry. Use the location specified in the invalid data note to help ferret out the problem.
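When a data set is large, it can help to have the DATA step itself point out the offending records. The following is a minimal sketch that relies on the automatic variables _ERROR_ and _N_. The wording of the message written by the PUT statement is purely illustrative:

```sas
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    /* _ERROR_ is 1 when invalid data were found in the current record */
    if _ERROR_ = 1 then put 'Check record: ' _N_= type= hght_ft=;
    DATALINES;
oak, black        222 1O5 112
hemlock, eastern  149 138  52
ash, white        258  8O  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;
```

SAS still writes its own invalid data notes to the log. The extra line from the PUT statement simply adds a short flag for each problem record that is easy to search for.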


8.5 - Variable Not Found

This section illustrates the kinds of messages you might see in the log window when you have misspecified a variable name somewhere in your program.

Example 8.9

The following example illustrates the "Note: Variable is uninitialized" and "Error: Variable not found" messages SAS displays in the log window to warn you of such problems:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	volume = (0.319*hght)*(0.0000163*circ_in**2);
	DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT data = trees;
    var type height circ_in volume;
RUN;

First, note that there are two places in which a variable name was misspecified in this program. In the calculation of volume in the DATA step, the height of the tree is referred to as hght rather than hght_ft, the variable in which the heights were actually stored. And, in the PRINT procedure, the height of the tree is referred to as height. Well, okay, so the programmer is a little confused! Launch and run the SAS program, and review the log window to see the two messages that SAS displays in this situation.

Common ways to "lose" variables include:

  • misspelling a variable name
  • using a variable that was dropped from the data set at some earlier time
  • using the wrong data set
  • committing a logic error, such as using a variable before it is created

If the source of the problem is not immediately obvious, submitting a CONTENTS procedure can often help you sniff out the problem. As you may recall from an earlier lesson, the CONTENTS procedure provides, among other things, the names of the variables contained in a SAS data set.
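A minimal sketch of that check for the trees data set follows. The DATA step here uses the correct variable name hght_ft so that the listing shows the variables as intended:

```sas
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
    DATALINES;
oak, black        222 105 112
hemlock, eastern  149 138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

/* List the names and attributes of the variables stored in trees */
PROC CONTENTS data = trees;
RUN;
```

Comparing the variable names in the CONTENTS output with the names used in your VAR statement or calculations usually reveals the misspelling quickly.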


8.6 - Input Reached Past the End of the Line

This section illustrates the kinds of messages you might see in the log window when your input raw data fails to contain placeholders for missing values. The note "SAS went to a new line when INPUT statement reached past the end of a line" is rather innocent looking, but its presence can suggest there is a problem with your input data. The note means that as SAS was reading your data, it reached the end of the data line before it read values for all of the variable names appearing in your INPUT statement. When this happens, by default, SAS proceeds to the next line of data to get values for the remaining variables. In many cases that is not how you'd like SAS to behave, and so you need to compare your input data and output data sets carefully to make sure the data were read in properly.

Example 8.10

The following example shows what can happen if you are using list input, and your data file doesn't contain periods as placeholders for numeric missing values:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA trees;
    input treeID circ_in hght_ft crown_ft;
	DATALINES;
101 222 105 112
102 149 138 
103 258  80  70
104 187  91  
105 210  99  74
106 229 127 104
;
RUN;

PROC PRINT data = trees;
    title 'Tree data';
RUN;

First, review the data and note that tree numbers 102 and 104 are missing values for the crown_ft variable, with no periods serving as placeholders for the missing values. Then, launch and run the SAS program, and review the log window to see the message SAS displays in this situation. If you review the output from the PRINT procedure:

Tree data

Obs    treeID    circ_in    hght_ft    crown_ft
  1       101        222        105         112
  2       102        149        138         103
  3       104        187         91         105
  4       106        229        127         104

you can see the effect on the output data set of SAS going to a new line to find the missing data. For example, for tree 102, when SAS went to the next line to look for the crown_ft value, it found the value 103, which was supposed to be the ID number for the next tree. Oh, if SAS could only read our mind!

Example 8.11

One way of solving the problem of SAS going to the next line to look for the missing data values is to insert missing value periods (.) as placeholders. That solution would work for this small data set, but it wouldn't be practical when you are working with a large data set with thousands of records. In that case, the simplest thing to do to prevent SAS from going to a new line looking for data is to use the MISSOVER option of the INFILE statement. The MISSOVER option tells SAS to assign missing values to any variables for which there were no data instead of proceeding to the next line looking for the values. The following example illustrates using the MISSOVER option:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA trees;
    INFILE DATALINES MISSOVER;
    input treeID circ_in hght_ft crown_ft;
	DATALINES;
101 222 105 112
102 149 138 
103 258  80  70
104 187  91  
105 210  99  74
106 229 127 104
;
RUN;

PROC PRINT data = trees;
    title 'Tree data';
RUN;

First, note that the only thing that differs between this program and the previous one is the presence of the INFILE statement with the MISSOVER option. Then, launch and run the SAS program, and review the output from the PRINT procedure to confirm that SAS correctly reads in the data when the MISSOVER option is invoked:

Tree data

Obs    treeID    circ_in    hght_ft    crown_ft
  1       101        222        105         112
  2       102        149        138           .
  3       103        258         80          70
  4       104        187         91           .
  5       105        210         99          74
  6       106        229        127         104

You might also want to take a look at the log window to verify that this time SAS did not display a NOTE about going to a new line when it reached past the end of a line.
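For a small data set like this one, the placeholder approach mentioned at the start of this example also works. The following sketch assumes you are able to edit the data lines yourself, and adds a period for each missing crown_ft value so that no INFILE statement is needed:

```sas
DATA trees;
    input treeID circ_in hght_ft crown_ft;
    DATALINES;
101 222 105 112
102 149 138   .
103 258  80  70
104 187  91   .
105 210  99  74
106 229 127 104
;
RUN;

PROC PRINT data = trees;
    title 'Tree data';
RUN;
```

With list input, each period is read as a numeric missing value, so every line supplies all four values and SAS never has to go to a new line.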


8.7 - Missing Values Generated

This section illustrates the kinds of messages you might see in the log window when you perform calculations using variables that contain missing values for some of the observations in your data set. In such a case, SAS displays a message along the lines of "Missing values were generated as a result of performing an operation on missing values." Of course, having SAS behave in this way is not always a problem. It is possible that your data contain legitimate missing values and setting the new variable to missing is a desirable action for SAS to take. On the other hand, it is also possible that missing values result from an error and that you need to fix either your program or your data. Therefore, it is always a good idea, when you receive the "missing values generated" note, to take the time to play detective and verify that your program is behaving the way you desire.

Example 8.12

The following example illustrates how SAS propagates missing values. That is, for some calculations, SAS assigns a variable a missing value if any of the values contributing to the calculation are missing. In this example, SAS generates missing values when attempting to calculate the volume of the tree when either the height or the circumference is missing:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
	DATALINES;
oak, black        222   . 112
hemlock, eastern  .   138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT data = trees;
    title 'Tree data';
RUN;

First, review the data and note that the first tree is missing a value for hght_ft and the second tree is missing a value for circ_in. Then, launch and run the SAS program, and review the log window to see the "missing values generated" message SAS displays in this situation. If you review the output from the PRINT procedure:

Tree data

Obs    type                circ_in    hght_ft    crown_ft     volume
  1    oak, black              222          .         112          .
  2    hemlock, eastern          .        138          52          .
  3    ash, white              258         80          70    27.6890
  4    cherry, black           187         91          75    16.5464
  5    maple, red              210         99          74    22.7014
  6    elm, american           229        127         104    34.6300

you can see that SAS did indeed assign a missing value to the volume variable for the first two observations.

Example 8.13

When you are working with a large data set, it can be difficult to locate all the places in which a missing value was generated based on a calculation. In that case, you'll probably want to use a subsetting IF statement to find the missing values. The following example illustrates using an IF statement to find the observations that are assigned a missing value for the newly calculated variable volume:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;

DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
	if volume = .;
	DATALINES;
oak, black        222   . 112
hemlock, eastern  .   138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

PROC PRINT data = trees;
    title 'Trees with Missing Volumes';
RUN;

First, note that the only thing that differs between this program and the previous one is the presence of the IF statement (and the new title). Then, launch and run the SAS program, and review the output from the PRINT procedure to see the two observations for which the volume is deemed missing:

Trees with Missing Volumes

Obs    type                circ_in    hght_ft    crown_ft     volume
  1    oak, black              222          .         112          .
  2    hemlock, eastern          .        138          52          .
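Note that the subsetting IF statement keeps only the observations with missing volumes in the trees data set. If you would rather keep all six trees in the data set and merely list the problem observations, one alternative sketch (not part of the original example) is to move the condition into a WHERE statement in the PRINT procedure:

```sas
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
    volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
    DATALINES;
oak, black        222   . 112
hemlock, eastern  .   138  52
ash, white        258  80  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;

/* Print only the observations whose volume is missing */
PROC PRINT data = trees;
    where volume = .;
    title 'Trees with Missing Volumes';
RUN;
```

The trees data set still contains all six observations afterwards, so it can be used for further analysis once the missing values have been investigated.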


8.8 - Summary

In this lesson, we've learned the different ways in which SAS can help us find problems with our SAS programs. Of course, the primary way is by reviewing the messages that SAS displays in the log window. Have I said before that you should always review the log window after submitting a program?

The homework for this lesson will, of course, give you practice debugging a SAS program.

