8.4 - Invalid Data8.4 - Invalid Data
This section illustrates the kinds of messages you might see in the log window when data in your input raw data file are inconsistent with your INPUT statement.
The following program contains invalid values for the hght_ft variable in the first and third records of the instream data:
OPTIONS PS = 58 LS = 72 NODATE NONUMBER; DATA trees; input type $ 1-16 circ_in hght_ft crown_ft; DATALINES; oak, black 222 1O5 112 hemlock, eastern 149 138 52 ash, white 258 8O 70 cherry, black 187 91 75 maple, red 210 99 74 elm, american 229 127 104 ; RUN; PROC PRINT data = trees; title 'Trees in Kentucky'; RUN;
First, note that the zero in the "105" and "80" values in the first and third records are a little rounder than the zero that appears, say, in the "70" in the third record. That's because the "105" and "80" actually contain the letter O rather than the number 0. That means that SAS is going to have a problem reading them in, because it is expecting all of the data values for the hght_ft variable to be numeric.
Then, launch and run the SAS program, and review the log window to see the Invalid data notes that SAS displays in this situation. Let's focus for a minute on just one of the messages:
The first line tells us where the problem occurred. Specifically, it states that SAS got stuck on the hght_ft variable in columns 23 to 25 of line 220 of the raw data file that SAS was trying to read. In this case of course we are reading instream data, so that line 220 is a function of the line the record sits on within your SAS session. In fact, it is most likely that the line number you obtained differs from 220. In any case, it certainly helps to know that the problem occurs in columns 23 to 25 where SAS is expecting to read in a data value for the numeric variable hght_ft.
The second line is a type of ruler with columns as increments. The number 1 marks the tenth column, the number 2 the twentieth column, and so on. Below the ruler, SAS displays the actual line of raw data so that we can identify the culprit ourselves. Beginning in column 23, we see that the value "1O5" is the problem. The error at this point is typically found to be obvious, as is the case here.
In the fourth and final line, SAS displays the values of each variable for that observation as SAS read it. You can see that SAS set hght_ft to missing and the automatic variable _ERROR_ to 1, denoting that an error occurred while reading in the record. If you review the output from the PRINT procedure:
Trees in Kentucky
you can see where the hght_ft variable is set to missing in the first and third observations, but not in the others. In general, invalid data notes affect only the observations in which the errors are found. Now, of course, it is possible that all of your observations might have data values that are inconsistent with your INPUT statement. That might happen:
- if you forgot to use a dollar sign ($) to tell SAS that a variable is a character variable
- if your INPUT statement contains incorrect column numbers for one of the variables so that SAS tries to read blank as numeric values
- if you use special characters, such as a tab space, in numeric data
- if you use the wrong informat such as mmddyy8. instead of ddmmyy8.
Other reasons why your log window might contain invalid data notes include:
- if you are using list input to read two periods in a row with no space in between
- if your data contains invalid dates, such as February 30, and you are trying to read them in with a date informat
- if you are using list input to read data containing missing values, but no placeholders, causing SAS to read the next available data value
That's about it for invalid data. In short, the moral of the story is if your log window contains an invalid data note (or two or three or ...), suspect that one of the above things is awry. Use the location specified in the invalid data note to help ferret out the problem.