8.4 - Invalid Data

8.4 - Invalid Data

This section illustrates the kinds of messages you might see in the log window when data in your input raw data file are inconsistent with your INPUT statement.

Example 8.8

The following program contains invalid values for the hght_ft variable in the first and third records of the instream data:

OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
DATA trees;
    input type $ 1-16 circ_in hght_ft crown_ft;
	DATALINES;
oak, black        222 1O5 112
hemlock, eastern  149 138  52
ash, white        258  8O  70
cherry, black     187  91  75
maple, red        210  99  74
elm, american     229 127 104
;
RUN;
PROC PRINT data = trees;
    title 'Trees in Kentucky';
RUN;

   OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
   DATA trees;
       input type $ 1-16 circ_in hght_ft crown_ft;
       DATALINES;

NOTE: Invalid data for hght_ft in line 5 23-25.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6-
         oak, black        222 1O5 112
type=oak, black circ_in=222 hght_ft=. crown_ft=112 _ERROR_=1 _N_=1
NOTE: Invalid data for hght_ft in line 7 24-25.
         ash, white        258  8O  70
type=ash, white circ_in=258 hght_ft=. crown_ft=70 _ERROR_=1 _N_=3
NOTE: The data set WORK.TREES has 6 observations and 4 variables.
NOTE: DATA statement used (Total process time):
    real time           0.01 seconds
    cpu time            0.01 seconds

   ;
   RUN;
   PROC PRINT data = trees;
NOTE: Writing HTML Body file: sashtml.htm
       title 'Trees in Kentucky';
   RUN;

NOTE: There were 6 observations read from the data set WORK.TREES.
NOTE: PROCEDURE PRINT used (Total process time):
    real time           0.46 seconds
    cpu time            0.28 seconds

First, note that the zero in the "105" and "80" values in the first and third records are a little rounder than the zero that appears, say, in the "70" in the third record. That's because the "105" and "80" actually contain the letter O rather than the number 0. That means that SAS is going to have a problem reading them in, because it is expecting all of the data values for the hght_ft variable to be numeric.

Then, launch and run  the SAS program, and review the log window to see the Invalid data notes that SAS displays in this situation. Let's focus for a minute on just one of the messages:

 

NOTE: Invalid data for hght_ft in line 5 23-25.
RULE:      ----+----1----+----2----+----3----+----4----+----5----+----6-
         oak, black        222 1O5 112
type=oak, black circ_in=222 hght_ft=. crown_ft=112 _ERROR_=1 _N_=1

 

The first line tells us where the problem occurred. Specifically, it states that SAS got stuck on the hght_ft variable in columns 23 to 25 of line 220 of the raw data file that SAS was trying to read. In this case of course we are reading instream data, so that line 220 is a function of the line the record sits on within your SAS session. In fact, it is most likely that the line number you obtained differs from 220. In any case, it certainly helps to know that the problem occurs in columns 23 to 25 where SAS is expecting to read in a data value for the numeric variable hght_ft.

The second line is a type of ruler with columns as increments. The number 1 marks the tenth column, the number 2 the twentieth column, and so on. Below the ruler, SAS displays the actual line of raw data so that we can identify the culprit ourselves. Beginning in column 23, we see that the value "1O5" is the problem. The error at this point is typically found to be obvious, as is the case here.

In the fourth and final line, SAS displays the values of each variable for that observation as SAS reads it. You can see that SAS set hght_ft to missing and the automatic variable _ERROR_ to 1, denoting that an error occurred while reading in the record. If you review the output from the PRINT procedure:

Trees in Kentucky

Obs

types

circ_in

hght_ft

crown_ft

1

oak, black

222

.

112

2

hemlock, eastern

149

138

52

3

ash. white

258

.

70

4

cherry, black

187

91

75

5

maple, red

210

99

74

6

elm, american

229

127

104

you can see where the hght_ft variable is set to missing in the first and third observations, but not in the others. In general, invalid data notes affect only the observations in which the errors are found. Now, of course, it is possible that all of your observations might have data values that are inconsistent with your INPUT statement. That might happen:

  • if you forgot to use a dollar sign ($) to tell SAS that a variable is a character variable
  • if your INPUT statement contains incorrect column numbers for one of the variables so that SAS tries to read blank as numeric values
  • if you use special characters, such as a tab space, in numeric data
  • if you use the wrong informat such as mmddyy8. instead of ddmmyy8.

Other reasons why your log window might contain invalid data notes include:

  • if you are using list input to read two periods in a row with no space in between
  • if your data contains invalid dates, such as February 30, and you are trying to read them in with a date informat
  • if you are using list input to read data containing missing values, but no placeholders, causing SAS to read the next available data value

That's about it for invalid data. In short, the moral of the story is if your log window contains an invalid data note (or two or three or ...), suspect that one of the above things is awry. Use the location specified in the invalid data note to help ferret out the problem.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility