8.7 - Missing Values Generated
This section illustrates the kinds of messages you might see in the log window when you perform calculations using variables that contain missing values for some of the observations in your data set. In such a case, SAS displays a note along the lines of "Missing values were generated as a result of performing an operation on missing values." Of course, having SAS behave in this way is not always a problem. Your data may contain legitimate missing values, in which case setting the new variable to missing is exactly what you want SAS to do. On the other hand, the missing values may result from an error, and you then need to fix either your program or your data. Therefore, whenever you receive the "missing values generated" note, it is a good idea to take the time to play detective and verify that your program is behaving the way you intend.
The following example illustrates how SAS propagates missing values. That is, for some calculations, SAS assigns a variable a missing value if any of the values contributing to the calculation are missing. In this example, SAS generates missing values when attempting to calculate the volume of the tree when either the height or the circumference is missing:
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
DATA trees;
  input type $ 1-16 circ_in hght_ft crown_ft;
  volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
  DATALINES;
oak, black       222   . 112
hemlock, eastern   . 138  52
ash, white       258  80  70
cherry, black    187  91  75
maple, red       210  99  74
elm, american    229 127 104
;
RUN;

PROC PRINT data = trees;
  title 'Tree data';
RUN;
First, review the data and note that the first tree is missing a value for hght_ft and the second tree is missing a value for circ_in. Then, launch and run the SAS program, and review the log window to see the "missing values generated" note SAS displays in this situation. If you review the output from the PRINT procedure, you can see that SAS did indeed assign a missing value to the volume variable for the first two observations.
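Not every calculation propagates missing values in this way. Arithmetic operators such as + and * do, but functions such as SUM and MEAN work from their nonmissing arguments and return a missing value only when all of their arguments are missing. The following minimal sketch contrasts the two, using a small hypothetical data set named demo with made-up variables a and b:

```sas
/* Hypothetical data set: b is missing in the first observation */
DATA demo;
  input a b;
  plus_total = a + b;      /* + propagates: missing if a or b is missing */
  sum_total  = sum(a, b);  /* SUM uses only the nonmissing arguments     */
  DATALINES;
1 .
2 3
;
RUN;

PROC PRINT data = demo;
  title 'Operator versus SUM function';
RUN;
```

In the first observation, plus_total should be missing while sum_total equals 1; in the second, both equal 5. Whether the SUM behavior is what you want depends on the calculation: ignoring a missing value may be fine for a running total, but it would give a misleading answer for a formula such as the tree volume above.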
When you are working with a large data set, it can be difficult to locate all the places in which a missing value was generated by a calculation. In that case, you'll probably want to use a subsetting IF statement to find the missing values. The following example illustrates using an IF statement to find the observations that are assigned a missing value for the newly calculated variable volume:
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
DATA trees;
  input type $ 1-16 circ_in hght_ft crown_ft;
  volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
  if volume = .;   /* keep only observations whose volume is missing */
  DATALINES;
oak, black       222   . 112
hemlock, eastern   . 138  52
ash, white       258  80  70
cherry, black    187  91  75
maple, red       210  99  74
elm, american    229 127 104
;
RUN;

PROC PRINT data = trees;
  title 'Trees with Missing Volumes';
RUN;
First, note that, apart from the title, the only difference between this program and the previous one is the presence of the IF statement. Then, launch and run the SAS program, and review the output from the PRINT procedure to see the two observations for which volume is missing.
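Keep in mind that a subsetting IF statement deletes every observation that fails its condition, so after the program above runs, the data set trees holds only the two problem observations. If you would rather keep the complete data set while still tracking down the missing volumes, one option is to create two data sets in a single DATA step and to write the offending observations to the log with a PUT statement. The following is a sketch of that approach; the data set names trees_all and missvol are hypothetical:

```sas
OPTIONS PS = 58 LS = 72 NODATE NONUMBER;
/* trees_all (hypothetical name) keeps every observation;       */
/* missvol (hypothetical name) keeps only those with no volume  */
DATA trees_all missvol;
  input type $ 1-16 circ_in hght_ft crown_ft;
  volume = (0.319*hght_ft)*(0.0000163*circ_in**2);
  output trees_all;
  if missing(volume) then do;
    /* write the offending observation's values to the log */
    put 'Missing volume: ' _N_= type= circ_in= hght_ft=;
    output missvol;
  end;
  DATALINES;
oak, black       222   . 112
hemlock, eastern   . 138  52
ash, white       258  80  70
cherry, black    187  91  75
maple, red       210  99  74
elm, american    229 127 104
;
RUN;

PROC PRINT data = missvol;
  title 'Trees with Missing Volumes';
RUN;
```

With this approach trees_all should still contain all six trees, while missvol and the log both identify the two observations whose volume could not be calculated.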