The fundamental way to modify the data in a data set is with a basic assignment statement. Such a statement always takes the form:
variable = expression;
where variable is any valid SAS name and expression is the calculation that is necessary to give the variable its values. The variable must always appear to the left of the equal sign and the expression must always appear to the right of the equal sign. As always, the statement must end with a semicolon (;).
Because assignment statements change the values of variables, learning about them gives us practice working with both numeric and character variables. We'll also learn how numeric SAS functions can simplify some of our calculations.
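As a quick illustration of the general form, here is a minimal sketch with one numeric constant, one numeric expression, and one character value. The data set name demo and the variable names bonus, total, and section are made up for this illustration and are not part of the grades data used in the rest of the lesson.

```sas
DATA demo;
    /* numeric assignment: a constant */
    bonus = 5;
    /* numeric assignment: an arithmetic expression */
    total = 78 + 82 + bonus;
    /* character assignment: the value is enclosed in quotes */
    section = 'Section A';
RUN;

PROC PRINT data = demo;
RUN;
```

Because this DATA step reads no input data, it should produce a single observation. In each statement the variable sits to the left of the equal sign, the expression sits to the right, and the statement ends with a semicolon.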
Example 4.1
Throughout this lesson, we'll work on modifying various aspects of the temporary data set grades that is created in the following DATA step:
```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
    var name e1 e2 e3 e4 p1 f1;
RUN;
```
The data set contains student names (name), each of their four exam grades (e1, e2, e3, e4), their project grade (p1), and their final exam grade (f1).
A few comments. For the sake of the examples that follow, we'll use the DATALINES statement to read in the data. We could just as easily have used the INFILE statement. For simplicity, we'll create temporary data sets rather than permanent ones. Finally, after each SAS DATA step, we'll use the PRINT procedure to print all or part of the resulting SAS data set for your perusal.
Example 4.2
The following SAS program illustrates a very simple assignment statement in which SAS adds up the four exam scores of each student and stores the result in a new numeric variable called examtotal.
```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    * add up the four exam scores of each student and store the sum in examtotal;
    examtotal = e1 + e2 + e3 + e4;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
    var name e1 e2 e3 e4 examtotal;
RUN;
```
Note that, as previously described, the new variable name examtotal appears to the left of the equal sign, while the expression that adds up the four exam scores (e1+e2+e3+e4) appears to the right of the equal sign.
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the new numeric variable examtotal is indeed the sum of the four exam scores for each student appearing in the data set. Also note what SAS does when it is asked to calculate something when some of the data are missing. Rather than add up the three exam scores that do exist for John Simon, SAS instead assigns a missing value to his examtotal. If you think about it, that's a good thing! Otherwise, you'd have no way of knowing that his examtotal differed in some fundamental way from that of the other students. The important lesson here is to always be aware of how SAS is going to handle the missing values in your data set when you perform various calculations!
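To see the effect of the missing value more directly, the following sketch computes the total two ways: with the + operator, as above, and with the SUM function, which ignores missing arguments. The N function counts the nonmissing arguments. The data set name totals and the variable names examsum and nexams are made up for this illustration and are not part of the lesson's program.

```sas
DATA totals;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* the + operator returns missing if any operand is missing */
    examtotal = e1 + e2 + e3 + e4;
    /* SUM ignores missing arguments and adds up the rest */
    examsum = sum(of e1-e4);
    /* N counts how many of the arguments are nonmissing */
    nexams = n(of e1-e4);
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = totals;
    var name e1 e2 e3 e4 examtotal examsum nexams;
RUN;
```

For John Simon, examtotal should be missing, while examsum should be 246 (88 + 72 + 86) and nexams should be 3. Printing nexams next to examsum keeps visible the fact that his total is based on fewer exams than the other students' totals.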
Example 4.3
In the previous example, the assignment statement created a new variable in the data set simply by using a variable name that didn't already exist in the data set. You need not always use a new variable name. Instead, you can modify the values of a variable that already exists. The following SAS program illustrates how an instructor could modify the variable e2 if, for example, she wanted to raise the grades on the second exam by adding 8 points to each student's grade:
```sas
DATA grades;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    e2 = e2 + 8;  /* add 8 to each student's second exam score (e2) */
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
    var name e1 e2 e3 e4 p1 f1;
RUN;
```
Note again that the name of the variable being modified (e2) appears to the left of the equal sign, while the arithmetic expression that tells SAS to add 8 to the second exam score (e2+8) appears to the right of the equal sign. In general, when a variable name appears on both sides of the equal sign, the original value on the right side is used to evaluate the expression. The result of the expression is then assigned to the variable on the left side of the equal sign.
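If you want to compare the original and adjusted scores side by side, one option is to assign the result to a new variable instead of overwriting e2. The sketch below does that. The data set name curved and the variable name e2curved are made up for this illustration.

```sas
DATA curved;
    input name $ 1-15 e1 e2 e3 e4 p1 f1;
    /* e2 keeps its original value; the adjusted score goes into e2curved */
    e2curved = e2 + 8;
    DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = curved;
    var name e2 e2curved;
RUN;
```

Each value of e2curved should be eight points higher than the corresponding value of e2 (for example, 82 and 90 for Alexander Smith). Those are the same values that e2 itself takes when the statement e2 = e2 + 8 overwrites the variable.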
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the numeric variable e2 are indeed eight points higher than the values in the original data set.