As in other programming languages, such as C++ or S-Plus, a SAS function is a pre-programmed routine that returns a value computed from its arguments. The standard form of any SAS function is:
functionname(argument1, argument2,…);
For example, if we want to add three variables, a, b, and c, using the SAS function SUM and assign the resulting value to a variable named d, the correct form of our assignment statement is:
d = sum(a, b, c) ;
In this case, sum is the name of the function, d is the target variable to which the result of the SUM function is assigned, and a, b, and c are the function's arguments. Some functions require a specific number of arguments, whereas other functions, such as SUM, accept any number of arguments. Some functions require no arguments. As you'll see in the examples that follow, the arguments can be variable names, constants, or even expressions.
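As a minimal sketch of the kinds of arguments just described, the following DATA step calls SUM with variable names, with constants, and with expressions. The variable names d1 through d4 and the values assigned to a, b, and c are made up for illustration, and TODAY is used here only as one example of a function that takes no arguments:

```sas
DATA _null_;
   a = 2;
   b = 3;
   c = 4;
   d1 = sum(a, b, c);      /* arguments are variable names: 2 + 3 + 4 */
   d2 = sum(a, 10, 0.5);   /* arguments mix a variable and constants */
   d3 = sum(a*b, c/2);     /* arguments are expressions: 6 + 2 */
   d4 = today();           /* TODAY takes no arguments; it returns the current date */
   put d1= d2= d3= d4= date9.;
RUN;
```

The PUT statement writes the results to the log, where d1, d2, and d3 should appear as 9, 12.5, and 8, and d4 as the date on which the program is run.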
SAS offers arithmetic, financial, statistical, and probability functions. There are far too many of these functions to explore them all in detail, but let's take a look at some examples.
Example 4.6
In the previous example, we calculated students' average exam scores by adding up their four exam grades and dividing by 4. Alternatively, we could use the MEAN function. The following SAS program calculates the average exam scores in two ways, first directly from the definition and then by using the MEAN function:
DATA grades;
input name $ 1-15 e1 e2 e3 e4 p1 f1;
* calculate the average by definition;
avg1 = (e1+e2+e3+e4)/4;
* calculate the average using the mean function;
avg2 = mean(e1,e2,e3,e4);
DATALINES;
Alexander Smith 78 82 86 69 97 80
John Simon      88 72 86 . 100 85
Patricia Jones  98 92 92 99 99 93
Jack Benedict   54 63 71 49 82 69
Rene Porter     100 62 88 74 98 92
;
RUN;
PROC PRINT data = grades;
var name e1 e2 e3 e4 avg1 avg2;
RUN;
Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the two methods of calculating the average exam scores do indeed yield the same results:
Obs | name | e1 | e2 | e3 | e4 | avg1 | avg2 |
---|---|---|---|---|---|---|---|
1 | Alexander Smith | 78 | 82 | 86 | 69 | 78.75 | 78.75 |
2 | John Simon | 88 | 72 | 86 | . | . | 82.00 |
3 | Patricia Jones | 98 | 92 | 92 | 99 | 95.25 | 95.25 |
4 | Jack Benedict | 54 | 63 | 71 | 49 | 59.25 | 59.25 |
5 | Rene Porter | 100 | 62 | 88 | 74 | 81.00 | 81.00 |
Oops! What happened? SAS reports that the average exam score for John Simon is 82 when the average is calculated using the MEAN function, but reports a missing value when the average is calculated from the definition. John's fourth exam score is missing, and an arithmetic expression such as e1+e2+e3+e4 evaluates to missing whenever any of its operands is missing. The MEAN function, by contrast, ignores missing values and calculates the average from the available values, here (88 + 72 + 86)/3 = 82.
We can't make a blanket statement about which method is more appropriate, as it depends on the situation and the intent of the programmer. Instead, the (very) important lesson here is to know how missing values are handled by the various methods that are available in SAS! We can't possibly address all of the possible calculations and functions in this course. So ... you would be wise to always check your calculations on a few representative observations to make sure that your SAS program is doing exactly what you intended. This is another one of those good programming practices to jot down.
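One way to carry out such a check is to run a small DATA step on a single representative observation and write the results to the log. The sketch below uses John Simon's scores; the variable names avg_def, avg_mean, n_scores, and avg_strict are made up for illustration, and the rule "report an average only when all four scores are present" is a hypothetical choice, not a recommendation:

```sas
DATA _null_;
   e1 = 88; e2 = 72; e3 = 86; e4 = .;
   avg_def  = (e1 + e2 + e3 + e4) / 4;   /* missing: the + operator propagates missing values */
   avg_mean = mean(e1, e2, e3, e4);      /* 82: MEAN uses the three non-missing scores */
   n_scores = n(e1, e2, e3, e4);         /* 3 non-missing values */
   /* hypothetical rule: report an average only when all four scores are present */
   if nmiss(e1, e2, e3, e4) = 0 then avg_strict = mean(e1, e2, e3, e4);
   put avg_def= avg_mean= n_scores= avg_strict=;
RUN;
```

Here avg_def and avg_strict should both be missing, while avg_mean is 82 and n_scores is 3. Comparing n_scores with the number of arguments is a quick way to see how many values a function such as MEAN actually used.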
Although you can refer to SAS Help and Documentation (under "functions, by category") for a full accounting of the built-in numeric functions that are available in SAS, here is a list of just some of the numeric functions that can be helpful when performing statistical analyses:
Common Functions | Example |
---|---|
INT: the integer portion of a numeric value | a = int(x); |
ABS: the absolute value of the argument | a = abs(x); |
SQRT: the square root of the argument | a = sqrt(x); |
MIN: the minimum value of the arguments | a = min(x, y, z); |
MAX: the maximum value of the arguments | a = max(x, y, z); |
SUM: the sum of the arguments | a = sum(x, y, z); |
MEAN: the mean of the arguments | a = mean(x, y, z); |
ROUND: round the argument to the specified unit | a = round(x, 1); |
LOG: the log (base e) of the argument | a = log(x); |
LAG: the value of the argument in the previous observation | a = lag(x); |
DIF: the difference between the values of the argument in the current and previous observations | a = dif(x); |
N: the number of non-missing values among the arguments | a = n(x, y, z); |
NMISS: the number of missing values among the arguments | a = nmiss(x, y, z); |
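To see a few of the functions in the table at work, here is a small sketch that applies INT, ROUND, LAG, and DIF to three made-up values of x. The data set name demo and the variable names whole, rounded, prev, and change are assumptions chosen for illustration:

```sas
DATA demo;
   input x;
   whole   = int(x);          /* integer portion of x */
   rounded = round(x, 0.1);   /* x rounded to the nearest tenth */
   prev    = lag(x);          /* value of x in the previous observation */
   change  = dif(x);          /* x minus the previous value of x */
DATALINES;
78.76
-3.72
9.00
;
RUN;
PROC PRINT data = demo;
RUN;
```

For the first observation, whole should be 78 and rounded should be 78.8, while prev and change are missing because there is no previous observation. Note that INT drops the fractional part rather than rounding, so the second observation gives a whole value of -3.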
I have used the INT function a number of times when dealing with numbers whose first few digits contain some additional information that I need. For example, the area code in this part of Pennsylvania is 814. If I have phone numbers that are stored as numbers, say, as 8142341230, then I can use the INT function to extract the area code from the number. Let's take a look at an example of this use of the INT function.
Example 4.7
The following SAS program uses the INT function to extract the area codes from a set of ten-digit telephone numbers:
DATA grades;
input name $ 1-15 phone e1 e2 e3 e4 p1 f1;
areacode = int(phone/10000000);
DATALINES;
Alexander Smith 8145551212 78 82 86 69 97 80
John Simon      8145562314 88 72 86 . 100 85
Patricia Jones  7175559999 98 92 92 99 99 93
Jack Benedict   5705551111 54 63 71 49 82 69
Rene Porter     8145542323 100 62 88 74 98 92
;
RUN;
PROC PRINT data = grades;
var name phone areacode;
RUN;
In short, the INT function returns the integer part of the expression contained within parentheses. So, if the phone number is 8145562314, then int(phone/10000000) becomes int(814.5562314) which becomes, as claimed, the area code 814. Now, launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the area codes are calculated as claimed.
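The same idea can be extended to pull out the other parts of a phone number. The following sketch is an illustration only: it uses the MOD function, which returns the remainder after division and does not appear in the list above, and the variable names exchange and last4 are made up:

```sas
DATA _null_;
   phone    = 8145562314;
   areacode = int(phone/10000000);           /* int(814.5562314) = 814 */
   exchange = mod(int(phone/10000), 1000);   /* int(814556.2314) = 814556; remainder after dividing by 1000 is 556 */
   last4    = mod(phone, 10000);             /* remainder after dividing by 10000 is 2314 */
   put areacode= exchange= last4=;
RUN;
```

Working through one phone number by hand in this way, and comparing it with what SAS writes to the log, is a quick check that the divisors have the right number of zeros.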
Example 4.8
One really cool thing is that you can nest functions in SAS (as you can in most programming languages). That is, you can compute a function within another function. When you nest functions, SAS works from the inside out. That is, SAS evaluates the innermost function first. It uses the result of that function as the argument of the next function, and so on. You can nest any function as long as the value it returns meets the requirements for the argument it supplies.
The following SAS program illustrates nested functions by rounding the students' exam averages to the nearest unit:
DATA grades;
input name $ 1-15 e1 e2 e3 e4 p1 f1;
*calculate the average using the mean function
and then round it to the nearest whole number;
avg = round(mean(e1,e2,e3,e4),1);
DATALINES;
Alexander Smith 78 82 86 69 97 80
John Simon      88 72 86 . 100 85
Patricia Jones  98 92 92 99 99 93
Jack Benedict   54 63 71 49 82 69
Rene Porter     100 62 88 74 98 92
;
RUN;
PROC PRINT data = grades;
var name e1 e2 e3 e4 avg;
RUN;
For example, the average of Alexander's four exams is 78.75 (the sum of 78, 82, 86, and 69, divided by 4). Thus, in calculating avg for Alexander, 78.75 becomes the first argument of the ROUND function, and the second argument, 1, is the rounding unit. That is, 78.75 is rounded to the nearest whole number to get 79. Launch and run the SAS program, and review the output from the PRINT procedure to convince yourself that the exam averages avg are rounded as claimed.
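To make the inside-out evaluation concrete, the sketch below computes Alexander's rounded average twice: once in two separate steps and once with the nested call. The variable names inner, outer, and nested are made up for illustration:

```sas
DATA _null_;
   e1 = 78; e2 = 82; e3 = 86; e4 = 69;
   inner  = mean(e1, e2, e3, e4);             /* evaluated first: 78.75 */
   outer  = round(inner, 1);                  /* then rounded to the nearest unit: 79 */
   nested = round(mean(e1, e2, e3, e4), 1);   /* the same two steps in one statement */
   put inner= outer= nested=;
RUN;
```

The log should show inner as 78.75 and both outer and nested as 79. Splitting a nested call into intermediate variables like this can also help when a nested expression does not give the result you expect.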