Lesson 5: If-Then-Else Statements

Overview

Again, once you've read your data into a SAS data set, you probably want to do something with it. A common thing to do is to change the original data in some way in an attempt to answer a research question of interest to you. In the last lesson, we learned how to use assignment statements (and functions) to add some information to all of the observations in the data set. In this lesson, we will learn how to use if-then-else statements to add some information to some but not all of the observations in your data set.

Objectives

Upon completing this lesson, you should be able to do the following:

  • follow the good programming practice of programming for missing values
  • write an if-then-else statement that involves any of the comparison operators
  • write a series of mutually exclusive conditions for use in an (efficient) if-then-else statement
  • use the AND and OR operators to combine conditions for use in an if-then-else statement
  • write an if-then-else statement that compares character values efficiently and accurately

5.1 - If-Then Statements

In this lesson, we investigate a number of examples that illustrate how to change a subset of the observations in our data set. In SAS, the most common way to select observations that meet a certain condition is to utilize an if-then statement. The basic form of the statement is:

IF (condition is true) THEN (take this action);

In the previous lesson, we looked at an example in which the condition was:

avg < 65

and the action was:

status = 'Failed'

For each observation, SAS evaluates the condition that follows the keyword IF — in this case, is the student's average less than 65? — to determine if it is true or false. If the condition is true, SAS takes the action that follows the keyword THEN — in this case, changes the student's status to 'Failed.' If the condition is false, SAS ignores the THEN clause and proceeds to the next statement in the DATA step. The condition always involves a comparison of some sort, and the action taken is typically some sort of assignment statement.

Example 5.1

There is nothing really new here. You've already seen an if-then(-else) statement in the previous lesson. Our focus there was primarily on the assignment statement. Here, we'll focus on the entire if-then statement, including the condition. The following SAS program creates a character variable status, whose value depends on whether or not the student's first exam grade is less than 65:

DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* if the first exam is less than 65 indicate failed;
	if (e1 < 65) then status = 'Failed';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 status;
RUN;

First, note that we continue to work with the grades data set from the last lesson. Again, the data set contains student names (name), each of their four exam grades (e1, e2, e3, e4), their project grade (p1), and their final exam grade (f1). Then, launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly. Note that status is left blank (missing) for every student whose first exam grade is 65 or higher, because the program tells SAS what to do only when the condition is true.


5.2 - Comparison Operators

In the previous example, we used the less-than sign to make the comparison. We can use any of the standard comparison operators to make our comparisons as long as we follow the syntax that SAS expects, which is:

Comparison                  SAS syntax    Alternative SAS syntax
less than                   <             LT
greater than                >             GT
less than or equal to       <=            LE
greater than or equal to    >=            GE
equal to                    =             EQ
not equal to                ^=            NE
equal to one of a list      in            IN

It doesn't really matter which of the two syntax choices you use. It's just a matter of preference. To convince yourself that you understand how to use the alternative SAS syntax though, replace the less-than sign (<) in the Example 5.1 program with the letters "LT" (or "lt"). Then, re-run the SAS program and review the output from the PRINT procedure to see that the program indeed performs as expected.
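As a minimal sketch of that substitution, the condition from Example 5.1 can be written with LT in place of the less-than sign. The data set name grades_lt is made up for this illustration, and only two of the five students are included to keep it short:

```sas
DATA grades_lt;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* LT is the alternative syntax for the less-than sign;
	if (e1 LT 65) then status = 'Failed';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
Jack Benedict    54 63 71 49  82 69
;
RUN;

PROC PRINT data = grades_lt;
	var name e1 status;
RUN;
```

The status values should match what the original program produces for these two students.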

Example 5.2

The following SAS program uses the IN operator to identify those students who scored a 98, 99, or 100 on their project score. That is, students whose p1 value equals either 98, 99, or 100 are assigned the value 'Excellent' for the project variable:

DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	if p1 in (98, 99, 100) then project = 'Excellent';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name p1 project;
RUN;

Launch and run the SAS program and review the output from the PRINT procedure to convince yourself that the program performs as described.

NOTE! After being introduced to the comparison operators, students are often tempted to use the syntax EQ in an assignment statement. If you try it, you'll soon learn that SAS rejects the statement and writes an error to the log. Assignment statements must always use the equal sign (=).
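Here is a small sketch of where each form belongs. It uses DATA _NULL_ and PUT statements (which write to the log) purely for illustration; neither appears in the lesson's own examples:

```sas
DATA _null_;
	* assignment: the equal sign is required here;
	status = 'Failed';
	* comparison: the equal sign and EQ are interchangeable here;
	if (status EQ 'Failed') then put 'EQ works in a condition';
	if (status =  'Failed') then put 'The equal sign works in a condition too';
RUN;
```

Both messages should appear in the log, since both conditions are true.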

5.3 - Alternative Actions

As the output from Example 5.1 illustrates, there may be occasions when you want to use an if-then-else statement instead of just an if-then statement. In that example, we told SAS only what to do if the condition following the IF keyword was true. By including an else statement, we can tell SAS what to do if the condition following the IF keyword is false.

Example 5.3

The following SAS program creates a character variable status, whose value is "Failed" IF the student's first exam grade is less than 65, otherwise (i.e., ELSE) the value is "Passed":

DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* if the first exam is less than 65 indicate failed;
	if (e1 < 65) then status = 'Failed';
	* otherwise indicate passed;
	else status = 'Passed';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 status;
RUN;

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly.

Note that, in general, using ELSE statements with IF-THEN statements can save resources:

  • Using IF-THEN statements without the ELSE statement causes SAS to evaluate all IF-THEN statements.
  • Using IF-THEN statements with the ELSE statement causes SAS to evaluate the conditions only until it encounters the first one that is true. The remaining ELSE statements in the chain are skipped.

For greater efficiency, you should order the conditions in your IF-THEN-ELSE statements from the most likely to be true to the least likely to be true.
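The difference can be sketched as follows. The data set name compare, the variables band1 and band2, and the cutoffs are all made up for this illustration, and the ordering assumes that scores of 80 or above are the most common case:

```sas
DATA compare;
	length band1 band2 $ 4;
	input e1;
	* independent IF-THEN statements: all three conditions are evaluated for every observation;
	if (e1 >= 80)      then band1 = 'High';
	if (65 <= e1 < 80) then band1 = 'Mid';
	if (. < e1 < 65)   then band1 = 'Low';
	* chained IF-THEN-ELSE statements: evaluation stops at the first true condition;
	     if (e1 = .)   then band2 = ' ';
	else if (e1 >= 80) then band2 = 'High';
	else if (e1 >= 65) then band2 = 'Mid';
	else                    band2 = 'Low';
	DATALINES;
78
88
.
54
;
RUN;

PROC PRINT data = compare;
RUN;
```

Both approaches should assign the same values. The chained version simply does less work per observation, and it lets each later condition rely on the earlier ones having been false.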


5.4 - Programming For Missing Values

Example 5.4

This if-then-else stuff seems easy enough! Let's try creating another status variable for our grades data set, but this time let's allow its value to depend on the value of the student's fourth exam (e4) rather than the value of the student's first exam (e1):

DATA grades;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* if the fourth exam is less than 65 indicate failed;
	if (e4 < 65) then status = 'Failed';
	* otherwise indicate passed;
	else status = 'Passed';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e4 status;
RUN;

Launch and run  the SAS program. Review the output from the PRINT procedure to convince yourself that the values of the character variable status have been assigned correctly. What happened?! SAS assigned a "Failed" status to John Simon, seemingly because his exam score was missing. That's certainly one way of assigning grades, but it's probably not going to make John very happy.

The important point to remember is that SAS considers a missing value to be smaller than any other numerical value. That is, a missing value (.) is considered smaller than 12, smaller than 183, and even smaller than 0. Thus, we should stick to another good programming habit: always program for missing values. Say it to yourself over and over and over again ... always program for missing values ... until you remember it. It may save you a lot of trouble down the road.
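If you'd like to see this for yourself, the following sketch compares a missing value against the three numbers just mentioned. It uses DATA _NULL_ and PUT statements to write to the log, which are conveniences for this illustration rather than part of the lesson's examples:

```sas
DATA _null_;
	x = .;
	* each of these conditions is true because x is missing;
	if (x < 12)  then put 'missing is less than 12';
	if (x < 183) then put 'missing is less than 183';
	if (x < 0)   then put 'missing is less than 0';
RUN;
```

All three messages should appear in the log.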

Example 5.5

Now, let's look at our SAS program again, but this time having written the program so that SAS is told to assign status a missing value (a blank space ' ' since it is a character variable) if e4 is missing (a period . since it is a numeric variable):

DATA grades;
    length status $ 6;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* if the fourth exam is missing indicate missing;
	* else if the fourth exam is less than 65 indicate failed;
	* otherwise indicate passed;
	     if (e4 = .)  then status = ' ';
	else if (e4 < 65) then status = 'Failed';
	else                   status = 'Passed';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e4 status;
RUN;

Launch and run  the SAS program. Review the output from the PRINT procedure to convince yourself that this time the values of the character variable status really have been assigned correctly. Note that this program also illustrates the use of more than one ELSE statement. You can use as many ELSE statements as necessary, as long as they are attached to a preceding IF-THEN statement.
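As an aside, the check for a missing value can also be written with the MISSING function, which returns 1 when its argument is missing and 0 otherwise. The lesson's programs do not use it, so treat this as one possible alternative rather than the recommended form. The data set name grades_m is made up for this sketch:

```sas
DATA grades_m;
	length status $ 6;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* missing(e4) plays the same role as (e4 = .) in Example 5.5;
	     if missing(e4) then status = ' ';
	else if (e4 < 65)   then status = 'Failed';
	else                     status = 'Passed';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades_m;
	var name e4 status;
RUN;
```

One convenience of MISSING is that the same call works for character variables as well as numeric ones.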


5.5 - Logical Operators

In addition to the comparison operators that we learned previously, we can also use the following logical operators:

Operation                            SAS syntax    Alternative SAS syntax
are both conditions true?            &             AND
is either condition true?            |             OR
reverse the logic of a comparison    ^ or ~        NOT

You will want to use the AND operator to execute the THEN statement if both expressions that are linked by AND are true, such as here:

IF (p1 GT 90) AND (f1 GT 90) THEN performance = 'excellent';

You will want to use the OR operator to execute the THEN statement if either expression that is linked by the OR is true, such as here:

IF (p1 GT 90) OR (f1 GT 90) THEN performance = 'very good';

And, you will want to use the NOT operator in conjunction with other operators to reverse the logic of a comparison:

IF p1 NOT IN (98, 99, 100) THEN performance = 'not excellent';
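The three operators can be tried out in a short sketch like the one below. The values of p1 and f1 are taken from Alexander Smith's record, while DATA _NULL_ and the PUT messages are assumptions made just for this illustration:

```sas
DATA _null_;
	p1 = 97;
	f1 = 80;
	* false: only one of the two grades is above 90;
	if (p1 > 90) & (f1 > 90) then put 'both grades are above 90';
	* true: the project grade is above 90;
	if (p1 > 90) | (f1 > 90) then put 'at least one grade is above 90';
	* true: 97 is not in the list;
	if not (p1 in (98, 99, 100)) then put 'p1 is not 98, 99, or 100';
RUN;
```

Only the second and third messages should be written to the log.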

Now when we look at examples using these logical operators, why stop at just two ELSE statements? Let's go crazy and program a bunch of them! One thing though — when we do, we have to be extra careful to make sure that our conditions are mutually exclusive. That is, we have to make sure that, for each observation in the data set, one and only one of the conditions holds. This most often means that we have to make sure that the endpoints of our intervals don't overlap in some way.

Example 5.6

The following SAS program illustrates the use of several mutually exclusive conditions within an if-then-else statement. The program uses the AND operator to define the conditions. Again, when comparisons are connected by AND, all of the comparisons must be true in order for the condition to be true.

DATA grades;
    length overall $ 10;
   	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* Calculate the average and round it to one decimal place;
	* The average for John Simon will be missing because his e4 is missing;
	avg = round((e1+e2+e3+e4)/4,0.1);

	* Program for missing values;
	     if (avg = .)                   then overall = 'Incomplete';
	* Make sure each student falls into one of the categories;
	else if (avg >= 90)                 then overall = 'A';
	else if (avg >= 80) and (avg < 90)  then overall = 'B';
	else if (avg >= 70) and (avg < 80)  then overall = 'C';
	else if (avg >= 65) and (avg < 70)  then overall = 'D';
	else if (avg < 65)                  then overall = 'F';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name avg overall;
RUN;

First, inspect the program to make sure you understand the code. Then, launch and run  the SAS program. Review the output from the PRINT procedure to convince yourself that the letter grades have been assigned correctly. Also note how the program in general, and the if-then-else statement in particular, is formatted in order to make the program easy to read. The conditions and assignment statements are aligned nicely in columns and parentheses are used to help offset the conditions. Whenever possible ... okay, make that always ... format (and comment) your programs. After all, you may actually need to use them again in a few years. Trust me ... you'll appreciate it then!

Oh, one more point. You may have noticed, after the condition that takes care of missing values, that the conditions appear in order from A, B, ... down to F. Is the instructor treating the glass as being half-full as opposed to half-empty? Hmmm ... actually, the order has to do with the efficiency of the statements. When SAS encounters the condition that is true for a particular observation, it jumps out of the if-then-else statement to the next statement in the DATA step. SAS thereby avoids having to needlessly evaluate all of the remaining conditions. Hence, we have another good programming habit ... arrange the order of your conditions (roughly speaking, of course!) in an if-then-else statement so that the most common one appears first, the next most common one appears second, and so on. You'll also need to make sure that the condition concerning missing values appears first. Otherwise, because SAS treats a missing value as smaller than any number, a condition such as (avg < 65) would be true for a missing average, and SAS would never reach the missing-value check.
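To see why the missing-value condition belongs first, consider this sketch. The data set name order_demo, the variables grade_bad and grade_ok, and the simplified pass/fail cutoff are all made up for the illustration:

```sas
DATA order_demo;
	length grade_bad grade_ok $ 10;
	input avg;
	* missing check placed last: a missing avg has already satisfied (avg < 65);
	     if (avg >= 65) then grade_bad = 'Pass';
	else if (avg < 65)  then grade_bad = 'Fail';
	else if (avg = .)   then grade_bad = 'Incomplete';
	* missing check placed first: a missing avg is caught before any numeric comparison;
	     if (avg = .)   then grade_ok = 'Incomplete';
	else if (avg >= 65) then grade_ok = 'Pass';
	else                     grade_ok = 'Fail';
	DATALINES;
81.3
.
59.3
;
RUN;

PROC PRINT data = order_demo;
RUN;
```

For the observation with the missing average, grade_bad should come out as Fail while grade_ok comes out as Incomplete.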

Example 5.7

In the previous program, the conditions were written using the AND operator. Alternatively, we could have just used straightforward numerical intervals. The following SAS program illustrates the use of alternative intervals as well as the alternative syntax for the comparison operators:

DATA grades;
    length overall $ 10;
   	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	avg = round((e1+e2+e3+e4)/4,0.1);
		 if (avg EQ .)         then overall = 'Incomplete';
	else if (90 LE avg LE 100) then overall = 'A';
	else if (80 LE avg LT  90) then overall = 'B';
	else if (70 LE avg LT  80) then overall = 'C';
	else if (65 LE avg LT  70) then overall = 'D';
	else if (0  LE avg LT  65) then overall = 'F';
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name avg overall;
RUN;

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the letter grades have again been assigned correctly.

Example 5.8

Now, suppose an instructor wants to give bonus points to students who show some sign of improvement from the beginning of the course to the end of the course. Suppose she wants to add two points to a student's overall average if either the student's first exam grade is less than both the third and fourth exam grades or the student's second exam grade is less than both the third and fourth exam grades. (Don't ask why! I'm just trying to motivate something here.) The operative words here are "either" and "or". In order to accommodate the instructor's wishes, we need to take advantage of the OR logical operator. When comparisons are connected by OR, only one of the comparisons needs to be true in order for the condition to be true. The following SAS program illustrates the use of the OR operator, the AND operator, and the use of the OR and AND operators together:

DATA grades;
   	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	avg = round((e1+e2+e3+e4)/4,0.1);
	if    ((e1 < e3) and (e1 < e4))
	   or ((e2 < e3) and (e2 < e4)) then adjavg = avg + 2;
	else                                 adjavg = avg;
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 avg adjavg;
RUN;

First, inspect the program to make sure you understand the code. In particular, note that logical comparisons that are enclosed in parentheses are evaluated as true or false before they are compared to other expressions. In this example:

  • SAS first determines if e1 is less than e3 AND if e1 is less than e4
  • SAS then determines if e2 is less than e3 AND if e2 is less than e4
  • SAS then determines if the first bullet is true OR if the second bullet is true

Launch and run  the SAS program. Review the output from the PRINT procedure to convince yourself that, where appropriate, two points were added to the student's average (avg) to get an adjusted average (adjavg). Also, note that we didn't have to worry about programming for missing values here, because the student's adjusted average (adjavg) would automatically be assigned missing if his or her average (avg) was missing. SAS calls this "propagation of missing values."
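Propagation of missing values can be sketched with John Simon's exam grades. The MEAN function shown at the end is included only as a contrast, and the variable name avg_nonmissing is made up for this illustration; the lesson's programs compute the average with plain arithmetic:

```sas
DATA _null_;
	e1 = 88;
	e2 = 72;
	e3 = 86;
	e4 = .;
	* arithmetic with a missing operand yields a missing result;
	avg = (e1 + e2 + e3 + e4) / 4;
	* the missing value carries through to anything computed from avg;
	adjavg = avg + 2;
	put avg= adjavg=;
	* by contrast, the MEAN function averages only the nonmissing arguments;
	avg_nonmissing = mean(e1, e2, e3, e4);
	put avg_nonmissing=;
RUN;
```

In the log, both avg and adjavg should show as missing, while avg_nonmissing is the average of the three grades that are present. Whether a missing exam should be skipped like that is a grading decision, not a programming one.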


5.6 - Comparing Character Values

All of the if-then-else statement examples we've encountered so far involved only numeric variables. Our comparisons could just as easily involve character variables. The key point to remember when comparing character values is that SAS distinguishes between uppercase and lowercase letters. That is, character values must be specified in the same case in which they appear in the data set. We say that SAS is "case-sensitive." Character values must also be enclosed in quotation marks.

Example 5.9

Suppose our now infamous instructor wants to identify those students who either did not complete the course or failed. Because SAS is case-sensitive, any if-then-else statements written to identify the students have to check for those students whose status is 'failed' or 'Failed' or 'FAILED' or ... you get the idea. One rather tedious solution would be to check for all possible "typings" of the words "failed" and "incomp" (for incomplete). Alternatively, we could use the UPCASE function to first produce an uppercase value, and then make our comparisons only between uppercase values. The following SAS program takes such an approach:

DATA grades;
	length action $ 7
	       action2 $ 7;
	input name $ 1-15 e1 e2 e3 e4 p1 f1 status $;
	* compare status exactly as typed, which is case-sensitive;
	     if (status = 'passed') then action = 'none';
	else if (status = 'failed') then action = 'contact';
	else if (status = 'incomp') then action = 'contact';
	* convert status to uppercase first and compare only uppercase values;
	     if (upcase(status) = 'PASSED') then action2 = 'none';
	else if (upcase(status) = 'FAILED') then action2 = 'contact';
	else if (upcase(status) = 'INCOMP') then action2 = 'contact';
	DATALINES;
Alexander Smith  78 82 86 69  97 80 passed
John Simon       88 72 86  . 100 85 incomp
Patricia Jones   98 92 92 99  99 93 PAssed
Jack Benedict    54 63 71 49  82 69 FAILED
Rene Porter     100 62 88 74  98 92 PASSED
;
RUN;

PROC PRINT data = grades;
	var name status action action2;
RUN;

Launch and run the SAS program. Review the output from the PRINT procedure to convince yourself that the if-then-else statement that involves the creation of the variable action is inadequate while the one that uses the UPCASE function to create the variable action2 works like a charm.

By the way, when making comparisons that involve character values, you should know that SAS considers a missing character value (a blank space ' ') to be smaller than any letter, and so the good habit of programming for missing values holds when dealing with character variables as well.
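A short sketch of what that means in practice follows. It again uses DATA _NULL_ and PUT statements for illustration only, and the messages are made up:

```sas
DATA _null_;
	length status $ 6;
	status = ' ';
	* a blank is smaller than any letter, so this condition is true;
	if (status < 'a') then put 'a missing status is less than the letter a';
	* check for the blank first, before any other comparison;
	     if (status = ' ')              then put 'status is missing';
	else if (upcase(status) = 'PASSED') then put 'no action needed';
	else                                     put 'contact the student';
RUN;
```

The log should show the first message followed by 'status is missing'. Without the explicit check for a blank, a missing status would fall through to the final ELSE.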


5.7 - Performing Multiple Actions

All of the examples we've looked at so far have involved performing only one action for a given condition. There may be situations in which you want to perform more than one action.

Example 5.10

Suppose our instructor wants to assign a grade of zero to any student who missed the fourth exam, as well as notify the student that she has done so. The following SAS program illustrates the use of the DO-END clause to accommodate the instructor's wishes:

DATA grades;
   	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	if e4 = . then do;
	    e4 = 0;
		notify = 'YES';
	end;
	DATALINES;
Alexander Smith  78 82 86 69  97 80
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
Jack Benedict    54 63 71 49  82 69
Rene Porter     100 62 88 74  98 92
;
RUN;

PROC PRINT data = grades;
	var name e1 e2 e3 e4 p1 f1 notify;
RUN;

The DO statement tells SAS to treat all of the statements it encounters as one all-inclusive action until a matching END appears. If no matching END appears, SAS reports an error about an unclosed DO block. Launch and run the SAS program, and review the output of the PRINT procedure to convince yourself that the program accomplishes what we claim.
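A DO-END group can follow ELSE just as it can follow THEN. The sketch below extends Example 5.10 under a few assumptions: the data set name grades_do and the indicator variable took_e4 are made up, and only two students are included:

```sas
DATA grades_do;
	length notify $ 3;
	input name $ 1-15 e1 e2 e3 e4 p1 f1;
	* took_e4 is a made-up indicator variable used only in this sketch;
	if e4 = . then do;
		e4 = 0;
		notify = 'YES';
		took_e4 = 0;
	end;
	else do;
		notify = 'NO';
		took_e4 = 1;
	end;
	DATALINES;
John Simon       88 72 86  . 100 85
Patricia Jones   98 92 92 99  99 93
;
RUN;

PROC PRINT data = grades_do;
	var name e4 notify took_e4;
RUN;
```

The LENGTH statement is there so that notify is long enough for 'YES' regardless of which assignment SAS encounters first.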


5.8 - Summary

In this lesson, we learned how to write if-then-else statements in order to change the contents of our SAS data set. The homework for this lesson will give you more practice with this technique so that you become even more familiar with how it works and can use it in your own SAS programming.

