Lesson 17: Using the OUTPUT and RETAIN statements

Overview Section

When executing any DATA step, SAS follows two default behaviors:

  1. When SAS reads the DATA statement at the beginning of each iteration of the DATA step, SAS places missing values in the program data vector for variables that were assigned by either an INPUT statement or an assignment statement within the DATA step. (SAS does not reset variables to missing if they were created by a SUM statement, or if the values came from a SAS data set via a SET or MERGE statement.)
  2. At the end of each iteration of the DATA step, SAS writes the values of the variables in the program data vector to the SAS data set being created.
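Both default behaviors can be observed in a small sketch. The data set names SCORES and DEMO and the variables X, PREV, and TOTAL are hypothetical. The PUT statement runs before the SET statement, so it writes the program data vector to the log as it looks at the top of each iteration. You should see PREV (created by an assignment statement) reset to missing every time, while X (read with SET) and TOTAL (created by a SUM statement) keep their values from the previous iteration. Even though the step contains no OUTPUT statement, DEMO still receives one observation per iteration.

```sas
/* Hypothetical input data set with one numeric variable, x */
data scores;
  input x;
  datalines;
10
20
30
;
run;

data demo;
  /* Show the program data vector at the top of each iteration */
  put 'Top of iteration: ' _n_= x= prev= total=;
  set scores;      /* x comes from a SAS data set, so it is not reset */
  prev = x;        /* assignment statement: reset to missing each iteration */
  total + x;       /* SUM statement: starts at 0 and is retained */
  /* No OUTPUT statement: SAS writes the observation implicitly here */
run;
```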

In this lesson, we'll learn how to modify these default processes by using the OUTPUT and RETAIN statements:

  • The OUTPUT statement allows you to control when and to which data set you want an observation written.
  • The RETAIN statement causes a variable created in the DATA step to keep its value from one iteration of the DATA step to the next, rather than being set to missing at the beginning of each iteration.
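A minimal sketch of both statements working together, assuming a hypothetical data set SCORES with variables ID and X. RETAIN keeps the running maximum MAXSOFAR from one iteration to the next, and the explicit OUTPUT statements send each observation to either HIGH or LOW. Because the step contains OUTPUT statements, SAS no longer writes observations automatically at the end of each iteration. The assignment statement for MAXSOFAR comes before the OUTPUT statements so that its value is included in the observation being written.

```sas
/* Hypothetical input data */
data scores;
  input id x;
  datalines;
1 10
2 25
3 15
4 40
;
run;

data high low;
  set scores;
  retain maxsofar;                  /* keep value across iterations */
  maxsofar = max(maxsofar, x);      /* MAX ignores the initial missing value */
  /* Explicit OUTPUT statements choose the destination data set */
  if x >= 20 then output high;
  else output low;
run;
```

In this sketch, HIGH should contain IDs 2 and 4, and LOW should contain IDs 1 and 3, each carrying the running maximum as of that observation.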

Objectives

Upon completing this lesson, you should be able to do the following:

  • write a RETAIN statement to tell SAS to retain the value of a variable from one iteration of the DATA step to the next
  • state which kinds of variables SAS retains automatically
  • write a RETAIN statement to compare values across observations
  • program successfully with the RETAIN statement
  • use the "FIRST." and "LAST." variables in conjunction with an OUTPUT statement to collapse multiple observations in a data set into a single observation
  • write a SUM statement to accumulate totals across a set of observations
  • use a "LAST." variable in conjunction with BY-group processing, a RETAIN statement, and an OUTPUT statement to transpose a data set
  • write an OUTPUT statement to tell SAS to write the current observation to the output data set at the moment the OUTPUT statement executes
  • write an OUTPUT statement to write observations to multiple data sets
  • write an OUTPUT statement to control the output of observations to data sets based on certain conditions
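  • recall that if you use any OUTPUT statement in a DATA step, SAS no longer writes observations automatically at the end of each iteration, so you must use OUTPUT statements to program all of the output for that step
  • recall that assignment statements must precede an OUTPUT statement for their values to be included in the observation that the OUTPUT statement writes
  • use the TODAY() function to determine today's date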
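Several of these objectives come together in one common pattern: collapsing multiple observations per BY group into a single observation with "FIRST." and "LAST." variables, a SUM statement, and an OUTPUT statement. The sketch below assumes a hypothetical data set VISITS, already sorted by SUBJ, with a numeric variable AMOUNT; the names are illustrative only. The BY statement creates FIRST.SUBJ and LAST.SUBJ, the SUM statement accumulates TOTAL (which is retained automatically), and the OUTPUT statement writes one observation when the last record of each subject is reached.

```sas
/* Hypothetical data: several records per subject, sorted by subj */
data visits;
  input subj amount;
  datalines;
101 5
101 7
102 3
103 4
103 6
103 1
;
run;

data totals;
  set visits;
  by subj;                        /* creates first.subj and last.subj */
  if first.subj then total = 0;   /* reset the total for each new subject */
  total + amount;                 /* SUM statement: total is retained */
  if last.subj then output;       /* one observation per subject */
  drop amount;
run;
```

With this input, TOTALS should contain three observations: subject 101 with a total of 12, subject 102 with 3, and subject 103 with 11.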