When processing any DATA step, SAS follows two default procedures:
- When SAS reads the DATA statement at the beginning of each iteration of the DATA step, SAS places missing values in the program data vector for variables that were assigned by either an INPUT statement or an assignment statement within the DATA step. (SAS does not reset variables to missing if they were created by a SUM statement, or if the values came from a SAS data set via a SET or MERGE statement.)
- At the end of each iteration of the DATA step, SAS writes the values of the variables in the program data vector to the SAS data set being created.
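As a minimal sketch of these two defaults, consider the DATA step below. The data set name demo, the variables score, flag, and total, and the three data values are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical example: demo, score, flag and total are made-up names */
data demo;
    input score;                        /* score is read by INPUT, so it is reset to missing each iteration */
    if score > 80 then flag = 'high';   /* flag is assigned here, so it is also reset to missing each iteration */
    total + score;                      /* sum statement: total starts at 0 and is NOT reset to missing */
    /* no OUTPUT statement: SAS writes one observation at the end of each iteration */
    datalines;
90
70
85
;
run;

proc print data=demo;
run;
```

With these values, flag is 'high' in the first and third observations and blank in the second, because it is reset to missing before the second iteration and never reassigned. The running total is 90, 160, and 245, because a variable created by a sum statement keeps its value.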
In this lesson, we'll learn how to modify these default processes by using the OUTPUT and RETAIN statements:
- The OUTPUT statement allows you to control when and to which data set you want an observation written.
- The RETAIN statement causes a variable created in the DATA step to keep its value from one iteration of the DATA step to the next, rather than being set to missing at the beginning of each iteration.
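The two statements can be sketched together as follows. This is an illustration under assumed names, not an example from the lesson itself: the data sets above and below, the variables score, prev_score, and change, and the cutoff of 80 are all hypothetical.

```sas
/* Hypothetical example: above, below, prev_score, change and the cutoff 80 are made up */
data above below;                     /* two output data sets named in the DATA statement */
    input score;
    retain prev_score;                /* prev_score keeps its value across iterations */
    change = score - prev_score;      /* missing on the first iteration, since prev_score is still missing */
    if score >= 80 then output above; /* explicit OUTPUT: choose the data set by condition */
    else output below;
    prev_score = score;               /* saved after OUTPUT, so it is available to the next iteration */
    datalines;
90
70
85
;
run;
```

Here the observations with scores 90 and 85 go to above, and the observation with score 70 goes to below. Because prev_score is assigned after the OUTPUT statements, the value written with each observation is the score from the previous iteration, which is what makes the comparison across observations possible.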
Upon completing this lesson, you should be able to do the following:
- use a RETAIN statement to tell SAS to retain the value of a variable from one iteration of the DATA step to the next
- know which kinds of variables SAS automatically retains
- use a RETAIN statement to compare values across observations
- understand how the RETAIN statement works and therefore be able to program successfully with it
- use the "FIRST." and "LAST." variables in conjunction with an OUTPUT statement in order to collapse multiple observations in a data set into a single observation
- use a SUM statement to accumulate totals across a set of observations
- use a "LAST." variable in conjunction with BY-group processing, a RETAIN statement, and an OUTPUT statement in order to transpose a data set
- use an OUTPUT statement to tell SAS to write the current observation at the point where the OUTPUT statement is processed
- use an OUTPUT statement to write observations to multiple data sets
- use an OUTPUT statement to control output of observations to data sets based on certain conditions
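- understand that once a DATA step contains any OUTPUT statement, SAS no longer writes an observation automatically at the end of each iteration, so you must use OUTPUT statements to program all of the output for that step
- understand that an assignment statement must precede the OUTPUT statement if the assigned value is to appear in the observation being written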
- use the TODAY() function to determine today's date
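Several of these objectives can be previewed in one short sketch: collapsing multiple observations per BY group into a single observation with the "FIRST." and "LAST." variables, a sum statement, an OUTPUT statement, and the TODAY() function. The data sets visits and totals and the variables id, amount, total, and as_of are hypothetical names used only for illustration.

```sas
/* Hypothetical input data: several observations per id, already sorted by id */
data visits;
    input id amount;
    datalines;
1 10
1 20
2 5
2 15
2 25
;
run;

data totals;
    set visits;
    by id;                           /* BY-group processing creates first.id and last.id */
    if first.id then total = 0;      /* restart the accumulator at the start of each group */
    total + amount;                  /* sum statement: total is retained automatically */
    if last.id then do;
        as_of = today();             /* assigned before OUTPUT so the value is written */
        output;                      /* write one observation per id */
    end;
    format as_of date9.;
    keep id total as_of;
run;
```

With these values, totals contains one observation per id, with a total of 30 for id 1 and 45 for id 2. If the input data were not already sorted by id, a PROC SORT step would be needed before the BY statement could be used.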