# 15.4 - Interleaving SAS data sets

15.4 - Interleaving SAS data setsProvided your data sets are properly sorted, you can **interleave** two or more SAS data sets. Interleaving combines individual sorted data sets into one big sorted data set based on the variables listed in a BY statement. For example, when data set *one*:

Year | X |
---|---|

2000 | 1 |

2001 | 2 |

2002 | 3 |

2003 | 4 |

and data set *two*:

Year | X |
---|---|

2001 | 5 |

2002 | 6 |

2003 | 7 |

2004 | 8 |

are interleaved by variable *year*, we get data set *three*:

Year | X |
---|---|

2000 | 1 |

2001 | 2 |

2001 | 5 |

2002 | 3 |

2002 | 6 |

2003 | 4 |

2003 | 7 |

2004 | 8 |

To interleave, you specify the data sets you want interleaved in the SET statement, and indicate on which variable you want the final data set sorted in the BY statement. You may interleave as many data sets as you'd like. The resulting data set contains all of the variables and all of the observations from all of the input data sets.

## Example 15.11

The following program interleaves the* one* and *two *data sets by *year*:

```
DATA one;
input year x;
DATALINES;
2000 1
2001 2
2002 3
2003 4
;
RUN;
DATA two;
input year x;
DATALINES;
2001 5
2002 6
2003 7
2004 8
;
RUN;
DATA three;
set one two;
by year;
RUN;
PROC PRINT data = three NOOBS;
title 'The interleaved three data set';
RUN;
```

The first two DATA steps, of course, just tell SAS to read the data values into the *one* and *two* data sets. The third, and most relevant DATA step to this topic, contains both a SET statement and a BY statement. That tells SAS that we want to interleave the data sets appearing in the SET statement (*one* and *two*) by the variable appearing in the BY statement (*year*) and to store the result in a data set called *three*. Launch and run * * the SAS program. Review the output from the PRINT procedure to convince yourself that the *three *data set contains the contents of the *one* and *two* data sets sorted by *year*.

## Example 15.12

As you may have noticed, interleaving is equivalent to the two-step process of concatenating two or more data sets and then sorting them. The following program illustrates this point by using the two-step process to create a data set *four *that is identical to the data set *three*:

```
DATA unsortedfour;
set one two;
RUN;
PROC PRINT data = unsortedfour NOOBS;
title 'The unsortedfour data set';
RUN;
PROC SORT data = unsortedfour out = four;
by year;
RUN;
PROC PRINT data = four NOOBS;
title 'The four data set';
RUN;
```

Launch and run * * the SAS program, and review the output from the last PRINT procedure to convince yourself that the *four* data set is identical to the interleaved *three *data set.