15.4 - Interleaving SAS data sets

Provided your data sets are properly sorted, you can interleave two or more SAS data sets. Interleaving combines individual sorted data sets into one big sorted data set based on the variables listed in a BY statement. For example, when data set one:

Year X
2000 1
2001 2
2002 3
2003 4

and data set two:

Year X
2001 5
2002 6
2003 7
2004 8

are interleaved by variable year, we get data set three:

Year X
2000 1
2001 2
2001 5
2002 3
2002 6
2003 4
2003 7
2004 8

To interleave, you specify the data sets you want interleaved in the SET statement, and indicate on which variable you want the final data set sorted in the BY statement. You may interleave as many data sets as you'd like. The resulting data set contains all of the variables and all of the observations from all of the input data sets.

Example 15.11 Section

The following program interleaves the one and two data sets by year:

DATA one;
    input year x;
    DATALINES;
    2000 1
    2001 2
    2002 3
    2003 4
    ;
RUN;
 
DATA two;
    input year x;
    DATALINES;
    2001 5
    2002 6
    2003 7
    2004 8
    ;
RUN;
 
DATA three;
    set one two;
        by year;
RUN;
 
PROC PRINT data = three NOOBS;
    title 'The interleaved three data set';
RUN;

The interleaved three data set

year

x

2000

1

2001

2

2001

5

2002

3

2002

6

2003

4

2003

7

2004

8

The first two DATA steps, of course, just tell SAS to read the data values into the one and two data sets. The third, and most relevant DATA step to this topic, contains both a SET statement and a BY statement. That tells SAS that we want to interleave the data sets appearing in the SET statement (one and two) by the variable appearing in the BY statement (year) and to store the result in a data set called three. Launch and run  the SAS program. Review the output from the PRINT procedure to convince yourself that the three data set contains the contents of the one and two data sets sorted by year.

Example 15.12 Section

As you may have noticed, interleaving is equivalent to the two-step process of concatenating two or more data sets and then sorting them. The following program illustrates this point by using the two-step process to create a data set four that is identical to the data set three:

DATA unsortedfour;
    set one two;
RUN;
 
PROC PRINT data = unsortedfour NOOBS;
    title 'The unsortedfour data set';
RUN;
 
PROC SORT data = unsortedfour out = four;
    by year;
RUN;
 
PROC PRINT data = four NOOBS;
    title '
The four data set';
RUN;

The four data set

year

x

2000

1

2001

2

2001

5

2002

3

2002

6

2003

4

2003

7

2004

8

Launch and run  the SAS program, and review the output from the last PRINT procedure to convince yourself that the four data set is identical to the interleaved three data set.