# 15.2 - One-to-One Merging

15.2 - One-to-One MergingAt first glance, **one-to-one merging** appears to be the same as one-to-one reading, since it too combines two or more SAS data sets, one "to the right" of the other into a single "fat" data set. That is, just like one-to-one reading, one-to-one merging combines observations from two or more data sets into a single observation in a new data set. There is just one primary difference though — SAS continues to merge observations until it has read *all* of the observations from *all* of the data sets. For example, suppose again that our *patients* data set contains three variables: patient ID number (`ID`), gender (`Sex`), and age of the patient (`Age`):

ID | Sex | Age |
---|---|---|

1157 | F | 33 |

2395 | F | 48 |

1098 | M | 39 |

4829 | F | 24 |

3456 | M | 30 |

5920 | M | 41 |

1493 | F | 42 |

and our *scale* data set contains three variables: `ID` (number),` Height` (in inches) and `Weight` (in pounds):

ID | Height | Weight |
---|---|---|

1157 | 65 | 122 |

2395 | 64 | 130 |

1098 | 70 | 178 |

4829 | 67 | 142 |

3456 | 72 | 190 |

5920 | 71 | 188 |

Then, when we one-to-one merge the two data sets, we get a data set, called say *one2onemerge*, that looks like this:

ID | Sex | Age | Height | Weight |
---|---|---|---|---|

157 | F | 33 | 65 | 122 |

2395 | F | 48 | 64 | 130 |

1098 | M | 39 | 70 | 178 |

4829 | F | 24 | 67 | 142 |

3456 | M | 30 | 72 | 190 |

5920 | M | 41 | 71 | 188 |

1493 | F | 42 | . | . |

in which the second data set gets placed to the "right" of the first data set. Again, the observations are **combined based on their relative position** in the data set. The first observation of *patient* is combined with the first observation of *scale* to create the first observation in *one2onemerge*; the second observation of *patient* is combined with the second observation of *scale* to create the second observation in *one2onemerge*; and so on. When SAS performs a one-to-one merge, the DATA step continues to read observations until the last observation is read from the largest data set. That's why the *one2onemerge* data set has one more observation than the *one2oneread* data set. In general, the number of observations in a data set created by a one-to-one merge always equals the numbers of observations in the largest data set named for one-to-one merging.

## Example 15.4

The following program uses one-to-one merging to combine the *patients* data set with the *scale* data set:

```
DATA one2onemerge;
merge patients scale;
RUN;
PROC PRINT NOOBS;
title 'The one2onemerge data set';
RUN;
```

ID | VarA | VarB | VarC |
---|---|---|---|

40 | A1 | B4 | C1 |

50 | A2 | B5 | C2 |

The MERGE statement tells SAS to create a new data set by combining *patients* and *scale*. Launch and run * * the SAS program and review the output from the PRINT procedure to see the results of the one-to-one merge. You should see that the first observation in

*one2onemerge*contains the first observation of

*patients*and

*scale*, the second observation in

*one2onemerge*contains the second observation of

*patients*and

*scale*, and so on. Since there are seven observations in

*patients*and six observations in

*scale*, the new

*one2onemerge*data set contains seven observations, with missing values for the

`Height`and

`Weight`variables in the seventh observation. Note that although this example only combined two data sets, the MERGE statement can contain any number of input data sets.

Just as is true for one-to-one reading, if data sets that are being one-to-one merged contain variables that have the same names, the values that are read in from the last data set overwrite the values that were read in from earlier data sets. Let's go back to our contrived example to illustrate this point.

## Example 15.5

The following program uses one-to-one merging to combine the *one* data set with the *two* data set to create a new data set called *onetwomerged*:

```
DATA one;
input ID VarA $ VarB $;
DATALINES;
10 A1 B1
20 A2 B2
30 A3 B3
;
RUN;
DATA two;
input ID VarB $ VarC $;
DATALINES;
40 B4 C1
50 B5 C2
;
RUN;
DATA onetwomerged;
merge one two;
RUN;
PROC PRINT data = onetwomerged NOOBS;
title 'The onetwomerged data set';
RUN;
```

ID | VarA | VarB | VarC |
---|---|---|---|

40 | A1 | B4 | C1 |

50 | A2 | B5 | C2 |

30 | A3 | B3 |

Note again that the *one* and *two* data sets share two variables, namely `ID` and `VarB`. The third (and last) DATA step tells SAS to combine the two data sets using the one-to-one merging method. Let's walk our way through how SAS processes the DATA step. At the end of the compile phase, SAS will have created a program data vector containing the variables from the *one* and *two* data sets in the order in which they appear in the DATA step:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

1 | 0 | . | . |

During the first iteration of the DATA step, the MERGE statement reads first one observation from data set *one*:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

1 | 0 | 10 | A1 | B1 | . |

and then one observation from data set *two*. The values for `ID` and `VarB` in data set *two* overwrites the values for `ID` and `VarB` in data set *one*:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

1 | 0 | 40 | A1 | B4 | C1 |

Being at the end of the first iteration of the DATA step, SAS writes the contents of the program data vector as the first observation in the *onetwomerged* data set. Upon returning to the top of the DATA step, the program data vector looks like this at the beginning of the second iteration of the DATA step:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

2 | 0 | 40 | A1 | B4 | C1 |

Just as is true for a SET statement, SAS retains the values of variables that were read from a SAS data set with the MERGE statement. Now, the MERGE statement reads the second observation from the *one* data set:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

2 | 0 | 20 | A2 | B2 | C1 |

and the second observation from the* two* data set. Again, the values for `ID` and `VarB` in data set *two* overwrites the values for `ID` and `VarB` in data set *one*:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

2 | 0 | 50 | A2 | B5 | C2 |

Being at the end of the second iteration of the DATA step, SAS writes the contents of the program data vector as the second observation in the *onetwomerged* data set. Now this is where things get different! Because an observation still exists in the *one* data set, SAS returns to the top of the DATA step. The program data vector looks like this at the beginning of the third iteration of the DATA step:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

3 | 0 | 50 | A2 | B5 | C2 |

And, the MERGE statement reads the third observation from the *one* data set:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

3 | 0 | 30 | A3 | B3 | C2 |

SAS attempts to read a third observation from the *two* data set but instead encounters an end-of-data set marker. Therefore, *as is always the case in this kind of situation*, SAS sets the values of all of that data set's variables in the program data vector to missing:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

3 | 0 | 30 | A3 | B |

Being at the end of the third iteration of the DATA step, SAS writes the contents of the program data vector as the third observation in the *onetwomerged* SAS data set. Because there are no more observations in either the *one* or the *two* data set, processing stops. Thank goodness!

Now, launch and run * * the SAS program, and review the output to convince yourself that the

*one*and

*two*data sets are combined as described.

**One more closing comment.** One-to-one reading and one-to-one merging require users to exercise extreme caution when combining two (or more) data sets based on relative position only. It would just take one of the data sets to be "shifted" ever so slightly to get really messed up results. It's for this reason that I personally don't find the one-to-one read or the one-to-one merge all that practical. The more useful and therefore much more common merge performed in SAS is what is called **match-merging**. We'll learn about it in the next lesson.