# 15.1 - One-to-One Reading

15.1 - One-to-One Reading**One-to-one reading** combines two or more SAS data sets, one "to the right" of the other into a single "fat" data set. That is, one-to-one reading combines observations from two or more data sets into a single observation in a new data set. For example, suppose the data set `patients` contains three variables: patient ID number (`ID`), gender (`Sex`), and age of the patient (`Age`):

ID | Sex | Age |
---|---|---|

1157 | F | 33 |

2395 | F | 48 |

1098 | M | 39 |

4829 | F | 24 |

3456 | M | 30 |

5920 | M | 41 |

1493 | F | 42 |

and the data set *scale* contains three variables: `ID` (number),` Height` (in inches) and `Weight` (in pounds):

ID | Height | Weight |
---|---|---|

1157 | 65 | 122 |

2395 | 64 | 130 |

1098 | 70 | 178 |

4829 | 67 | 142 |

3456 | 72 | 190 |

5920 | 71 | 188 |

Then, when we one-to-one read the two data sets, we get what I like to call a "fat" data set, called say *one2oneread*:

ID | Sex | Age | Height | Weight |
---|---|---|---|---|

157 | F | 33 | 65 | 122 |

2395 | F | 48 | 64 | 130 |

1098 | M | 39 | 70 | 178 |

4829 | F | 24 | 67 | 142 |

3456 | M | 30 | 72 | 190 |

5920 | M | 41 | 71 | 188 |

in which the second data set gets placed to the "right" of the first data set. Note that the observations are **combined based on their relative position** in the data set. The first observation of *patient* is combined with the first observation of *scale* to create the first observation in *one2oneread*; the second observation of *patient* is combined with the second observation of *scale* to create the second observation in *one2oneread*; and so on. The DATA step stops after it reads the last observation from the smallest data set. Therefore, the number of observations in the new data set always equals the numbers of observations in the smallest data set you name for one-to-one reading.

## Example 15.1

The following program uses one-to-one reading to combine the *patients* data set with the *scale* data set:

```
DATA patients;
input ID Sex $ Age;
DATALINES;
1157 F 33
2395 F 48
1098 M 39
4829 F 24
3456 M 30
5920 M 41
1493 F 42
;
RUN;
DATA scale;
input ID Height Weight;
DATALINES;
1157 65 122
2395 64 130
1098 70 178
4829 67 142
3456 72 190
5920 71 188
;
RUN;
DATA one2oneread;
set patients;
set scale;
RUN;
PROC PRINT NOOBS;
title 'The one2oneread data set';
RUN;
```

Of course, the first two DATA steps just read in the respective *patients* and *scale* data sets. The meat of the one-to-one read takes place in the third (and last) DATA step, in which we see two SET statements. The first SET statement tells SAS first to read the contents of the *patients* data set into the program data vector, and then the second SET statement tells SAS to read the contents of the *scale* data set into the program data vector.

Launch and run * * the SAS program, and review the output to convince yourself that the data sets are combined as described. You should note, in particular, that SAS does indeed stop reading after reaching the last observation in the *scale* data set. Hence, the combined data set, *one2oneread*, contains six observations, the number of observations in the smallest of the two data sets. Note, too, that the position of the variables in the *one2oneread* data set directly corresponds to the order in which the SET statements appear in the DATA step. Because the *scale* data set appears to the right of the *patients* data set in the SET statement, the variables from the *scale* data set appear to the right of the variables from the *patients* data set in the combined *one2oneread* data set.

## Example 15.2

The following program uses one-to-one reading to combine the *patients* data set with the *scale* data set in the reverse order from that of the previous program:

```
DATA one2oneread2;
set scale;
set patients;
RUN;
PROC PRINT NOOBS;
title 'The one2oneread2 data set';
RUN;
```

Note, here, that the SET statement for the *scale* data set appears first in the DATA step, followed by the SET statement for the *patients* data set. Launch and run * * the SAS program, and review the output to convince yourself that the variables from the *patients* data set appear to the right of the variables from the *scale* data set in the combined *one2oneread2* data set.

Hmmm... but what about the fact that the `ID` variable appears in both the *patients* and *scale* data sets, but it appears only once in the combined *one2oneread2* data set? How does SAS handle the situation? Well... in general, if the data sets contain variables that have the same names, the values that are read in from the last data set overwrite the values that were read in from earlier data sets. Let's take a look at a contrived example to illustrate this point.

## Example 15.3

The following program uses one-to-one reading to combine the `one` data set with the `two` data set to create a new data set called `onetwo`:

```
DATA one;
input ID VarA $ VarB $;
DATALINES;
10 A1 B1
20 A2 B2
30 A3 B3
;
RUN;
DATA two;
input ID VarB $ VarC $;
DATALINES;
40 B4 C1
50 B5 C2
;
RUN;
DATA onetwo;
set one;
set two;
RUN;
PROC PRINT data = onetwo NOOBS;
title 'The onetwo data set';
RUN;
```

As you review the first two DATA steps, in which SAS reads in the respective `one` and `two `data sets, note that the two data sets share two variables, namely `ID` and `VarB. `The third (and last) DATA step tells SAS to combine the two data sets using the one-to-one reading method. Let's walk our way through how SAS processes the DATA step. At the end of the compile phase, SAS will have created a program data vector containing the variables from the one and two data sets in the order in which they appear in the DATA step:` `

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

1 | 0 | . |

During the first iteration of the DATA step, the first SET statement reads one observation from data set `one`:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

1 | 0 | 10 | A1 | B1 |

Then, the second SET statement reads one observation from data set `two`. The values for `ID` and `VarB` in data set `two` overwrites the values for `ID and VarB in data set one: `

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

1 | 0 | 40 | A1 | B4 | C1 |

Being at the end of the first iteration of the DATA step, SAS writes the contents of the program data vector as the first observation in the `onetwo` SAS data set. Upon returning to the top of the DATA step, the program data vector looks like this at the beginning of the second iteration of the DATA step:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

2 | 0 | 40 | A1 | B4 | C1 |

Recall that SAS retains the values of variables that were read from a SAS data set with the SET statement. Now, the first SET statement reads the second observation from the `one` data set:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

2 | 0 | 20 | A2 | B2 | C1 |

And, the second SET statement reads the second observation from data set `two`. Again, the values for `ID` and `VarB` in data set `two` overwrites the values for `ID` and `VarB` in data set `one`:

_N_ | _ERROR_ | Num | VarA | VarB | VarC |
---|---|---|---|---|---|

2 | 0 | 50 | A2 | B5 | C2 |

Being at the end of the second iteration of the DATA step, SAS writes the contents of the program data vector as the second observation in the `onetwo` SAS data set. Because there are no more observations in the `two` data set, processing stops. That is, the DATA step does not read the third observation from the `one` data set.

Now, launch and run * * the SAS program, and review the output to convince yourself that the `one` and *two* data sets are combined as described.

**One more comment**. Note that although each of our one-to-one reading examples involved combining just two data sets, you can specify any number of SET statements when one-to-one reading ... and therefore the sky's the limit.