15.1 - One-to-One Reading

One-to-one reading combines two or more SAS data sets, one "to the right" of the other into a single "fat" data set. That is, one-to-one reading combines observations from two or more data sets into a single observation in a new data set. For example, suppose the data set patients contains three variables: patient ID number (ID), gender (Sex), and age of the patient (Age):

ID	Sex	Age
1157	F	33
2395	F	48
1098	M	39
4829	F	24
3456	M	30
5920	M	41
1493	F	42

and the data set scale contains three variables: ID (number), Height (in inches) and Weight (in pounds):

ID	Height	Weight
1157	65	122
2395	64	130
1098	70	178
4829	67	142
3456	72	190
5920	71	188

Then, when we one-to-one read the two data sets, we get what I like to call a "fat" data set, called say one2oneread:

ID	Sex	Age	Height	Weight
157	F	33	65	122
2395	F	48	64	130
1098	M	39	70	178
4829	F	24	67	142
3456	M	30	72	190
5920	M	41	71	188

in which the second data set gets placed to the "right" of the first data set. Note that the observations are combined based on their relative position in the data set. The first observation of patient is combined with the first observation of scale to create the first observation in one2oneread; the second observation of patient is combined with the second observation of scale to create the second observation in one2oneread; and so on. The DATA step stops after it reads the last observation from the smallest data set. Therefore, the number of observations in the new data set always equals the numbers of observations in the smallest data set you name for one-to-one reading.

Example 15.1 Section

The following program uses one-to-one reading to combine the patients data set with the scale data set:

DATA patients;
DATA patients;
    input ID Sex $ Age;
	DATALINES;
 1157    F     33
 2395    F     48
 1098    M     39
 4829    F     24
 3456    M     30
 5920    M     41
 1493    F     42
 ;
 RUN;
 DATA scale;
     input ID Height Weight;
	 DATALINES;
 1157    65     122
 2395    64     130
 1098    70     178
 4829    67     142
 3456    72     190
 5920    71     188
 ;
 RUN;
 
 DATA one2oneread;
    set patients;
	set scale;
 RUN;
 
 PROC PRINT NOOBS;
    title 'The one2oneread data set';
 RUN;

The one2oneread data set
ID	Sex	Age	Height	Weight
1157	F	33	65	122
2395	F	48	64	130
1098	M	39	70	178
4829	F	24	67	142
3456	M	30	72	190
5920	M	41	71	188

Of course, the first two DATA steps just read in the respective patients and scale data sets. The meat of the one-to-one read takes place in the third (and last) DATA step, in which we see two SET statements. The first SET statement tells SAS first to read the contents of the patients data set into the program data vector, and then the second SET statement tells SAS to read the contents of the scale data set into the program data vector.

Launch and run the SAS program, and review the output to convince yourself that the data sets are combined as described. You should note, in particular, that SAS does indeed stop reading after reaching the last observation in the scale data set. Hence, the combined data set, one2oneread, contains six observations, the number of observations in the smallest of the two data sets. Note, too, that the position of the variables in the one2oneread data set directly corresponds to the order in which the SET statements appear in the DATA step. Because the scale data set appears to the right of the patients data set in the SET statement, the variables from the scale data set appear to the right of the variables from the patients data set in the combined one2oneread data set.

Example 15.2 Section

The following program uses one-to-one reading to combine the patients data set with the scale data set in the reverse order from that of the previous program:

DATA one2oneread2;
    set scale;
    set patients;
RUN;
 
PROC PRINT NOOBS;
   title 'The one2oneread2 data set';
RUN;

The one2oneread2 data set
ID	Height	Weight	Sex	Age
1157	65	122	F	33
2395	64	130	F	48
1098	70	178	M	39
4829	67	142	F	24
3456	72	190	M	30
5920	71	188	M	41

Note, here, that the SET statement for the scale data set appears first in the DATA step, followed by the SET statement for the patients data set. Launch and run the SAS program, and review the output to convince yourself that the variables from the patients data set appear to the right of the variables from the scale data set in the combined one2oneread2 data set.

Hmmm... but what about the fact that the ID variable appears in both the patients and scale data sets, but it appears only once in the combined one2oneread2 data set? How does SAS handle the situation? Well... in general, if the data sets contain variables that have the same names, the values that are read in from the last data set overwrite the values that were read in from earlier data sets. Let's take a look at a contrived example to illustrate this point.

Example 15.3 Section

The following program uses one-to-one reading to combine the one data set with the two data set to create a new data set called onetwo:

DATA one;
input ID VarA $ VarB $;
	DATALINES;
	10 A1 B1
	20 A2 B2
	30 A3 B3
  ;
RUN;
 
DATA two;
   input ID VarB $ VarC $;
   DATALINES;
   40 B4 C1
   50 B5 C2
   ;
RUN;
 
DATA onetwo;
   set one;
   set two;
RUN;
 
PROC PRINT data = onetwo NOOBS;
   title 'The onetwo data set';
RUN;

The onetwo data set
ID	VarA	VarB	VarC
40	A1	B4	C1
50	A2	B5	C2

As you review the first two DATA steps, in which SAS reads in the respective one and two data sets, note that the two data sets share two variables, namely ID and VarB. The third (and last) DATA step tells SAS to combine the two data sets using the one-to-one reading method. Let's walk our way through how SAS processes the DATA step. At the end of the compile phase, SAS will have created a program data vector containing the variables from the one and two data sets in the order in which they appear in the DATA step:

_N_	_ERROR_	Num	VarA	VarB	VarC
1	0	.

During the first iteration of the DATA step, the first SET statement reads one observation from data set one:

_N_	_ERROR_	Num	VarA	VarB	VarC
1	0	10	A1	B1

Then, the second SET statement reads one observation from data set two. The values for ID and VarB in data set two overwrite the values for ID and VarB in data set one:

_N_	_ERROR_	Num	VarA	VarB	VarC
1	0	40	A1	B4	C1

Being at the end of the first iteration of the DATA step, SAS writes the contents of the program data vector as the first observation in the onetwo SAS data set. Upon returning to the top of the DATA step, the program data vector looks like this at the beginning of the second iteration of the DATA step:

_N_	_ERROR_	Num	VarA	VarB	VarC
2	0	40	A1	B4	C1

Recall that SAS retains the values of variables that were read from a SAS data set with the SET statement. Now, the first SET statement reads the second observation from the one data set:

_N_	_ERROR_	Num	VarA	VarB	VarC
2	0	20	A2	B2	C1

And, the second SET statement reads the second observation from data set two. Again, the values for ID and VarB in data set two overwrite the values for ID and VarB in data set one:

_N_	_ERROR_	Num	VarA	VarB	VarC
2	0	50	A2	B5	C2

Being at the end of the second iteration of the DATA step, SAS writes the contents of the program data vector as the second observation in the onetwo SAS data set. Because there are no more observations in the two data set, processing stops. That is, the DATA step does not read the third observation from the one data set.

Now, launch and run the SAS program, and review the output to convince yourself that the one and two data sets are combined as described.

One more comment. Note that although each of our one-to-one reading examples involved combining just two data sets, you can specify any number of SET statements when one-to-one reading ... and therefore the sky's the limit.