21.2 - Creating Multiple Observations From a Single Record

In the next three sections, we'll combine several tools that we've learned throughout the course, and add a few new ones, in order to read raw data files in which a single record contains data values for multiple observations. Here we just introduce the three situations; each one is investigated more fully in its own section.

Reading Repeating Blocks of Data

First, we'll learn how to read raw data files in which each record contains repeating blocks of values, with each block representing a separate observation. For example, we'll learn how to read this data file:

Month  Av High  Av Low  Month  Av High  Av Low  Month  Av High  Av Low
Jan    32       16      Feb    35       18      Mar    46       26
Apr    58       37      May    68       47      Jun    78       56
Jul    82       60      Aug    80       58      Sep    72       51
Oct    61       40      Nov    48       32      Dec    37       22

in which each record contains three blocks of data values, each block holding a month and the average high and low temperatures for that month in State College, PA. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:

Month  Avg High  Avg Low
Jan    32        16
Feb    35        18
Mar    46        26
Apr    58        37
May    68        47
Jun    78        56
Jul    82        60
Aug    80        58
Sep    72        51
Oct    61        40
Nov    48        32
Dec    37        22
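
As a preview of the idea, here is a minimal sketch of one way such a file might be read, using the double trailing at sign (@@) to hold a record across iterations of the DATA step. It assumes that the values in each record are separated by blanks and that the file has no header line; the exact layout of the real file is not shown here. The data set name temps and the variable names Month, AvgHigh, and AvgLow are chosen only for illustration.

```sas
/* Sketch only: assumes blank-separated values and no header line. */
/* The names temps, Month, AvgHigh and AvgLow are illustrative.    */
data temps;
   /* @@ holds the current record across DATA step iterations, so  */
   /* each pass through the step reads the next block of 3 values. */
   input Month $ AvgHigh AvgLow @@;
   datalines;
Jan 32 16 Feb 35 18 Mar 46 26
Apr 58 37 May 68 47 Jun 78 56
Jul 82 60 Aug 80 58 Sep 72 51
Oct 61 40 Nov 48 32 Dec 37 22
;
run;

proc print data=temps noobs;
run;
```

Under those assumptions, each block of three values becomes its own observation, so the four records should produce twelve observations, one per month.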

Reading the Same Number of Repeating Fields

Then, we'll learn how to read raw data files in which each record contains an ID field followed by an equal number of repeating fields that contribute values to separate observations. For example, we'll learn how to read this data file:

id         exam 1  exam 2  exam 3
111000234  79      82      100
922232573  87      89      95
252359873  65      72      73
205804679  92      95      99

in which each record contains a nine-digit student ID number followed by three exam scores. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:

id         exam  score
111000234  1     79
111000234  2     82
111000234  3     100
922232573  1     87
922232573  2     89
922232573  3     95
252359873  1     65
252359873  2     72
252359873  3     73
205804679  1     92
205804679  2     95
205804679  3     99
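
A minimal sketch of how this kind of file might be handled combines a single trailing at sign (@), an iterative DO loop, and an explicit OUTPUT statement. As before, it assumes blank-separated values with no header line, and the data set name scores is chosen only for illustration; the variable names follow the column headings above.

```sas
/* Sketch only: assumes blank-separated values and no header line. */
data scores;
   /* Read the ID, then hold the record with a single trailing @.  */
   input id @;
   do exam = 1 to 3;
      /* Read the next score from the same held record.            */
      input score @;
      /* Write one observation per exam score.                     */
      output;
   end;
   datalines;
111000234 79 82 100
922232573 87 89 95
252359873 65 72 73
205804679 92 95 99
;
run;

proc print data=scores noobs;
run;
```

Because the OUTPUT statement appears inside the loop, SAS writes three observations per record rather than one at the end of the DATA step. The single trailing @ releases the record when the next iteration of the DATA step begins, so each new iteration starts with the next student.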

Reading a Varying Number of Repeating Fields

Finally, we'll learn how to read raw data files in which each record contains an ID field followed by a varying number of repeating fields that contribute values to separate observations. For example, we'll learn how to read this raw data file:

id    weight 1  weight 2  weight 3  weight 4  weight 5
1001  179       172       169
1002  250       249
1003  190       196       195       164       158
1004  232       224       219       212       208
1005  211       208       204       202

in which each record contains a four-digit subject ID number followed by the monthly weights (in pounds) of the subjects. Because some of the subjects dropped out of the diet program in which they were participating, the data file does not contain an equal number of weights in each record. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:

id    weigh-in  weight
1001  1         179
1001  2         172
1001  3         169
1002  1         250
1002  2         249
1003  1         190
1003  2         196
1003  3         195
1004  1         232
1004  2         224
1004  3         219
1004  4         212
1004  5         208
1005  1         211
1005  2         208
1005  3         204
1005  4         202
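
One possible approach to this situation, sketched below, uses the MISSOVER option together with a DO WHILE loop that keeps reading weights until it runs out of values. The sketch assumes blank-separated values, no header line, and missing weights only at the end of a record. The data set name weights and the variable name weighin are chosen only for illustration.

```sas
/* Sketch only: assumes blank-separated values, no header line,    */
/* and missing weights only at the end of a record.                */
data weights;
   /* MISSOVER sets a variable to missing when the record runs out */
   /* of values, instead of moving on to the next record.          */
   infile datalines missover;
   input id weight @;
   weighin = 0;
   do while (weight ne .);
      weighin + 1;       /* count the weigh-ins for this subject   */
      output;            /* one observation per recorded weight    */
      input weight @;    /* try to read the next weight            */
   end;
   datalines;
1001 179 172 169
1002 250 249
1003 190 196 195 164 158
1004 232 224 219 212 208
1005 211 208 204 202
;
run;

proc print data=weights noobs;
   var id weighin weight;
run;
```

The loop stops as soon as an INPUT statement finds no more values in the record, so subjects who dropped out early simply contribute fewer observations.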