21.2 - Creating Multiple Observations From a Single Record

In the next three sections, we'll pull in a number of different tools that we've learned throughout the course — as well as add a few new ones — in order to read raw data files that contain data values for multiple observations in just one record. We'll just introduce the situations here, and then investigate each more fully over the next three sections.

Reading Repeating Blocks of Data Section

First, we'll learn how to read raw data files in which each record contains a repeating block of values in which each block in the record represents a separate observation. For example, we'll learn how to read this data file:

Jan 32 16 Feb 35 18 Mar 46 26
Apr 58 37 May 68 47 Jun 78 56
Jul 82 60 Aug 80 58 Sep 72 51
Oct 61 40 Nov 48 32 Dec 37 22

in which each record contains three blocks of data values — the month and the average high and low temperature for that month in State College, PA. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:

Month Avg High Avg Low
Jan 32 16
Feb 35 18
Mar 46 26
Apr 58 37
May 68 47
Jun 78 56
Jul 82 60
Aug 80 58
Sep 72 51
Oct 61 40
Nov 48 32
Dec 37 22

Reading the Same Number of Repeating Fields Section

Then, we'll learn how to read raw data files in which each record contains an ID field followed by an equal number of repeating fields that contribute values to separate observations. For example, we'll learn how to read this data file:

111000234 79 82 100
922232573 87 89 95
252359873 65 72 73
205804679 92 95 99

in which each record contains a nine-digit student ID number followed by three exam scores. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:

Month Avg High Avg Low
Jan 32 16
Feb 35 18
Mar 46 26
Apr 58 37
May 68 47
Jun 78 56
Jul 82 60
Aug 80 58
Sep 72 51
Oct 61 40
Nov 48 32
Dec 37 22

Reading a Varying Number of Repeating Fields Section

Finally, we'll learn how to read raw data files in which each record contains an ID field followed by a varying number of repeating fields that contribute values to separate observations. For example, we'll learn how to read this raw data file:

1001 179 172 169    
1002 250 249      
1003 190 196 195 164 158
1004 232 224 219 212 208
1005 211 208 204 202  

in which each record contains a four-digit subject ID number followed by monthly weights (in pounds) of the subjects. Because some of the subjects dropped out of the diet program in which they were participating, the data file does not contain an equal number of weights in each record. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:

id weighin weight
1001 1 179
1001 2 172
1001 3 169
1002 1 250
1002 2 249
1003 1 190
1003 2 196
1003 3 195
1004 1 232
1004 2 224
1004 3 219
1004 4 212
1004 5 208
1005 1 211
1005 2 208
1005 3 204
1005 4 202