In the next three sections, we'll pull in a number of different tools that we've learned throughout the course — as well as add a few new ones — in order to read raw data files that contain data values for multiple observations in just one record. We'll just introduce the situations here and then investigate each more fully over the next three sections.
Reading Repeating Blocks of Data Section
First, we'll learn how to read raw data files in which each record contains a repeating block of values in which each block in the record represents a separate observation. For example, we'll learn how to read this data file:
Month | Av High | Av Low | Month | Av High | Av Low | Month | Av High | Av Low |
---|---|---|---|---|---|---|---|---|
Jan | 32 | 16 | Feb | 35 | 18 | Mar | 46 | 26 |
Apr | 58 | 37 | May | 68 | 47 | Jun | 78 | 56 |
Jul | 82 | 60 | Aug | 80 | 58 | Sep | 72 | 51 |
Oct | 61 | 40 | Nov | 48 | 32 | Dec | 37 | 22 |
in which each record contains three blocks of data values — the month and the average high and low temperature for that month in State College, PA. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:
Month | Avg High | Avg Low |
---|---|---|
Jan | 32 | 16 |
Feb | 35 | 18 |
Mar | 46 | 26 |
Apr | 58 | 37 |
May | 68 | 47 |
Jun | 78 | 56 |
Jul | 82 | 60 |
Aug | 80 | 58 |
Sep | 72 | 51 |
Oct | 61 | 40 |
Nov | 48 | 32 |
Dec | 37 | 22 |
Reading the Same Number of Repeating Fields Section
Then, we'll learn how to read raw data files in which each record contains an ID field followed by an equal number of repeating fields that contribute values to separate observations. For example, we'll learn how to read this data file:
id | exam 1 | exam 2 | exam 3 |
---|---|---|---|
111000234 | 79 | 82 | 100 |
922232573 | 87 | 89 | 95 |
252359873 | 65 | 72 | 73 |
205804679 | 92 | 95 | 99 |
in which each record contains a nine-digit student ID number followed by three exam scores. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:
id | exam | score |
---|---|---|
111000234 | 1 | 79 |
111000234 | 2 | 82 |
111000234 | 3 | 100 |
922232573 | 1 | 87 |
922232573 | 2 | 89 |
922232573 | 3 | 95 |
252359873 | 1 | 65 |
252359873 | 2 | 72 |
252359873 | 3 | 73 |
205804679 | 1 | 92 |
205804679 | 2 | 95 |
205804679 | 3 | 99 |
Reading a Varying Number of Repeating Fields Section
Finally, we'll learn how to read raw data files in which each record contains an ID field followed by a varying number of repeating fields that contribute values to separate observations. For example, we'll learn how to read this raw data file:
id | weight 1 | weight 2 | weight 3 | weight 4 | weight 5 |
---|---|---|---|---|---|
1001 | 179 | 172 | 169 | ||
1002 | 250 | 249 | |||
1003 | 190 | 196 | 195 | 164 | 158 |
1004 | 232 | 224 | 219 | 212 | 208 |
1005 | 211 | 208 | 204 | 202 |
in which each record contains a four-digit subject ID number followed by the monthly weights (in pounds) of the subjects. Because some of the subjects dropped out of the diet program in which they were participating, the data file does not contain an equal number of weights in each record. When all is said and done, we will have transformed the input raw data file into a SAS data set that looks like this:
id | weigh-in | weight |
---|---|---|
1001 | 1 | 179 |
1001 | 2 | 172 |
1001 | 3 | 169 |
1002 | 1 | 250 |
1002 | 2 | 249 |
1003 | 1 | 190 |
1003 | 2 | 196 |
1003 | 3 | 195 |
1004 | 1 | 232 |
1004 | 2 | 224 |
1004 | 3 | 219 |
1004 | 4 | 212 |
1004 | 5 | 208 |
1005 | 1 | 211 |
1005 | 2 | 208 |
1005 | 3 | 204 |
1005 | 4 | 202 |