Okay, now let's spend some time on a subject that is a little more fitting for the title of this lesson. In the previous lesson, we investigated how to use column input to read data values that appear in neatly defined columns looking like this, say:
Smith      8145551354 3.89
Washington 8145569847 2.73
Wing       8145359376 3.56
Jackson    8145557437 3.12
Here, we'll investigate how to use list input to read in free-format data looking like this, say:
Smith 8145551354 3.89
Washington 8145569847 2.73
Wing 8145359376 3.56
Jackson 8145557437 3.12
List input might be the easiest input style to use because, as shown in the examples that follow, you simply list the variable names in the same order as the corresponding raw data fields. The ease comes with a bit of a price, however. Because you do not have to tell SAS the columns in which the data values appear, you must take note of the following restrictions:
- Fields must be separated by at least one blank (or other delimiters).
- Fields must be read in order from left to right.
- You cannot skip or re-read fields.
- Missing values must be represented by a placeholder such as a period. (A blank field causes the matching of variable names and values to get out of sync.)
- Character values can't contain embedded blanks.
- The default length of character values is 8 bytes. A longer value is truncated when it is written to the data set. (Note: 1 byte = 1 character.)
- Data must be in standard character or numeric format.
Example 3.4
The following SAS program illustrates the simplest example of list input. Note that there is one blank space between each of the data values. Also note that although the data values need not be lined up in columns, we still recommend doing so because of the difficulty otherwise in "eyeing" the data quickly.
DATA temp;
input subj name $ gender height weight;
* The $ that follows name tells SAS that it is
a character variable;
* By default, name only allows up to 8 characters
to be read in;
CARDS;
1024 Alice 1 65 125
1167 Maryann 1 68 140
1168 Thomas 2 68 190
1201 Benny 2 72 190
1302 Felicia 1 63 115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Obs | subj | name | gender | height | weight |
---|---|---|---|---|---|
1 | 1024 | Alice | 1 | 65 | 125 |
2 | 1167 | Maryann | 1 | 68 | 140 |
3 | 1168 | Thomas | 2 | 68 | 190 |
4 | 1201 | Benny | 2 | 72 | 190 |
5 | 1302 | Felicia | 1 | 63 | 115 |
The INPUT statement is how you tell SAS to read in the data using list input. For list input, simply list the variable names — leaving at least one space between names — in the order in which the variables appear in the data file. Remember to use the dollar sign ($) to distinguish character variables from numeric variables.
Launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are read in properly.
Example 3.5
The following SAS program illustrates the necessary use of the missing value (.) placeholder when a data value is missing:
DATA temp;
input subj name $ gender height weight;
CARDS;
1024 Alice 1 65 125
1167 Maryann 1 68 140
1168 Thomas 2 68 190
1201 Benny 2 . 190
1302 Felicia 1 63 115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Obs | subj | name | gender | height | weight |
---|---|---|---|---|---|
1 | 1024 | Alice | 1 | 65 | 125 |
2 | 1167 | Maryann | 1 | 68 | 140 |
3 | 1168 | Thomas | 2 | 68 | 190 |
4 | 1201 | Benny | 2 | . | 190 |
5 | 1302 | Felicia | 1 | 63 | 115 |
Note that Benny's height is missing. Therefore, since we are using the list input style to read in the data, we have to put in a missing value (.) placeholder.
First, launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are read in properly. Then, edit the program by deleting the missing value (.) placeholder. Rerun the SAS program to see what happens when you fail to account for the missing value. In the log file, you should see a note that says:
NOTE: SAS went to a new line when INPUT statement reached past the end of a line
And the resulting output:
Obs | subj | name | gender | height | weight |
---|---|---|---|---|---|
1 | 1024 | Alice | 1 | 65 | 125 |
2 | 1167 | Maryann | 1 | 68 | 140 |
3 | 1168 | Thomas | 2 | 68 | 190 |
4 | 1201 | Benny | 2 | 190 | 1302 |
should indicate that something has clearly gone awry. What is going on here is that, by default, SAS goes to the next data line to find more data if there are more variable names in the INPUT statement than there are values in the data line. In this case, Benny's height becomes 190, the first number to appear in the data line after gender, and Benny's weight becomes 1302, the first number to appear in the next data line. Because that next data line (Felicia's) is used up in the process, the data set ends up with only four observations.
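If you would rather have SAS stop at the end of each data line, one possibility is the MISSOVER option of the INFILE statement. The program below is only a sketch and goes beyond what this example covers: it reuses the data from above with the placeholder deleted, and adds an INFILE statement that is not part of the original program.

```sas
DATA temp;
  * MISSOVER: when a data line runs out of values, set the
    remaining variables to missing rather than reading the
    next data line;
  infile cards missover;
  input subj name $ gender height weight;
  CARDS;
1024 Alice 1 65 125
1167 Maryann 1 68 140
1168 Thomas 2 68 190
1201 Benny 2 190
1302 Felicia 1 63 115
;
RUN;

PROC PRINT data=temp;
  title 'Output dataset: TEMP (MISSOVER)';
RUN;
```

With this change, all five observations should be read, but note that Benny's height is still recorded as 190 and his weight as missing. MISSOVER keeps the lines from running together; it cannot tell which value was left out, so the missing value (.) placeholder is still needed.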
Example 3.6
The following SAS program illustrates how a character variable is, by default, truncated if it contains more than 8 characters. The name 'Benedictine' is saved in the variable name as 'Benedict'.
DATA temp;
input subj name $ gender height weight;
CARDS;
1024 Alice 1 65 125
1167 Maryann 1 68 140
1168 Thomas 2 68 190
1201 Benedictine 2 68 190
1302 Felicia 1 63 115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Obs | subj | name | gender | height | weight |
---|---|---|---|---|---|
1 | 1024 | Alice | 1 | 65 | 125 |
2 | 1167 | Maryann | 1 | 68 | 140 |
3 | 1168 | Thomas | 2 | 68 | 190 |
4 | 1201 | Benedict | 2 | 68 | 190 |
5 | 1302 | Felicia | 1 | 63 | 115 |
Launch and run the SAS program. Review the output from the print procedure to convince yourself that the name 'Benedictine' is indeed truncated to 'Benedict'. Incidentally, it is possible to use a LENGTH statement to tell SAS to allow the character variable name to contain more than eight characters. We'll learn about the LENGTH statement later.
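As a preview, here is a minimal sketch of how a LENGTH statement might be used with this example. The length of 12 is an arbitrary choice for illustration; any length of at least 11 would hold 'Benedictine'.

```sas
DATA temp;
  * Declare name as a character variable of length 12
    before the INPUT statement reads it;
  length name $ 12;
  input subj name $ gender height weight;
  CARDS;
1024 Alice 1 65 125
1167 Maryann 1 68 140
1168 Thomas 2 68 190
1201 Benedictine 2 68 190
1302 Felicia 1 63 115
;
RUN;

PROC PRINT data=temp;
  title 'Output dataset: TEMP (LENGTH statement)';
RUN;
```

Because the LENGTH statement mentions name before the INPUT statement does, name becomes the first variable in the data set, so it appears as the first column in the printed output.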
Example 3.7
The following SAS program illustrates how you can use the DELIMITER option of the INFILE statement to use value separators other than blanks. This example, in particular, illustrates it for the commonly used comma (,) as a delimiter:
DATA temp;
infile cards delimiter=',';
input subj name $ gender height weight;
CARDS;
1024,Alice,1,65,125
1167,Maryann,1,68,140
1168,Thomas,2,68,190
1201,Benny,2,.,190
1302,Felicia,1,63,115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Obs | subj | name | gender | height | weight |
---|---|---|---|---|---|
1 | 1024 | Alice | 1 | 65 | 125 |
2 | 1167 | Maryann | 1 | 68 | 140 |
3 | 1168 | Thomas | 2 | 68 | 190 |
4 | 1201 | Benny | 2 | . | 190 |
5 | 1302 | Felicia | 1 | 63 | 115 |
By default, SAS assumes data are space-delimited. The DELIMITER option of the INFILE statement here instead tells SAS that the data are comma-delimited, that is, that commas separate the data values rather than blank spaces. You might also have noted that although the INFILE statement typically directs SAS to externally stored data, here the CARDS keyword in the INFILE statement tells SAS that the data are actually included in the code. Note that the missing value (.) placeholder is still needed for Benny's height. Launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are indeed read in properly.
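Comma-delimited files often mark a missing value with two commas in a row rather than with a period. One way to read such data is the DSD option of the INFILE statement, sketched below. This option is not covered in the example above, and the shortened data set is made up for illustration.

```sas
DATA temp;
  * DSD: treat commas as the delimiter and read two
    consecutive commas as a missing value;
  infile cards dsd;
  input subj name $ gender height weight;
  CARDS;
1024,Alice,1,65,125
1201,Benny,2,,190
1302,Felicia,1,63,115
;
RUN;

PROC PRINT data=temp;
  title 'Output dataset: TEMP (DSD)';
RUN;
```

Here Benny's height should be read as missing even though no period appears in the data line. With DELIMITER=',' alone, the two consecutive commas would be treated as a single delimiter and the values would shift out of place.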
Now, let's go investigate another style of input, namely formatted input.