2.1 - Reading Instream Data
Although the title of this section is reading instream data, it is hard to focus on just one method of reading data into SAS. As discussed in the introduction to this lesson, every time we read data into a SAS data set, we need to tell SAS three things: where our data reside, the form of the data, and the kind of SAS data set that we want to create. Let's jump right in and take a look at an example.
Example 2.1
The following SAS program illustrates how to read instream data into a temporary SAS data set called temp1 using column input:
DATA temp1;
input subj 1-4 gender 6 height 8-9 weight 11-13;
DATALINES;
1024 1 65 125
1167 1 68 140
1168 2 68 190
1201 2 72 190
1302 1 63 115
;
RUN;
PROC PRINT data=temp1;
title 'Output dataset: TEMP1';
RUN;
The same program, annotated with comments:
DATA temp1; *A one-level name indicates a temporary data set;
input subj 1-4 gender 6 height 8-9 weight 11-13; *Identifies the columns each variable is read from;
DATALINES; *Identifies the raw data. This must come last in the data step;
1024 1 65 125
1167 1 68 140
1168 2 68 190
1201 2 72 190
1302 1 63 115
; *Identifies the end of the data. Anything between here and the RUN statement will not be executed;
RUN; *Execute the previous statements;
PROC PRINT data=temp1; *Print the entire data set;
title 'Output dataset: TEMP1'; *Title will appear at the top;
RUN;
It is probably most helpful to start by inspecting the SAS code. The key things to note about the program are:
- The DATALINES statement is the statement that you must use to tell SAS to expect instream data. Only one DATALINES statement can appear in a DATA step. The DATALINES statement:
  - Must be the last statement to appear in the DATA step (that is, except for the RUN statement)
  - Must immediately precede the data lines
  - Must be closed by a null statement (that is, a single semicolon)
- The INPUT statement is the statement that you must use to tell SAS the form of the data. Here, we use what is called column input, because the data values are:
  - standard character or numeric values, and
  - arranged in neatly defined columns.
  Recall that standard numeric data values can contain only numbers, decimal points, numbers in scientific notation (e.g., 3.1E5), and plus or minus signs.
  In general, for each field of raw data that you want to read into your SAS data set, you must specify the following information in the INPUT statement:
  - a valid SAS variable name,
  - a type (character or numeric),
  - and the number of the column in which the field starts and the number of the column in which the field ends, separated by a dash (-).
  If you intend for the variable to be a character variable, place one blank space and then a dollar sign ($) right after the variable's name in the INPUT statement. None of the variables in our data set are character variables, and therefore no dollar signs appear in the INPUT statement in our program. As our INPUT statement informs SAS, the subject number (subj) begins in column 1 and ends in column 4, gender occupies just column 6, the subject's height begins in column 8 and ends in column 9, and the subject's weight begins in column 11 and ends in column 13. You might want to count the columns out from left to right to convince yourself that we've defined the fields correctly.
- The DATA statement is the statement that you must use to tell SAS whether the data set that you intend to create should be temporary or permanent. We'll learn more about temporary and permanent data sets in the lesson pages that follow. Know for now that the above DATA statement tells SAS to create a temporary data set called temp1. It does so by specifying what is called a one-level name, such as temp1, rather than a two-level name, such as stat480.temp1. The key thing for now is to know that, because temp1 is a temporary data set, it exists only until the end of your current SAS session. That is, once you close out your SAS session, the SAS data set is deleted and would have to be created again if you needed to use it again.
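To see the dollar sign rule for character variables in action, here is a minimal sketch of column input that mixes one character field with numeric fields. The data set name temp2, the variable name, and the data values are all made up for illustration; they are not part of the lesson's data.

```sas
DATA temp2;
   * Hypothetical example with made-up names and values;
   * The dollar sign after name marks it as a character variable;
   input subj 1-4 name $ 6-10 height 12-13 weight 15-17;
   DATALINES;
1024 Alice 65 125
1167 Maria 68 140
1168 Pedro 68 190
;
RUN;
PROC PRINT data=temp2;
   title 'Output dataset: TEMP2';
RUN;
```

As with the original program, it is worth counting the columns from left to right to check that each field falls where the INPUT statement says it does.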
Okay, enough explaining! Let's go ahead and have you launch and run the SAS program. Then, as always, view the log window first to see if SAS displays any errors from running the code. Then, view the output window. You should see a display of the data set that arises from the PRINT procedure in our code:
Output dataset: TEMP1
Obs | subj | gender | height | weight |
---|---|---|---|---|
1 | 1024 | 1 | 65 | 125 |
2 | 1167 | 1 | 68 | 140 |
3 | 1168 | 2 | 68 | 190 |
4 | 1201 | 2 | 72 | 190 |
5 | 1302 | 1 | 63 | 115 |
Note that the CARDS statement is an alias for the DATALINES statement. That is, we could have alternatively entered the data by replacing the "DATALINES;" statement with a "CARDS;" statement. In your program editor, replace "DATALINES;" with "CARDS;" and rerun your program to convince yourself that this is indeed true.
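For reference, a sketch of what the CARDS version might look like, shortened to the first two data lines of Example 2.1. Only the DATALINES keyword has changed:

```sas
DATA temp1;
   input subj 1-4 gender 6 height 8-9 weight 11-13;
   * CARDS is used here in place of DATALINES;
   CARDS;
1024 1 65 125
1167 1 68 140
;
RUN;
PROC PRINT data=temp1;
   title 'Output dataset: TEMP1';
RUN;
```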
One more thing: if any of your data values contain semicolons, the DATALINES statement will not work, because SAS takes the first line containing a semicolon as the end of the data. Instead, you must replace the DATALINES statement with a DATALINES4 statement, and replace the closing null statement (a single semicolon) with a line containing four consecutive semicolons (;;;;). Strange, I know.
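A minimal sketch of the DATALINES4 form, assuming a made-up data set (temp3) whose character field contains semicolons. The names and values are hypothetical and serve only to show where the four semicolons go:

```sas
DATA temp3;
   * Hypothetical example with made-up values that contain semicolons;
   input id 1-3 note $ 5-24;
   * DATALINES4 lets single semicolons appear inside the data lines;
   DATALINES4;
101 red; green; blue
102 up; down
;;;;
RUN;
PROC PRINT data=temp3;
   title 'Output dataset: TEMP3';
RUN;
```

With a plain DATALINES statement, SAS would stop reading data at the first of these lines, since each one contains a semicolon.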