As mentioned in the introduction to this lesson, there are three different styles of input that are available to us in SAS. They are:
- column input, which is the most commonly used style, allows you to read data values that are entered in fixed columns.
- list input, which allows you to read data by simply listing the variable names in the INPUT statement. At least one space (or other delimiter character) must occur between values in the raw data.
- formatted input, which allows you to read numeric data containing special characters, such as dates and dollar amounts.
In this section, we will take a look at two simple examples of column input. In the next lesson, we will spend some time investigating list input and formatted input.
A couple of comments. For the sake of the examples that follow, we'll use the DATALINES statement (written in the programs below as CARDS, which is an alias for DATALINES) to read in data. We could have just as easily used the INFILE statement to illustrate each point. Additionally, we'll create temporary data sets rather than permanent ones, even though we could have just as easily created permanent data sets to illustrate each point. Finally, after each SAS DATA step, we'll use the SAS print procedure (PROC PRINT) to print the resulting SAS data set for your perusal.
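As a rough sketch of the INFILE alternative mentioned above, the same kind of DATA step could read its records from an external file instead of in-stream data lines. The file path below is a hypothetical placeholder, not a file supplied with this lesson, and the INPUT statement is borrowed from Example 2.5 below.

```sas
/* Sketch only: 'C:\mydata\temp.dat' is a made-up path; point it at your own raw data file */
DATA temp;
   infile 'C:\mydata\temp.dat';
   input subj 1-4 name $ 6-23 gender 25 height 27-28 weight 30-32;
RUN;

PROC PRINT data=temp;
   title 'Output dataset: TEMP';
RUN;
```

If some records in an external file are shorter than the last column the INPUT statement reads, an INFILE option such as TRUNCOVER may be needed so that SAS does not move on to the next record to look for the value.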
Column input
Column input allows you to read variable values that occupy the same columns within each record. To use column input, list the variable names in the INPUT statement, immediately following each variable name with its corresponding column positions in each of the data lines. (Of course, you'll need to follow each character variable with a dollar sign ($) first.) Column input can be used whenever your raw data are in fixed columns and in standard character or numeric format. Column input reads data values until it reaches the last specified column for the field.
The important points to note about column input are:
- When using column input, you are not required to indicate missing values with a placeholder, such as a period. That is, missing values can be left blank.
- Column input uses the columns specified to determine the length of character variables, thereby allowing the character values to exceed the default 8 characters and to have embedded spaces.
- Column input allows fields to be skipped altogether or to be read in any order.
- Column input allows only part of a value to be read and allows values to be re-read.
- Spaces are not required between the data values.
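The last point in the list can be illustrated with a minimal sketch. The data set name nospace, the variable id, and the packed record layout are all made up for this illustration: each record holds a four-digit id, a one-digit gender code, a two-digit height, and a three-digit weight with nothing between them.

```sas
/* Hypothetical layout: id in columns 1-4, gender in column 5,
   height in columns 6-7, weight in columns 8-10 */
DATA nospace;
   input id 1-4 gender 5 height 6-7 weight 8-10;
   DATALINES;
1024165125
1167168140
1168268190
;
RUN;

PROC PRINT data=nospace;
   title 'Output dataset: NOSPACE';
RUN;
```

Because the column ranges alone tell SAS where each value begins and ends, the first record should be read as id 1024, gender 1, height 65, and weight 125, even though the values run together.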
Example 2.5
The following SAS program illustrates the simplest example of column input.
DATA temp;
input subj 1-4 name $ 6-23 gender 25 height 27-28 weight 30-32;
CARDS;
1024 Alice Smith        1 65 125
1167 Maryann White      1 68 140
1168 Thomas Jones       2 68 190
1201 Benedictine Arnold 2 68 190
1302 Felicia Ho         1 63 115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Obs | subj | name | gender | height | weight |
---|---|---|---|---|---|
1 | 1024 | Alice Smith | 1 | 65 | 125 |
2 | 1167 | Maryann White | 1 | 68 | 140 |
3 | 1168 | Thomas Jones | 2 | 68 | 190 |
4 | 1201 | Benedictine Arnold | 2 | 68 | 190 |
5 | 1302 | Felicia Ho | 1 | 63 | 115 |
First, inspect the SAS code to make sure you understand how to set up the INPUT statement for column input.
DATA temp;
* The numbers after each variable name specify the columns to read for that variable;
input subj 1-4 name $ 6-23 gender 25 height 27-28 weight 30-32;
* The data lines are aligned in columns, starting with column 1 at the left margin;
* The values need not be separated by spaces;
CARDS;
1024 Alice Smith        1 65 125
1167 Maryann White      1 68 140
1168 Thomas Jones       2 68 190
1201 Benedictine Arnold 2 68 190
1302 Felicia Ho         1 63 115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Then, launch and run the SAS program.
Finally, review the output from the print procedure to convince yourself that the data are read in properly.
Example 2.6
The following SAS program illustrates some of the key features of column input:
DATA temp;
input init $ 6 f_name $ 6-16 l_name $ 18-23
weight 30-32 height 27-28;
CARDS;
1024 Alice       Smith  1 65 125
1167 Maryann     White  1 68 140
1168 Thomas      Jones  2    190
1201 Benedictine Arnold 2 68 190
1302 Felicia     Ho     1 63 115
;
RUN;
PROC PRINT data=temp;
title 'Output dataset: TEMP';
RUN;
Obs | init | f_name | l_name | weight | height |
---|---|---|---|---|---|
1 | A | Alice | Smith | 125 | 65 |
2 | M | Maryann | White | 140 | 68 |
3 | T | Thomas | Jones | 190 | . |
4 | B | Benedictine | Arnold | 190 | 68 |
5 | F | Felicia | Ho | 115 | 63 |
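Review the output from the print procedure to convince yourself that the data are read in properly. In particular, note that init reads only column 6, which f_name then re-reads as part of columns 6-16; that weight is read before height even though it appears later in each record; that the subject number and gender fields are skipped altogether; and that the blank height for Thomas Jones is stored as a missing value (.). Note also that the position of the variables within the temporary data set temp corresponds to the order in which the variables appear in the INPUT statement, not the order in which the values appear in the data lines.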
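One way to check the variable order directly, rather than inferring it from the printed columns, is to ask PROC CONTENTS for the variables in creation order. The sketch below builds a small stand-in data set (the name orderchk is made up here) with the same column layout as Example 2.6, and then lists its variables using the VARNUM option.

```sas
/* Stand-in data set (hypothetical name ORDERCHK) using the Example 2.6 column layout */
DATA orderchk;
   input init $ 6 f_name $ 6-16 l_name $ 18-23
         weight 30-32 height 27-28;
   DATALINES;
1024 Alice       Smith  1 65 125
1201 Benedictine Arnold 2 68 190
;
RUN;

/* VARNUM lists the variables by their position in the data set
   rather than alphabetically */
PROC CONTENTS data=orderchk varnum;
RUN;
```

The listing should show init, f_name, l_name, weight, and height in that order, matching the INPUT statement rather than the left-to-right layout of the data lines.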