Lesson 20: More on Importing Data -- Part I

Overview

In STAT 480, we learned how to read only the most basic data files into a SAS data set. In this lesson (and the next), we'll extend that knowledge by learning how to read just about any data file into SAS, no matter how messy or unstructured the file is. In most cases, the data files will be raw ASCII files produced by exporting data from some other PC software package.

Objectives

Upon completion of this lesson, you should be able to:

  • read raw data separated by spaces into a SAS data set (that is, use list input)
  • read raw data arranged in columns into a SAS data set (that is, use column input)
  • read raw data not in standard format into a SAS data set (that is, use formatted input)
  • mix list, column, and formatted input styles to read raw data into a SAS data set
  • determine when list input, column input, formatted input, or some combination of the three styles should be used to input a raw data file
  • understand that the lengths of numeric variables are set to 8 by default and therefore do not necessarily coincide with the widths of the numeric informats used in an INPUT statement
  • state the difference between fixed-length record data files and variable-length record data files
  • determine when it is appropriate, and how, to use the INFILE statement's PAD option
  • decide when it is appropriate, and how, to use the INFILE statement's MISSOVER option
  • determine when it is appropriate, and how, to use the INFILE statement's DLM= option
  • decide when it is appropriate, and how, to use the INFILE statement's DSD option
  • determine when it is appropriate, and how, to use the INFILE statement's FIRSTOBS= option
  • state how to read missing values when using list input
  • determine when it is appropriate, and how, to specify a range of numeric or character variables in the INPUT statement
  • utilize the LENGTH statement to modify the length of a character or numeric variable when appropriate
  • apply the ampersand (&) modifier with list input to read character values that contain embedded blanks
  • use the colon (:) modifier with list input to read nonstandard data values and character values that are longer than eight characters but contain no embedded blanks
  • explain why, with formatted input, the informat determines both the length of a character variable and the number of columns that are read
  • explain why, with modified list input, the informat determines only the length of the variable, not the number of columns that are read