1.3 - SAS Data Sets

Before we can analyze our data, we need to read it into a data set that our SAS software understands. A SAS data set is a file containing two parts: a descriptor portion and a data portion.

The descriptor portion of a SAS data set contains the vital statistics of the data set, such as the name of the data set, the date and time that the data set was created, the number of observations and the number of variables. The following table shows one part of the descriptor portion of a data set called work.grade:

Data Set Name        WORK.GRADE                            Observations          6
Member Type          DATA                                  Variables             5
Engine               V9                                    Indexes               0
Created              Monday, August 18, 2008 07:25:39 PM   Observation Length    40
Last Modified        Monday, August 18, 2008 07:25:39 PM   Deleted Observations  0
Protection                                                 Compressed            NO
Data Set Type                                              Sorted                NO
Label
Data Representation  WINDOWS_32
Encoding             wlatin1 Western (Windows)
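
A listing like this is the kind of output that the CONTENTS procedure prints. As a minimal sketch, and assuming that a data set named work.grade already exists in your current SAS session, the following step asks SAS to display its descriptor portion:

```sas
/* Print the descriptor portion of an existing data set.       */
/* Assumes work.grade has already been created in this session. */
proc contents data=work.grade;
run;
```

The exact values shown (creation time, data representation, encoding, and so on) depend on your own system and on when the data set was created.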

The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table, such as this:

ID   Name      Height   Weight
53   Susie     65       120
54   Charles   72       200
55             60       105
56   Lucy      63       142
57   Dennis    70       .

In this example, the number 53 is a data value, the name Susie is a data value, and so on. As with data sets in other statistical packages, a SAS data set consists of variables and observations.

The variables (or columns) are collections of data values that describe a particular characteristic of the thing being measured. A SAS data set can store thousands of variables. Our data set here contains just four variables: the ID, name, height, and weight of the person being measured.

The observations (or rows) are collections of data values that typically relate to one particular object (such as a person named Susie). The values 53, Susie, 65, and 120 constitute a single observation in the above data set. A SAS data set can store any number of observations. Our data set here contains just five observations.
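
To make the idea of variables and observations concrete, here is one possible sketch of a DATA step that would build a data set with the same four variables and five observations. The data set name work.people and the comma-separated layout of the raw data are assumptions made for illustration only; they are not part of the original example.

```sas
/* Hypothetical data set name; the values mirror the table above. */
data work.people;
    /* DSD treats two consecutive commas as a missing value */
    infile datalines dsd;
    /* Name is a character variable (up to 8 characters);   */
    /* ID, Height, and Weight are numeric variables          */
    input ID Name :$8. Height Weight;
    datalines;
53,Susie,65,120
54,Charles,72,200
55,,60,105
56,Lucy,63,142
57,Dennis,70,.
;
run;

/* Display the data portion: one row per observation */
proc print data=work.people noobs;
run;
```

Each line of raw data becomes one observation, and each name listed in the INPUT statement becomes one variable. In the printed output, SAS displays a missing character value (the name for ID 55) as a blank and a missing numeric value (the weight for ID 57) as a period.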