Before we can analyze our data, we need to read it into a data set that SAS understands. A SAS data set is a file consisting of two parts: a descriptor portion and a data portion.
The descriptor portion of a SAS data set contains information about the data set itself, such as its name, the date and time it was created, the number of observations, and the number of variables. The following table shows part of the descriptor portion of a data set called work.grade:
Data Set Name | WORK.GRADE | Observations | 6 |
---|---|---|---|
Member Type | DATA | Variables | 5 |
Engine | V9 | Indexes | 0 |
Created | Monday, August 18, 2021 07:25:39 PM | Observation Length | 40 |
Last Modified | Monday, August 18, 2021 07:25:39 PM | Deleted Observations | 0 |
Protection | | Compressed | NO |
Data Set Type | | Sorted | NO |
Label | | | |
Data Representation | WINDOWS_32 | | |
Encoding | wlatin1 Western (Windows) | | |
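The sketch below models the two-part structure. It is a rough analogy written in Python, not SAS. A hypothetical `SasLikeDataSet` class holds a data portion as a list of rows and derives a few descriptor fields from it, such as the number of observations and variables. The following are all assumptions for illustration:

- the class name
- the made-up data set name `work.people`
- the choice of fields

A real SAS descriptor portion records much more, as the table above shows. The example rows reuse the values from the data-portion table that follows.

```python
from dataclasses import dataclass, field


@dataclass
class SasLikeDataSet:
    """Hypothetical Python analogy of a SAS data set (not a SAS API)."""
    name: str                                        # e.g. "work.people" (made up)
    variables: list[str]                             # column names
    rows: list[tuple] = field(default_factory=list)  # data portion: one tuple per observation

    def descriptor(self) -> dict:
        """Derive a few descriptor-portion fields from the data portion."""
        return {
            # Upper-cased, as work.grade appears as WORK.GRADE in the table above
            "Data Set Name": self.name.upper(),
            "Member Type": "DATA",
            "Observations": len(self.rows),
            "Variables": len(self.variables),
        }


# Usage: None marks a missing value in this sketch
people = SasLikeDataSet(
    "work.people",
    ["ID", "Name", "Height", "Weight"],
    [(53, "Susie", 65, 120), (54, "Charles", 72, 200), (55, None, 60, 105),
     (56, "Lucy", 63, 142), (57, "Dennis", 70, None)],
)
print(people.descriptor())
# {'Data Set Name': 'WORK.PEOPLE', 'Member Type': 'DATA', 'Observations': 5, 'Variables': 4}
```

The descriptor fields are computed from the data rather than stored separately. This keeps the example small, though SAS itself writes the descriptor portion into the data set file.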
The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table, such as this:
ID | Name | Height | Weight |
---|---|---|---|
53 | Susie | 65 | 120 |
54 | Charles | 72 | 200 |
55 | | 60 | 105 |
56 | Lucy | 63 | 142 |
57 | Dennis | 70 | . |
In this example, the number 53 is a data value, the name Susie is a data value, and so on. As with data sets in other statistical packages, a SAS data set consists of variables and observations. The variables (or columns) are collections of data values that describe a particular characteristic of the thing being measured. A SAS data set can store thousands of variables. Our data set here contains just four variables: the id, name, height, and weight of the person being measured. The observations (or rows) are collections of data values that typically relate to one particular object (such as a person named Susie). The values 53, Susie, 65, and 120 constitute a single observation in the above data set. A SAS data set can store any number of observations. Our data set here contains just five observations. The period in Dennis's weight is how SAS displays a missing numeric value; a missing character value is displayed as a blank.
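The following minimal Python sketch (not SAS code) illustrates the difference between variables and observations by storing the table above as a list of rows. The names `VARIABLES`, `OBSERVATIONS`, `variable`, and `observation` are hypothetical. `None` stands in for a SAS missing value, covering the missing name for ID 55 and Dennis's missing weight.

```python
# Hypothetical Python sketch of the data portion shown above (not SAS code).
# None stands in for a missing value; SAS itself displays a missing
# character value as a blank and a missing numeric value as a period.
VARIABLES = ["ID", "Name", "Height", "Weight"]
OBSERVATIONS = [
    (53, "Susie", 65, 120),
    (54, "Charles", 72, 200),
    (55, None, 60, 105),
    (56, "Lucy", 63, 142),
    (57, "Dennis", 70, None),
]


def variable(name):
    """A variable (column): one value per observation."""
    i = VARIABLES.index(name)
    return [obs[i] for obs in OBSERVATIONS]


def observation(n):
    """Observation n (row), numbered from 1 as SAS numbers observations."""
    return dict(zip(VARIABLES, OBSERVATIONS[n - 1]))


# Every observation has one value per variable, so the table is rectangular.
assert all(len(obs) == len(VARIABLES) for obs in OBSERVATIONS)

print(len(VARIABLES), len(OBSERVATIONS))  # 4 5
print(observation(1))      # {'ID': 53, 'Name': 'Susie', 'Height': 65, 'Weight': 120}
print(variable("Height"))  # [65, 72, 60, 63, 70]
```

Reading down a column with `variable` gives one characteristic for every person. Reading across a row with `observation` gives every characteristic for one person.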