1.3 - SAS Data Sets

To analyze our data, we first need to read it into a data set that SAS understands. A SAS data set is a file containing two parts: a descriptor portion and a data portion.

The descriptor portion of a SAS data set contains the vital statistics of the data set, such as the name of the data set, the date and time that the data set was created, the number of observations, and the number of variables. The following table shows one part of the descriptor portion of a data set called work.grade:

| Attribute | Value | Attribute | Value |
|---|---|---|---|
| Data Set Name | WORK.GRADE | Observations | 6 |
| Member Type | DATA | Variables | 5 |
| Engine | V9 | Indexes | 0 |
| Created | Monday, August 18, 2021 07:25:39 PM | Observation Length | 40 |
| Last Modified | Monday, August 18, 2021 07:25:39 PM | Deleted Observations | 0 |
| Protection | | Compressed | NO |
| Data Set Type | | Sorted | NO |
| Label | | | |
| Data Representation | WINDOWS_32 | | |
| Encoding | wlatin1 Western (Windows) | | |
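A descriptor listing of this kind is what the CONTENTS procedure typically prints. As a minimal sketch, assuming a data set named work.grade already exists in the current SAS session, the following step would display its descriptor portion:

```sas
/* Print the descriptor portion of work.grade.            */
/* Assumes work.grade was created earlier in the session. */
proc contents data=work.grade;
run;
```

The exact attributes and values shown depend on the SAS release and the operating environment.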

  

The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table, such as this:

| ID | Name | Height | Weight |
|---|---|---|---|
| 53 | Susie | 65 | 120 |
| 54 | Charles | 72 | 200 |
| 55 | | 60 | 105 |
| 56 | Lucy | 63 | 142 |
| 57 | Dennis | 70 | . |

In this example, the number 53 is a data value, the name Susie is a data value, and so on. As with data sets in other statistical packages, a SAS data set is composed of variables and observations. The variables (or columns) are collections of data values that describe a particular characteristic of the thing being measured. A SAS data set can store thousands of variables. Our data set here contains just four variables: the ID, name, height, and weight of the person being measured. The observations (or rows) are collections of data values that typically relate to one particular object (such as a person named Susie). The values 53, Susie, 65, and 120 constitute a single observation in the above data set. A SAS data set can store any number of observations. Our data set here contains just five observations.
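To make the variable and observation structure concrete, here is a minimal sketch of one way the small table above could be entered as a SAS data set. The data set name work.people and the width of 10 for Name are assumptions made for illustration; the lesson does not name this table or specify lengths.

```sas
/* Sketch: enter the five observations as a temporary data set. */
/* work.people is a hypothetical name chosen for this example.  */
data work.people;
    infile datalines dsd;               /* comma-delimited; two commas in a row mean a missing value */
    input ID Name :$10. Height Weight;  /* Name is character (up to 10 characters); the rest are numeric */
    datalines;
53,Susie,65,120
54,Charles,72,200
55,,60,105
56,Lucy,63,142
57,Dennis,70,.
;
run;

/* List the data portion: one row per observation, one column per variable. */
proc print data=work.people;
run;
```

In the listing, the missing name for ID 55 appears as a blank and the missing weight for ID 57 appears as a period, matching the table above. Running PROC CONTENTS on work.people would report 5 observations and 4 variables in its descriptor portion.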