1.3 - SAS Data Sets

To analyze our data, we first need to read it into a data set that our SAS software understands. A SAS data set is a file containing two parts: a descriptor portion and a data portion.

The descriptor portion of a SAS data set contains the vital statistics of the data set, such as the name of the data set, the date and time that the data set was created, the number of observations, and the number of variables. The following table shows one part of the descriptor portion of a data set called work.grade:

Data Set Name	WORK.GRADE	Observations	6
Member Type	DATA	Variables	5
Engine	V9	Indexes	0
Created	Monday, August 18, 2021 07:25:39 PM	Observation Length	40
Last Modified	Monday, August 18, 2021 07:25:39 PM	Deleted Observations	0
Protection		Compressed	NO
Data Set Type		Sorted	NO
Label
Data Representation	WINDOWS_32
Encoding	wlatin1 Western (Windows)
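
A listing of this kind is typically what the CONTENTS procedure reports. As a minimal sketch, assuming a data set named work.grade already exists in our current SAS session, we could ask SAS to display its descriptor portion like this:

```sas
/* Display the descriptor portion of WORK.GRADE.                   */
/* Assumption: WORK.GRADE was created earlier in this SAS session. */
proc contents data=work.grade;
run;
```

The exact values reported (engine, encoding, creation date, and so on) depend on the SAS installation and on when the data set was created.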

The data portion of a SAS data set is a collection of data values that are arranged in a rectangular table, such as this:

ID	Name	Height	Weight
53	Susie	65	120
54	Charles	72	200
55		60	105
56	Lucy	63	142
57	Dennis	70	.

In this example, the number 53 is a data value, the name Susie is a data value, and so on. As in other statistical packages, a SAS data set is composed of variables and observations. The variables (or columns) are collections of data values that describe a particular characteristic of the thing being measured. A SAS data set can store thousands of variables. Our data set here contains just four variables: the ID, name, height, and weight of the person being measured. The observations (or rows) are collections of data values that typically relate to one particular object (such as a person named Susie). The values 53, Susie, 65, and 120 constitute a single observation in the above data set. A SAS data set can store any number of observations. Our data set here contains just five observations.
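
To make the idea of variables and observations concrete, here is one way we might build the small table above as a SAS data set. This is only a sketch: the data set name work.people is a hypothetical choice, and the DATA step uses simple list input, in which a lone period stands for a missing value (the missing name in the third row and the missing weight in the last row).

```sas
/* Create a temporary data set with four variables and five observations. */
/* WORK.PEOPLE is a hypothetical name chosen for illustration.            */
data work.people;
    /* The $ marks Name as a character variable; the others are numeric. */
    input ID Name $ Height Weight;
    datalines;
53 Susie 65 120
54 Charles 72 200
55 . 60 105
56 Lucy 63 142
57 Dennis 70 .
;
run;

/* Print the data portion: one row per observation, one column per variable. */
proc print data=work.people;
run;
```

Each line after the DATALINES statement becomes one observation, and each name on the INPUT statement becomes one variable.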