Lesson 3: Reading Data into a SAS Data Set - Part II

Lesson 3: Reading Data into a SAS Data Set - Part II

Overview

As the title suggests, we'll spend some time in this lesson extending our knowledge of how to read data into a SAS data set. In the last lesson, we focused on column input. In this lesson, we'll investigate two other styles of input, namely list input and formatted input.

Along the way, we'll digress here and there to address a few topics that are best to be exposed to early on in your SAS education. In that vein, we'll learn:

  1. How to use the CONTENTS procedure to view the contents of SAS libraries
  2. How to use the SAS online Help and Documentation facilities
  3. How to use the OPTIONS statement to alter SAS system options that control the appearance of your output

Objectives

Upon completion of this lesson, you should be able to:

  • initiate the CONTENTS procedure to view the contents of a SAS library
  • initiate the CONTENTS procedure to view the descriptor portion of a SAS data set
  • read data into SAS using either list input or formatted input
  • recall that formatted input does not imply formatted output, and in so doing input the basic FORMAT statement
  • change the appearance of your output using the OPTIONS statement
  • access the electronic index in the SAS Help and Documentation that comes with the SAS software

3.1 - Viewing the Contents of SAS Libraries

3.1 - Viewing the Contents of SAS Libraries

Before investigating other methods for reading data into a SAS data set, let's digress for a few minutes to learn how to use the CONTENTS procedure to view the contents of SAS libraries.

The CONTENTS Procedure

The CONTENTS procedure allows us to create SAS output that describes either the contents of a SAS library or the descriptor information for an individual SAS data set. In order to view the contents of a SAS library, we can use the following general form of the procedure:

PROC CONTENTS data = libref._ALL_ NODS;
RUN;

where:

  • libref is the libref that you assigned to the library
  • the _ALL_ option requests a listing of all of the SAS files in the library
  • the NODS option (which stands for "no details") suppresses the printing of detailed information about each file when you specify the _ALL_ option.

NOTE! Two things:

  1. You need to connect the _ALL_ option to the libref with a period (.)
  2. You can specify the NODS option only when you specify the _ALL_ option. Let's take a look at an example

Example 3.1

The following SAS code requests a listing of the contents of the library called sashelp:

PROC CONTENTS data = sashelp._ALL_ nods;
RUN;

First, what are SAS Libraries all about? Let's have you launch your SAS application. If the Explorer Window is not active, then click on the Explorer tab SAS left hand navigation window tab for Explorer that appears among the bottom tabs.

opened Explorer window showing Libraries, File Shortcuts, Favorite Folders, and Desktop icons

Double click Libraries, which contains all available libraries. Then you should see a listing of your libraries, that is, something that looks like this:

Active libraries window showing icons for maps, sashelp, sasuser, work etc...

You should see, at least, three libraries that SAS defines for you by default:

  • As you know, Work is a temporary library for files that do not need to be saved from session to session.
  • Sashelp is a permanent library that contains more than 200 sample data and other files that control how SAS works at your site. You can download these sample data for practicing or illustrating purposes.
  • Sasuser is a permanent library that contains SAS files in the Profile catalog that store your personal settings. You may also opt to store your own files there. (Personally, I don't.)

Now, let's have you run  the SAS code. Upon doing so, you should see an output that looks something like this:

The SAS System

The CONTENTS Procedure

Directory

Libref

SASHELP

Levels

22

Level 1

Engine

V9

Physical Name

C:\Program Files\SASHome\SASFoundation\9.4\nls\1d\SASCFG

Filename

C:\Program Files\SASHome\SASFoundation\9.4\nls\1d\SASCFG

Owner Name

BUILTIN\Administrators

File Size

4KB

File Size (bytes)

4096

Level 2

Engine

V9

Physical Name

C:\Program Files\SASHome\SASFoundation\9.4\core\sashelp

Filename

C:\Program Files\SASHome\SASFoundation\9.4\core\sashelp

Owner Name

BUILTIN\Administrators

File Size

96KB

File Size (bytes)

98304

Level 3

Engine

V9

Physical Name

C:\Program Files\SASHome\SASFoundation\9.4\aacomp\sashelp

Filename

C:\Program Files\SASHome\SASFoundation\9.4\aacomp\sashelp

Owner Name

BUILTIN\Administrators

File Size

0KB

File Size (bytes)

0

Level 4

Engine

V9

Physical Name

C:\Program Files\SASHome\SASFoundation\9.4\af\sashelp

Filename

C:\Program Files\SASHome\SASFoundation\9.4\af\sashelp

Owner Name

BUILTIN\Administrators

File Size

0KB

File Size (bytes)

0

... partial listing is below ...

#

Name

Member Type

Level

File Size

Last Modified

1

AACOMP

DATA

3

384KB

2017/11/09 01:06:28

.

AACOMP

INDEX

 

161KB

2017/11/09 01:06:28

2

AARFM

DATA

2

256KB

2017/09/06 23:57:22

.

AARFM

INDEX

 

21KB

2017/09/06 23:57:22

3

AC

CATALOG

2

17KB

2017/09/07 00:00:14

4

ADSMSG

DATA

2

256KB

2017/09/06 23:59:19

.

ADSMSG

INDEX

 

37KB

2017/09/06 23:59:19

5

AFCLASS

CATALOG

2

2MB

2017/09/07 00:02:06

6

AFMSG

DATA

2

448KB

2017/09/06 23:56:10

.

AFMSG

INDEX

 

73KB

2017/09/06 23:56:10

7

AFTOOLS

CATALOG

2

2MB

2017/09/07 00:02:06

8

AIR

DATA

11

128KB

2017/09/07 00:01:10

9

APPLIANC

DATA

11

128KB

2017/09/07 00:01:10

10

ASSCMGR

DATA

2

320KB

2017/09/07 00:08:56

11

ASSIST

CATALOG

5

9MB

2013/06/20 01:51:39

12

AUTHLIB

DATA

16

192KB

2017/09/07 00:35:07

13

BASE

CATALOG

2

681KB

2017/09/06 23:57:23

14

BASEBALL

DATA

2

192KB

2017/11/09 01:22:58

15

BEI

DATA

2

5MB

2017/11/09 01:22:59

16

BFC

CATALOG

9

513KB

2017/09/06 23:58:55

Now, to view the descriptor information for any of the files that appear in the sashelp library, we need to modify our CONTENTS procedure a little bit.

Example 3.2

The following SAS code tells SAS to display the descriptor information for the class data set that appears in the sashelp library:

PROC CONTENTS data = sashelp.class;
RUN;

Launch and run  the SAS code. (If, by chance, the class data set is not stored in your sashelp library, modify the code so that it references a data set that is stored in the library.) Review the resulting output to convince yourself that SAS does indeed display the descriptor information. Note in particular that, by default, SAS displays the variable information in alphabetical order:

Alphabetic List of Variables and Attributes

#

Variable

Type

Len

3

Age

Num

8

4

Height

Num

8

1

name

Char

8

2

Sex

Char

1

5

Weight

Num

8

If we instead preferred to view a listing of the variables in the order in which they appear in the data set, we again need to modify the CONTENTS procedure a bit.

Example 3.3

The following SAS code again tells SAS to display the descriptor information for the class data set that appears in the sashelp library. The varnum option requests that SAS display the listing of the variables in the order in which they appear in the data set:

PROC CONTENTS data = sashelp.class varnum;
RUN;

Launch and run  the SAS code. (Again, if the deskact data set is not stored in your sasuser library, modify the code so that it references a data set that is stored in the library.) Review the resulting output to convince yourself that SAS displays the descriptor information, and in particular, the variable information in the order in which the variables appear in the data set:

Variables in Creation Order

#

Variable

Type

Len

1

name

Char

8

2

Sex

Char

1

3

Age

Num

8

4

Height

Num

8

5

Weight

Num

8


3.2 - List Input

3.2 - List Input

Okay, now let's spend some time on a subject that is a little more fitting for the title of this lesson. In the previous lesson, we investigated how to use column input to read data values that appear in neatly defined columns looking like this, say:

Smith        8145551354  3.89
Washington   8145569847  2.73
Wing         8145359376  3.56
Jackson      8145557437  3.12

Here, we'll investigate how to use list input to read in free-format data looking like this, say:

Smith 8145551354 3.89
Washington 8145569847 2.73
Wing 8145359376 3.56
Jackson 8145557437 3.12

List input might be the easiest input style to use because, as shown in the examples that follow, you simply list the variable names in the same order as the corresponding raw data fields. The ease comes with a bit of a price, however. Because you do not have to tell SAS the columns in which the data values appear, you must take note of the following restrictions:

  • Fields must be separated by at least one blank (or other delimiters).
  • Fields must be read in order from left to right.
  • You cannot skip or re-read fields.
  • Missing values must be represented by a placeholder such as a period. (A blank field causes the matching of variable names and values to get out of sync.)
  • Character values can't contain embedded blanks.
  • The default length of character values is 8 bytes. A longer value is truncated when it is written to the data set NOTE! 1 byte = 1 character
  • Data must be in standard character or numeric format.

Example 3.4

The following SAS program illustrates the simplest example of list input. Note that there is one blank space between each of the data values. Also note that although the data values need not be lined up in columns, we still recommend doing so because of the difficulty otherwise in "eyeing" the data quickly.

DATA temp;
  input subj name $ gender height weight;
  * The $ that follows name tells SAS that it is
       a character variable;
  * By default, name only allows up to 8 characters
       to be read in;
  CARDS;
  1024 Alice 1 65 125
  1167 Maryann 1 68 140
  1168 Thomas 2 68 190
  1201 Benny 2 72 190
  1302 Felicia 1 63 115
  ;
RUN;
PROC PRINT data=temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

subj

name

gender

height

weight

1

1024

Alice

1

65

125

2

1167

Maryann

1

68

140

3

1168

Thomas

2

68

190

4

1201

Benny

2

72

190

5

1302

Felicia

1

63

115

The INPUT statement is how you tell SAS to read in the data using list input. For list input, simply list the variable names — leaving at least one space between names — in the order in which the variables appear in the data file. Remember to use the dollar sign ($) to distinguish character variables from numeric variables.

Launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are read in properly.

Example 3.5

The following SAS program illustrates the necessary use of the missing value (.) placeholder when a data value is missing:

DATA temp;
  input subj name $ gender height weight;
  CARDS;
  1024 Alice 1 65 125
  1167 Maryann 1 68 140
  1168 Thomas 2 68 190
  1201 Benny 2 . 190
  1302 Felicia 1 63 115
  ;
RUN;
PROC PRINT data=temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

subj

name

gender

height

weight

1

1024

Alice

1

65

125

2

1167

Maryann

1

68

140

3

1168

Thomas

2

68

190

4

1201

Benny

2

.

190

5

1302

Felicia

1

63

115

Note that Benny's height is missing. Therefore, since we are using the list input style to read in the data, we have to put in a missing value (.) placeholder.

First, launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are read in properly. Then, edit the program by deleting the missing value (.) placeholder. Rerun  the SAS program to see what happens when you fail to account for the missing value. In the log file, you should see a note that says:

NOTE: SAS went to a new line when INPUT statement reached past the end of a line

And the resulting output:

Output dataset: TEMP

Obs

subj

name

gender

height

weight

1

1024

Alice

1

65

125

2

1167

Maryann

1

68

140

3

1168

Thomas

2

68

190

4

1201

Benny

2

190

1302

should indicate that something has clearly gone awry. What is going on here is that, by default, SAS goes to the next data line to find more data if there are more variable names in the INPUT statement then there are values in the data line. In this case, Benny's height becomes 190, the first number to appear in the data line after gender, and Benny's weight becomes 1302, the first number to appear in the next data line.

Example 3.6

The following SAS program illustrates how a character variable is, by default, truncated if it contains more than 8 characters. The name 'Benedictine' is saved in the variable name as 'Benedict'.

DATA temp;
  input subj name $ gender height weight;
  CARDS;
  1024 Alice 1 65 125
  1167 Maryann 1 68 140
  1168 Thomas 2 68 190
  1201 Benedictine 2 68 190
  1302 Felicia 1 63 115
  ;
RUN;
PROC PRINT data=temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

subj

name

gender

height

weight

1

1024

Alice

1

65

125

2

1167

Maryann

1

68

140

3

1168

Thomas

2

68

190

4

1201

Benny

2

72

190

5

1302

Felicia

1

63

115

Launch and run the SAS program. Review the output from the print procedure to convince yourself that the name 'Benedictine' is indeed truncated to 'Benedict'. Incidentally, it is possible to use a LENGTH statement to tell SAS to allow the character variable name to contain more than eight characters. We'll learn about the LENGTH statement later.

Example 3.7

The following SAS program illustrates how you can use the DELIMITER option of the INFILE statement to use values separators other than blanks. This example, in particular, illustrates it for the commonly used comma (,) as a delimiter:

DATA temp;
  infile cards delimiter=',';
  input subj name $ gender height weight;
  CARDS;
  1024,Alice,1,65,125
  1167,Maryann,1,68,140
  1168,Thomas,2,68,190
  1201,Benny,2,.,190
  1302,Felicia,1,63,115
  ;
RUN;
PROC PRINT data=temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

subj

name

gender

height

weight

1

1024

Alice

1

65

125

2

1167

Maryann

1

68

140

3

1168

Thomas

2

68

190

4

1201

Benny

2

.

190

5

1302

Felicia

1

63

115

By default, SAS assumes data are space-delimited. The DELIMITER option of the INFILE statement here instead warns SAS that the data are comma-delimited — that is, that commas separate the data values rather than blank spaces. You might also have noted that although the INFILE statement typically directs SAS to externally stored data, here the CARDS option included in the INFILE statement alerts SAS that the data are actually included in the code. Launch and run the SAS program. Review the output from the print procedure to convince yourself that the data are indeed read in properly.

Now, let's go investigate another style of input, namely formatted input.


3.3 - Formatted Input

3.3 - Formatted Input

The fundamental difference between column input, which we studied in the previous lesson, and formatted input, which we'll explore now, is that column input is only appropriate for reading standard numeric data, while formatted input allows us to read both standard and nonstandard numeric data. That is, formatted input combines the features of column input with the ability to read nonstandard data values.

Standard numeric data values

Recall that standard numeric data values can contain only:

  • numbers
  • decimal points
  • numbers using scientific (E) notation
  • negative (minus) and positive (plus) signs

Examples of standard numeric values include 26, 3.9, -13, +3.14, 314E-2, and 2.193E3.

Nonstandard numeric data values

On the other hand, nonstandard numeric data values include:

  • values that contain special characters, such as dollar signs ($), percent signs (%), and commas (,)
  • date and time values
  • data in fraction, integer binary, real binary, and hexadecimal forms

Examples of nonstandard numeric values include: 23.3%, $1.26, and 03/07/47.

The INPUT Statement

Here's the general form of the INPUT statement when using formatted input:

INPUT <pointer-control> variable informat.;

where:

  • pointer-control tells SAS at what column to start reading the data value
  • variable is the name of the variable being created
  • informat is a special instruction that tells SAS how to read the raw data values

A couple of things here. The above INPUT statement is written using standard SAS Help notation. For example, the pointer control appears in brackets (<>) only to indicate that it is optional, or rather, not necessary for every variable you create. For example, you need not tell SAS to go to column 1 if that's where you want to start reading data values, because that's where SAS starts by default. In practice, the brackets (<>) should not appear in the INPUT statements in your program ... otherwise, SAS will hiccup.

There are two pointer controls that we'll learn about here:

  • The @n pointer control moves the input pointer to a specific column number n
  • The +n pointer control moves the input pointer forward n columns to a column number that is relative to the current position

Again, an informat is what is used to tell SAS what special instructions are required to read in the raw data values. Many (too many?) special informats are available in SAS. For example, the numeric informat "mmddyy8." tells SAS that you want to read in a date that takes up to 8 spaces and looks like 10/27/05. The numeric informat "comma6." tells SAS that you want to read in a number that contains a comma and takes up to 6 spaces (including the comma), such as the number 11,387.

Let's take a look at an example!

Example 3.8

The following SAS program uses the @n column pointer control and standard numeric informats to read three numeric variables — subj, height, and weight — into a temporary SAS data set called temp:

DATA temp;
  input @1  subj 4. 
        @27 height 2. 
        @30 weight 3.;
  DATALINES;
1024 Alice       Smith  1 65 125 12/1/95  2,036
1167 Maryann     White  1 68 140 12/01/95 1,800
1168 Thomas      Jones  2    190 12/2/95  2,302
1201 Benedictine Arnold 2 68 190 11/30/95 2,432
1302 Felicia     Ho     1 63 115 1/1/96   1,972
  ;
RUN;
PROC PRINT data = temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

subj

height

weight

1

1024

65

125

2

1167

68

140

3

1168

.

190

4

1201

68

190

5

1302

63

115

If you look at the INPUT statement, you'll see that it uses @n absolute pointer controls to move the input pointer first to column 1 to read subj, then to column 27 to read height, and finally to column 30 to read weight. In general, the @ moves the pointer to column n, the first column of the field that is being read.

The 4. that appears after subj, the 2. that appears after height, and 3. that appears after weight are the informats that tells SAS how to read each of the raw data values. In each case here, we are trying to read in standard numeric data values, and so we use what is called the w.d informat. The w tells SAS the width of the raw data value, that is how many columns it occupies. The d, which is optional, tells SAS the number of implied decimal places for the raw data value. The w and d must be connected by a period (.) delimiter. If the d is not present, you still need to make sure that you include the period in the informat name.

Making this all a little more concrete ... here, the subj values are four columns wide with no decimal places, and hence we use the 4. informat. We alternatively could have specified a 4.0 informat, but we could not have specified 4 (without the period) as the informat. The height values are two columns wide with no decimal places, and hence we use the 2. informat. Finally, the weight values are three columns wide with no decimal places, and hence we use the 3. informat.

Incidentally, the w.d informat ignores any d that we specify if the data value already contains a decimal point. So, for example, if we had a raw data value of 23.001 (occupying 6 columns with 5 digits, 1 decimal point, and 3 decimal places) and specified a 6. informat, SAS would still store the value as 23.001, even though we told SAS not to expect any decimal places.

Okay ... launch and run the SAS program, and review the output obtained from the print procedure to convince yourself that the three variables were read incorrectly. Oh, do you remember when I said that you needn't tell SAS to move the input pointer to the first column because it does so by default? You might want to convince yourself of this by removing the @1 that appears in the INPUT statement. Then, rerun  the program to convince yourself that SAS still reads the data values properly.

Example 3.9

The following SAS program uses the @n column pointer control and standard character and numeric informats to read, respectively, two character variables — l_name and f_name — and two numeric variables — weight and height — into a temporabackwardry SAS data set called temp:

DATA temp;
  input @18 l_name $6.
        @6  f_name $11.
		@30 weight 3.
		@27 height 2.;
  DATALINES;
1024 Alice       Smith  1 65 125 12/1/95  2,036
1167 Maryann     White  1 68 140 12/01/95 1,800
1168 Thomas      Jones  2    190 12/2/95  2,302
1201 Benedictine Arnold 2 68 190 11/30/95 2,432
1302 Felicia     Ho     1 63 115 1/1/96   1,972
  ;
RUN;
PROC PRINT data = temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

l_name

f_name

weight

height

1

Smith

Alice

125

65

2

White

Maryann

140

68

3

Jones

Thomas

190

.

4

Arnold

Benedictine

190

68

5

Ho

Felicia

115

63

The INPUT statement uses @n absolute pointer controls to move the input pointer first to column 18 to read l_name, then back to column 6 to read f_name, forward to column 30 to read weight, and back again to column 27 to read height. This example illustrates, therefore, how you can use @n pointer controls to read data fields in any order ... backwards, forward, and backward again.

The 3. that appears after weight and the 2. that appears after height should look familiar. They are again the numeric informats that tells SAS how to read each of the two numeric variables of interest. Now, if you look at the informats for the two character variables, l_name, and f_name, you should see that they each begin with a dollar sign ($). Because we are trying to read in character data values, we use what is called the $w. informat. The w tells SAS the width of the raw data value, that is how many columns it occupies. The dollar sign ($) and period (.) are required delimiters. In our example, the last name (l_name) occupies as many as 6 columns, hence, the $6. informat, and the first name (f_name) occupies as many as 11 columns hence we use the $11. informat.

Okay ... launch and run the SAS program, and review the output obtained from the print procedure to convince yourself that the four variables were read in correctly.

Example 3.10

In addition to using @n absolute pointer controls with numeric and character informats, the following SAS program uses +n relative pointer controls with nonstandard informats to create a temporary SAS data set called temp containing six variables:

DATA temp;
  input @1  subj 4.
        @6  f_name $11.
		@18 l_name $6.
		+3 height 2.
        +5 wt_date mmddyy8.
        +1 calorie comma5.;
  DATALINES;
1024 Alice       Smith  1 65 125 12/1/95  2,036
1167 Maryann     White  1 68 140 12/01/95 1,800
1168 Thomas      Jones  2    190 12/2/95  2,302
1201 Benedictine Arnold 2 68 190 11/30/95 2,432
1302 Felicia     Ho     1 63 115 1/1/96   1,972
  ;
RUN;
PROC PRINT data = temp;
  title 'Output dataset: TEMP';
RUN;

Output dataset: TEMP

Obs

subj

f_name

l_name

height

wt_date

calories

1

1024

Alice

Smith

65

13118

2036

2

1167

Maryann

White

68

13118

1800

3

1168

Thomas

Jones

.

13119

2302

4

1201

Benedictine

Arnold

68

13117

2432

5

1302

Felicia

Ho

63

13149

1972

You should now understand the specifications for reading subj, f_name and l_name. The INPUT statement tells SAS to go to column 1 to read the numeric field subj that is four columns wide, then to go to column 6 to read the character field f_name that is 11 columns wide, and then to go to column 18 to read the character field l_name that is 6 columns wide.

The +n relative pointer controls are used to read the remaining three variables — height, wt_date, and calorie. The +n pointer control moves the input pointer forward to a column number that is relative to the current position of the pointer. That is, the + moves the pointer forward n columns. In order to count correctly, it is important to understand where the column pointer is located after each data value is read. In general, when using formatted input, the column pointer moves to the first column that follows the field that was just read.

Again, let's make it more concrete. When SAS finishes reading the l_name field, the column pointer moves to the first column that follows that field, that is, to the column that immediately follows the d in Arnold (column 24). Now, the height field begins 3 columns to the right, so that's why we tell SAS to move +3 before reading the height data. When SAS finishes reading the height field, the column pointer moves to the first column that follows that field, that is, to the empty column that follows the heights (column 29). Now, the wt_date field begins 5 columns to the right, so that's why we tell SAS to move +5 before reading the weight dates. When SAS finished reading the wt_date field, the column pointer moves to the first column that follows the field, that is to the empty column just before the calorie data (column 42). Now, the calorie field begins 1 column to the right, so that's why we tell SAS to move +1 before reading the calorie data.

We'd be all done explaining the INPUT statement if it weren't for the wt_date and calorie fields containing nonstandard numeric data. The wt_date field contains forward slashes (/) for specifying the dates, and the calorie data contains commas (,). To read the wt_date field, we use the mmddyy8. informat to tell SAS to expect dates written in mm/dd/yy form. The 8, of course, tells SAS that the dates could occupy as many as eight columns. To read the calorie field, we use the comma6. informat to tell SAS to read numeric data containing commas and occupying as many as 6 columns.

In general, the COMMAw.d informat is used to read numeric values and to remove embedded blanks, commas, dashes, dollar signs, percent signs, right parentheses, and left parentheses (which are converted to minus signs). The w. tells SAS the width of the field (including dollar signs, decimal places, or other special characters). The d, which is optional, tells SAS the number of implied decimal places for a value.

Whew!!! Now, go ahead and launch and run the SAS program, and review the output obtained from the print procedure. In doing so, note that formatted input does not imply formatted output. For example, note that SAS stores 12/01/95 as the number 13118. We won't worry about this too much yet — it's just that, interestingly enough, SAS stores dates as numeric values equal to the number of days that have passed since January 1, 1960. Yup, that's right —dates after January 1, 1960, are stored as unique positive integers, and dates before January 1, 1960, are stored as unique negative integers. Also, note that SAS prints the values of the variable calorie without commas. In order to get SAS to print the values in a more understandable format, we must tell SAS how to format the output by using a SAS FORMAT statement.

Type in the following statement exactly as it appears:

FORMAT wt_date mmddyy8. calorie comma5.;

between the INPUT statement and the DATALINES statement in your program. Rerun  the SAS program with the now included FORMAT statement, and review the output from the print procedure to convince yourself that the data are now printed as desired. You should now see that it looks something like this:

Output dataset: TEMP

Obs

subj

f_name

l_name

height

wt_date

calories

1

1024

Alice

Smith

65

12/01/95

2,036

2

1167

Maryann

White

68

12/01/95

1,800

3

1168

Thomas

Jones

.

12/02/95

2,302

4

1201

Benedictine

Arnold

68

11/30/95

2,432

5

1302

Felicia

Ho

63

01/01/96

1,972


3.4 - SAS Help and Documentation

3.4 - SAS Help and Documentation

Here's as good an opportunity as any to practice using the SAS Help feature that comes with SAS. Be forewarned that any information you could possibly need is in SAS Help — it's just that sometimes it takes some patience to find it.

Note that SAS has changed the interface and access of help files dramatically since 9.2. The contents shown below apply to SAS 9.3 and above.

In the previous section, we used two informats for reading nonstandard numeric data, namely MMDDYY8. and COMMA6. There are oodles of other informats available making it impossible to address them all in this course. To name just a few, there's PERCENTw.d, DATEw., TIMEw., and JULIANw. There's no doubt that at some point in your future SAS programming, you'll need to look up an informat in the SAS Help and Documentation. So, let's give you some practice with doing that now. Let's look for a list of the available SAS informats. To do so:

  1. On the menu toolbar, under Help, select SAS Help and Documentation:

SAS Help button

  1. A new window pops out:

SAS Help opened

  1. In the box labeled Quick Search, type the word informats:

SAS Help opened with informats search term

  1. Then click the arrow button next to it to start searching…
  2. Finally, under the relevant list that SAS displays, use your mouse to select item 36 labeled Informats by Category:

SAS Help informats selection of item 36

  1. In the righthand portion of the window, you should see that the relevant content appears. It is titled Informats by Category. Page down and take a look at the (long) list of available SAS informats. For example, you should see one in the Character category called $CHARw. which SAS tells you that it is the informat you can use if you want to read in characters with blanks. You should also see one in the Character category called $w. which SAS tells you that it is the informat to use if you want to read in standard character data. Finally, you should see one in the Numeric category called w.d which SAS tells you that it is the informat to use if you want to read in standard numeric data.

    I'll be the first person to agree with you that some of the SAS Help can be intimidating and not particularly clear on the first pass. On the other hand, it has been my experience that if you are patient with it, you can usually find a good example or two to help clarify sticking points. It is in your best interest to get comfortable with SAS Help, as no course (not even this one!) is going to teach you everything you need to know about SAS.


3.5 - SAS System Options

3.5 - SAS System Options

Here's a topic that can really be brought up anywhere and therefore doesn't necessarily belong in this lesson. On the other hand, it's a subject that we don't want to wait too long to introduce, because of its inherent benefits. So, let's go ahead and do it now before any more time passes by.

SAS System options are parameters that you set that affect how SAS works, how SAS formats your output, how SAS uses memory, how SAS handles errors, and so on. All SAS System options have default settings that are used unless you specify otherwise. For example, page numbers are automatically displayed on your output. To modify system options, we just need to submit an OPTIONS statement. Here's one example of an OPTIONS statement that you might want to include at the beginning of each of your SAS programs:

Option PS = 60 LS = 80 NODATE;

The above statement tells SAS that you want:

  • each page of output to contain no more than 60 lines (PS = 60, where PS stands for "pagesize")
  • each line of output to contain no more than 80 characters (LS = 80, where LS stands for "linesize")
  • each page of output to contain no date and time at which the output was created

You can place an OPTIONS statement anywhere in a SAS program to change the settings from that point onward. However, it is good programming practice to place OPTIONS statements outside of DATA or PROC steps so that your programs are easier to read and debug.

In general, the common options you might consider specifying are:

  • CENTER or NOCENTER. CENTER, which is the default, centers the text in your output.
  • DATE or NODATE. DATE, which is the default, prints the current date and time on your output, while NODATE doesn't.
  • NUMBER or NONUMBER. NUMBER, which is the default, prints an accumulative page number on each page of your output.
  • LINESIZE = n, where n can be any number between 64 and 256 (inclusive), specifies the width of the print line for your procedure output and log. Observations that do not fit within the line size continue on a different line or page.
  • PAGESIZE = n, where n can be any number between 15 and 32767 (inclusive), specifies how many lines each page of output contains.
  • PAGENO = n, where n is the page number at which you want SAS to start numbering your output pages. If you don't specify the PAGENO= option, your output is numbered sequentially throughout your SAS session, starting with page 1.

There are just a couple of things to keep in mind about the OPTIONS statement. First, it is a global statement, meaning it affects every DATA step and PROC step that follows it. Second, it overrides any previous OPTIONS statements. In this sense, you can think of the options you specify as toggle switches. The options stay in place until you change them with a new OPTIONS statement or you end your SAS session. We certainly encourage you to use OPTIONS statements throughout this course (and beyond!) to make your output more user-friendly.


3.6 - Summary

3.6 - Summary

Although this lesson has the (deceiving?) title about reading data into a SAS data set, we covered that and much more. In terms of reading data into a SAS data set, we investigated how the INPUT statement works with list input and how it works with formatted input. Then, we addressed other miscellaneous, but important, issues such as using the CONTENTS procedure to view the contents of your data sets, getting help from SAS Help and Documentation, and setting SAS system options.

Now, it is time to get some practice using these techniques.  See the homework assignment for this lesson.


Legend
[1]Link
Has Tooltip/Popover
 Toggleable Visibility