Lesson 11: Summarizing Continuous Data

Overview

As you now know, in addition to creating pretty reports, the REPORT procedure can be used to calculate some basic descriptive statistics. SAS also offers a number of procedures, however, that are designed specifically to produce a variety of descriptive statistics and to display them in meaningful reports. The four procedures I have in mind are the MEANS, SUMMARY, UNIVARIATE, and FREQ procedures.

The FREQ procedure is used to summarize discrete data values and therefore can be used to calculate summary statistics such as the percentage of people with blue eyes and the number of elm trees succumbing to Dutch elm disease. We'll learn about the FREQ procedure in the next (and final!) lesson.

The MEANS, SUMMARY, and UNIVARIATE procedures are used to summarize continuous numeric values and therefore can be used to calculate statistics, such as mean height, median salary, and minimum mileage. We'll learn about these three procedures in this (the next to final!) lesson.

We'll work mostly with the MEANS procedure. Then, since the SUMMARY and UNIVARIATE procedures have options and statements similar to those of the MEANS procedure, we'll spend less time on them. The greatest difference among the three procedures is that the UNIVARIATE procedure calculates a few additional statistics that are not available in the MEANS and SUMMARY procedures. If you do not need those additional statistics, however, it is much more efficient to use the MEANS and SUMMARY procedures.

All three of the procedures take the following generic form:

PROC PROCNAME options;
    statement1;
    statement2;
    etc;
RUN;

where, not surprisingly, PROCNAME stands for the name of the procedure, and is therefore one of MEANS, SUMMARY, or UNIVARIATE.
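To make the generic form concrete, here is a minimal sketch with PROCNAME replaced by MEANS. It assumes the sashelp.class data set that ships with SAS (with its numeric variables height and weight), which is not necessarily the data used later in this lesson. The statistic keywords and the MAXDEC= option appear in the "options" position, and the VAR statement plays the role of statement1.

```sas
/* Illustration only: sashelp.class is a sample data set shipped  */
/* with SAS, assumed here in place of the lesson's own data.      */
PROC MEANS data=sashelp.class n mean median min max maxdec=1;
    var height weight;   /* the numeric variables to summarize */
RUN;
```

Changing MEANS to SUMMARY or UNIVARIATE keeps the same overall shape, although the options that each procedure accepts differ.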

Objectives

Upon completing this lesson, you should be able to use the three procedures that are available in SAS (MEANS, SUMMARY, and UNIVARIATE) to compute various basic descriptive statistics on the numeric variables in a data set. In particular, you should be able to:

  • inform SAS which numeric variables to analyze using the VAR statement
  • identify which SAS summary statistics to calculate using the various statistic keywords
  • tell SAS how to format the report containing the summary statistics using the MAXDEC= and FW= options
  • suppress printing of the default report using the NOPRINT option of the MEANS procedure
  • generate a report containing the summary statistics using the PRINT option of the SUMMARY procedure 
  • perform a separate analysis for each BY group created by the variables appearing in the BY statement
  • use the CLASS statement in the MEANS and SUMMARY procedures to make SAS form subgroups before calculating summary statistics
  • create a data set containing summary statistics rather than the standard printed output using the OUTPUT statement 
  • create a quick-and-dirty interaction plot with the MEANS and PLOT procedures 
  • use the NORMAL option of the UNIVARIATE procedure to compute four different "test for normality" statistics
  • use the PLOT option of the UNIVARIATE procedure to create a histogram, boxplot, and a normal probability plot
  • use the ID statement in the UNIVARIATE procedure so that the five largest and five smallest observations are labeled with the values of the ID variable
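
Several of the objectives above can be previewed in one short program. What follows is only a sketch, again assuming the sashelp.class sample data set that ships with SAS rather than the lesson's own data. The data set names work.classstats and work.class_sorted and the variable names mean_height and mean_weight are made up for illustration.

```sas
/* CLASS + OUTPUT + NOPRINT: subgroup means go to a data set     */
/* instead of the default printed report.                        */
PROC MEANS data=sashelp.class noprint;
    class sex;
    var height weight;
    output out=work.classstats mean=mean_height mean_weight;
RUN;

/* Look at the data set that the OUTPUT statement created. */
PROC PRINT data=work.classstats;
RUN;

/* BY-group processing expects the data to be sorted (or indexed) */
/* by the BY variable, so sort a copy first.                      */
PROC SORT data=sashelp.class out=work.class_sorted;
    by sex;
RUN;

/* SUMMARY prints nothing by default, so PRINT requests a report. */
PROC SUMMARY data=work.class_sorted print mean std fw=8;
    by sex;
    var height;
RUN;

/* NORMAL requests tests for normality, PLOT requests the plots,  */
/* and ID labels the extreme observations with each name.         */
PROC UNIVARIATE data=sashelp.class normal plot;
    var height;
    id name;
RUN;
```

Because the CLASS statement is used without further options, the output data set should contain one row for all observations combined in addition to one row per value of sex, distinguished by the automatic _TYPE_ variable.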