Lesson 11: Summarizing Continuous Data

Overview

As you now know, in addition to creating pretty reports, the REPORT procedure can be used to calculate some basic descriptive statistics. SAS also offers a number of procedures, however, that are designed specifically to produce a variety of descriptive statistics and to display them in meaningful reports. The four procedures I have in mind are the MEANS, SUMMARY, UNIVARIATE, and FREQ procedures.

The FREQ procedure is used to summarize discrete data values, and therefore can be used to calculate summary statistics such as the percentage of people with blue eyes and the number of elm trees succumbing to Dutch elm disease. We'll learn about the FREQ procedure in the next (and final!) lesson.

The MEANS, SUMMARY, and UNIVARIATE procedures are used to summarize continuous numeric values, and therefore can be used to calculate statistics, such as mean height, median salary, and minimum mileage. We'll learn about these three procedures in this (the next to final!) lesson.

We'll work mostly with the MEANS procedure. Then, since the SUMMARY and UNIVARIATE procedures share many of the MEANS procedure's options and statements, we'll spend less time on them. The greatest difference among the three procedures is that the UNIVARIATE procedure calculates some additional statistics that are not available in the MEANS and SUMMARY procedures. If you do not need those additional statistics, however, it is much more efficient to use the MEANS and SUMMARY procedures.

All three of the procedures take the following generic form:

PROC PROCNAME options;
    statement1;
    statement2;
    etc;
RUN;

where, not surprisingly, PROCNAME stands for the name of the procedure, and is therefore either MEANS, SUMMARY, or UNIVARIATE.
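To make the generic form concrete, here is a minimal sketch that applies it to each of the three procedures in turn. It assumes the sample data set sashelp.class that ships with SAS, along with its numeric variables Height and Weight; neither is part of this lesson's own data, so substitute your own data set and variable names as needed.

```sas
/* Assumption: sashelp.class is the sample data set installed with SAS; */
/* Height and Weight are two of its numeric variables.                   */

/* MEANS prints a report of default summary statistics. */
PROC MEANS DATA=sashelp.class;
    VAR Height Weight;
RUN;

/* SUMMARY does not print a report by default, so the PRINT option is added. */
PROC SUMMARY DATA=sashelp.class PRINT;
    VAR Height Weight;
RUN;

/* UNIVARIATE reports a more extensive set of statistics for each variable. */
PROC UNIVARIATE DATA=sashelp.class;
    VAR Height Weight;
RUN;
```

In each step, only the procedure name and its options change; the VAR statement plays the same role in all three.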

Objectives

Upon completion of this lesson, you should be able to use the three procedures that are available in SAS (MEANS, SUMMARY, and UNIVARIATE) to calculate various basic descriptive statistics on the numeric variables in a data set. In particular, you should be able to:

  • use the VAR statement to tell SAS which numeric variables to analyze
  • use the various statistic keywords to tell SAS which summary statistics to calculate
  • use the MAXDEC= and FW= options to tell SAS how to format the report containing the summary statistics
  • use the NOPRINT option of the MEANS procedure to suppress printing of the default report
  • use the PRINT option of the SUMMARY procedure to generate a report containing the summary statistics
  • use the BY statement to tell SAS to perform a separate analysis for each BY-group created by the variables appearing in the BY statement
  • use the CLASS statement in the MEANS and SUMMARY procedures to tell SAS to form subgroups before calculating summary statistics
  • use the OUTPUT statement to create a data set containing summary statistics rather than the standard printed output
  • use the MEANS and PLOT procedures to create a quick-and-dirty interaction plot
  • use the NORMAL option of the UNIVARIATE procedure to tell SAS to compute four different "test for normality" statistics
  • use the PLOT option of the UNIVARIATE procedure to tell SAS to create a histogram, boxplot, and a normal probability plot
  • use the ID statement in the UNIVARIATE procedure to tell SAS to label the five largest and five smallest observations with the values of the variable named in the ID statement