11.1 - The MEANS and SUMMARY Procedures

In this section, we'll learn the syntax of the simplest MEANS and SUMMARY procedures, as well as familiarize ourselves with the output they generate.

Example 11.1

Throughout our investigation of the MEANS, SUMMARY, and UNIVARIATE procedures, we'll use the hematology data set arising from the ICDB Study. The following program tells SAS to display the contents, and print the first 15 observations, of the data set:

OPTIONS PS = 58 LS = 80 NODATE NONUMBER;
LIBNAME icdb 'C:\Yourdrivename\Stat480WC\sasndata';
PROC CONTENTS data = icdb.hem2 position;
RUN;
PROC PRINT data = icdb.hem2 (OBS = 15);
RUN;

Hematology data set

Obs     subj   hosp   wbc    rbc   hemog   hcrit    mcv    mch   mchc
  1   110027     11   7.5   4.38    13.8    40.9   93.3   31.5   33.7
  2    11027     11   7.6   5.20    15.2    45.8   88.0   29.2   33.1
  3   110039     11   7.5   4.33    13.1    39.4   91.0   30.2   33.2
  4   110040     11   8.3   4.52    12.4    38.1   84.2   27.4   32.5
  5   110045     11   8.9   4.72    14.6    42.7   90.4   30.9   34.1
  6   110049     11   6.2   4.71    13.8    41.7   88.5   29.2   33.0
  7   110051     11   6.4   4.56    13.0    37.9   83.1   28.5   34.3
  8   110052     11   7.1   3.69    12.5    35.6   97.2   33.8   33.8
  9   110053     11   7.4   4.47    14.4    43.6   97.2   32.2   33.0
 10   110055     11   6.1   4.34    12.8    38.2   88.1   29.6   33.6
 11   110057     11   9.5   4.70    13.4    40.5   86.0   28.4   33.0
 12   110058     11   6.5   3.76    11.6    34.2   91.0   30.7   33.8
 13   110059     11   7.5   4.29    12.3    36.8   85.7   28.6   33.4
 14   110060     11   7.6   4.57    13.8    42.0   91.8   30.1   32.8
 15   110062     11   4.6   4.87    13.9    42.9   88.2   28.5   32.3

First, save the hematology data set to a convenient location on your computer. Then, launch SAS, and edit the LIBNAME statement so that it reflects the location in which you saved the data set. Finally, run the program. You may recall that the CONTENTS procedure's POSITION option tells SAS to list the variables in the order in which they appear in the data set. Therefore, you should see output that looks something like this:

The CONTENTS Procedure
Variables in Creation Order

#   Variable   Type   Len
1   subj       Num      8
2   hosp       Num      8
3   wbc        Num      8
4   rbc        Num      8
5   hemog      Num      8
6   hcrit      Num      8
7   mcv        Num      8
8   mch        Num      8
9   mchc       Num      8

The first two variables, subj and hosp, tell us the subject number and the hospital at which the subject's data were collected. The remaining variables, wbc, rbc, hemog, ..., are the blood data variables of most interest. For example, the variables wbc and rbc contain the subject's white blood cell and red blood cell counts, respectively. The really important thing to note when reviewing the output is that all of the blood data variables are continuous numeric variables, which lend themselves perfectly to a descriptive analysis using the MEANS procedure.

Example 11.2

The MEANS procedure can include many statements and options for specifying the desired statistics. For the sake of simplicity, we'll start out with the most basic form of the MEANS procedure. The following program simply tells SAS to display basic summary statistics for each numeric variable in the icdb.hem2 data set:

PROC MEANS data = icdb.hem2;
RUN;

The MEANS Procedure

Variable     N          Mean       Std Dev       Minimum       Maximum
subj       635     327199.50     144410.20      10027.00     520098.00
hosp       635    32.7133858    14.4426330    11.0000000    52.0000000
wbc        635     7.1276850     1.9019097     3.0000000    14.2000000
rbc        635     4.4350079     0.3941710     3.1200000     5.9500000
hemog      635    13.4696063          1.11     9.9000000    17.7000000
hcrit      635    39.4653543     3.1623819    29.7000000    51.4000000
mcv        635    89.1184252     4.5190963    65.0000000   106.0000000
mch        634    30.4537855     1.7232248    22.0000000    37.0000000
mchc       634    34.1524290     0.7562054    31.6000000    36.7000000

Launch and run the SAS program, and review the output to familiarize yourself with the summary statistics that the MEANS procedure calculates by default. As you can see, in its most basic form, the MEANS procedure prints N (the number of nonmissing values), the mean, the standard deviation, and the minimum and maximum values of every numeric variable in the data set.
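The default report is equivalent to requesting those five statistics by keyword on the PROC MEANS statement. The following sketch, which assumes the icdb libref defined in Example 11.1 is still assigned, should produce the same report as the program above:

```sas
* Request the default statistics explicitly by keyword;
* (assumes the icdb libref from Example 11.1 is still assigned);
PROC MEANS data = icdb.hem2 N MEAN STD MIN MAX;
RUN;
```

Listing the keywords explicitly is a convenient starting point when you later want to add or drop a statistic.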

In most cases, you probably don't want SAS to calculate summary statistics for every numeric variable in your data set. Instead, you'll probably just want to focus on a few important variables. For our hematology data set, for example, it doesn't make much sense for SAS to calculate summary statistics for the subj and hosp variables. After all, how does it help us to know that the average subj number is 327199.5?

Example 11.3

The following program uses the MEANS procedure's VAR statement to restrict SAS to summarizing just the seven blood data variables in the icdb.hem2 data set:

PROC MEANS data = icdb.hem2;
   var wbc rbc hemog hcrit mcv mch mchc;
RUN;

The MEANS Procedure

Variable     N          Mean       Std Dev       Minimum       Maximum
wbc        635     7.1276850     1.9019097     3.0000000    14.2000000
rbc        635     4.4350079     0.3941710     3.1200000     5.9500000
hemog      635    13.4696063          1.11     9.9000000    17.7000000
hcrit      635    39.4653543     3.1623819    29.7000000    51.4000000
mcv        635    89.1184252     4.5190963    65.0000000   106.0000000
mch        634    30.4537855     1.7232248    22.0000000    37.0000000
mchc       634    34.1524290     0.7562054    31.6000000    36.7000000

Launch and run the SAS program, and review the output to convince yourself that the subj and hosp variables have been excluded from the analysis.
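Because the seven blood data variables sit next to one another in the data set (positions 3 through 9 in the CONTENTS output of Example 11.1), a name range list offers a shorthand for the same VAR statement. This is only a sketch, and it relies on the variable order shown earlier; if the variables were stored in a different order, the double-dash list would pick up a different set of variables:

```sas
* wbc -- mchc selects every variable from wbc through mchc;
* in data set order, that is, the seven blood data variables;
PROC MEANS data = icdb.hem2;
   var wbc -- mchc;
RUN;
```

The report should match the one produced by listing all seven variable names in the VAR statement.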

The other thing you might notice about the output is that many more decimal places are displayed than are necessary. By default, SAS uses the BEST. format to display values in reports created by the MEANS procedure. In a technical sense, this means that SAS chooses the format that provides the most information about the summary statistics while maintaining a default field width of 12. In a practical sense, it means that too many decimal places are often displayed.

Example 11.4

The following program uses the MEANS procedure's MAXDEC= option to set the maximum number of decimal places displayed to 2, and the FW= option to set the field width used to print each statistic to 10:

PROC MEANS data = icdb.hem2 MAXDEC = 2 FW = 10;
   var wbc rbc hemog hcrit mcv mch mchc;
RUN;

The MEANS Procedure

Variable     N      Mean   Std Dev   Minimum   Maximum
wbc        635      7.13      1.90      3.00     14.20
rbc        635      4.44      0.39      3.12      5.95
hemog      635     13.47      1.11      9.90     17.70
hcrit      635     39.47      3.16     29.70     51.40
mcv        635     89.12      4.52     65.00    106.00
mch        634     30.45      1.72     22.00     37.00
mchc       634     34.15      0.76     31.60     36.70

Launch and run the SAS program, and review the output to convince yourself that the maximum number of decimal places and the field width have been modified as claimed. Let's check out the SUMMARY procedure now.

Example 11.5

The following program is identical to the program in the previous example except for two things:

  1. The MEANS keyword has been replaced with the SUMMARY keyword.
  2. The PRINT option has been added to the PROC statement.

PROC SUMMARY data = icdb.hem2 MAXDEC = 2 FW = 10 PRINT;
   var wbc rbc hemog hcrit mcv mch mchc;
RUN;

The SUMMARY Procedure

Variable     N      Mean   Std Dev   Minimum   Maximum
wbc        635      7.13      1.90      3.00     14.20
rbc        635      4.44      0.39      3.12      5.95
hemog      635     13.47      1.11      9.90     17.70
hcrit      635     39.47      3.16     29.70     51.40
mcv        635     89.12      4.52     65.00    106.00
mch        634     30.45      1.72     22.00     37.00
mchc       634     34.15      0.76     31.60     36.70

The MEANS and SUMMARY procedures perform the same functions except for the default setting of the PRINT option. By default, the MEANS procedure produces printed output, while the SUMMARY procedure does not. With the MEANS procedure, you have to use the NOPRINT option to suppress printing, while with the SUMMARY procedure, you have to use the PRINT option to get a printed report.
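To see the effect of NOPRINT, it helps to give the procedure somewhere else to send its results. The sketch below goes slightly beyond this section by using an OUTPUT statement; the data set name work.hemstats and the variable names mean_wbc and mean_rbc are hypothetical, chosen only for illustration:

```sas
* NOPRINT suppresses the printed report from PROC MEANS;
* The OUTPUT statement saves the means in a data set instead;
* (work.hemstats, mean_wbc, and mean_rbc are hypothetical names);
PROC MEANS data = icdb.hem2 NOPRINT;
   var wbc rbc;
   output out = work.hemstats mean = mean_wbc mean_rbc;
RUN;

* Print the saved statistics to confirm that they were computed;
PROC PRINT data = work.hemstats;
RUN;
```

The same two steps with SUMMARY in place of MEANS would not need the NOPRINT option, since the SUMMARY procedure does not print by default.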

Launch and run the PROC SUMMARY program from this example, and review the output to convince yourself that there is no difference between the two reports created by the MEANS and SUMMARY procedures.

Wait a second, though: if you're not careful, there actually is a difference. The VAR statement in the PROC SUMMARY program tells SAS which of the (numeric) variables to summarize. If you do not include a VAR statement in the SUMMARY procedure, SAS merely gives a simple count of the number of observations in the data set. To convince yourself of this, delete the VAR statement, and re-run the SAS program. You should see output that looks something like this:

The SUMMARY Procedure

N Obs
  635
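For reference, the modified program, with the VAR statement deleted, would look something like the following sketch (again assuming the icdb libref from Example 11.1 is still assigned):

```sas
* Same PROC statement as in Example 11.5, with the VAR statement deleted;
PROC SUMMARY data = icdb.hem2 MAXDEC = 2 FW = 10 PRINT;
RUN;
```

Running the same program with MEANS in place of SUMMARY would instead summarize every numeric variable in the data set, as in Example 11.2.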