1.4  Example: Descriptive Statistics
Example 1-5: Women's Health Survey (Descriptive Statistics)
Let us take a look at an example. In 1985, the USDA commissioned a study of women’s nutrition. Nutrient intake was measured for a random sample of 737 women aged 25-50 years. The following variables were measured:
 Calcium (mg)
 Iron (mg)
 Protein (g)
 Vitamin A (μg)
 Vitamin C (mg)
Using Technology
We will use a SAS program to carry out these calculations.
Download the data file: nutrient.csv
The lines of this program are saved in a simple text file with a .sas file extension. If SAS is installed on the machine to which you have downloaded this file, opening the file should launch SAS and open the program within the SAS application. Marking up a printout of the SAS program is also a good strategy for learning how the program is put together.
options ls=78; /*This sets the line size, i.e. the maximum number of characters per output line, to 78.*/
title "Example: Nutrient Intake Data - Descriptive Statistics";
/*This sets a title that will appear on each page of the output until it's changed.*/
data nutrient; /*This defines a data set called 'nutrient'.*/
infile "D:\Statistics\STAT 505\data\nutrient.csv" firstobs=2 delimiter=','; /*SAS will look in this path for the nutrient.csv file.*/
input id calcium iron protein a c; /*This is where we provide names for the variables in order of the columns in the data set. If any were categorical (not the case here), we would need to put a '$' character after its name.*/
run;
proc means;
var calcium iron protein a c; /*If not all variables are of interest, we can specify here the ones we want to work with.*/
run;
proc corr pearson cov; /*The 'pearson' option requests Pearson correlation coefficients. The 'cov' option requests the sample covariance matrix.*/
var calcium iron protein a c; /*If not all variables are of interest, we can specify here the ones we want to work with.*/
run;
The first part of the SAS output (download below) contains the results of the MEANS procedure (proc means). Because SAS output is usually a relatively long document, printing these pages and marking them up with notes is highly recommended, if not required!
The MEANS Procedure
Summary statistics
Variable  N  Mean  Std Dev  Minimum  Maximum

calcium  737  624.0492537  397.2775401  7.4400000  2866.44
iron  737  11.1298996  5.9841905  0  58.6680000
protein  737  65.8034410  30.5757564  0  251.0120000
a  737  839.6353460  1633.54  0  34434.27
c  737  78.9284464  73.5952721  0  433.3390000

Download the SAS Output file: nutrient2.lst
The first column of the Means Procedure table above gives the variable name. The second column reports the sample size. This is then followed by the sample means (third column) and the sample standard deviations (fourth column) for each variable. I have copied these values into the table below. I have also rounded these numbers a bit to make them easier to use for this example.
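For readers who want to check how these columns are computed, here is a minimal Python sketch of the five statistics that proc means reports by default. It is not part of the course's SAS or Minitab workflow. The function name `summarize` and the five intake values are made up for illustration and do not come from nutrient.csv. The standard deviation uses the n - 1 divisor, which matches the SAS default.

```python
import statistics


def summarize(values):
    """Return N, mean, sample SD, minimum and maximum for one variable."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Std Dev": statistics.stdev(values),  # sample SD, divisor n - 1
        "Minimum": min(values),
        "Maximum": max(values),
    }


# Hypothetical iron intakes (mg) for five women; NOT taken from nutrient.csv.
toy_iron = [8.0, 10.0, 12.0, 14.0, 16.0]

summary = summarize(toy_iron)
for name, value in summary.items():
    print(f"{name:8s} {value}")
```

Applying the same function to each column of the real data file should reproduce the corresponding row of the table above, assuming the file is read with the header row skipped as in the SAS program.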
Here are the steps to find the descriptive statistics for the Women's Nutrition dataset in Minitab:
Descriptive Statistics in Minitab
 Go to File > Open > Worksheet [open nutrient_tf.csv]
 Stat > Basic Statistics > Display Descriptive Statistics
 Highlight and select C2 through C6 and choose ‘Select’ to move the variables into the window on the right.
 Select ‘Statistics...’, and check the boxes for the statistics of interest.
 OK > OK
Analysis
Descriptive Statistics
A summary of the descriptive statistics is given here for ease of reference.
Variable  Mean  Standard Deviation 

Calcium  624.0 mg  397.3 mg 
Iron  11.1 mg  6.0 mg 
Protein  65.8 g  30.6 g 
Vitamin A  839.6 μg  1633.5 μg 
Vitamin C  78.9 mg  73.6 mg 
Notice that the standard deviations are large relative to their respective means, especially for Vitamins A and C. This indicates high variability in nutrient intake among women. However, whether a standard deviation is considered large depends on the context of the application. Skill in interpreting a statistical analysis depends very much on the researcher's subject-matter knowledge.
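One way to make "large relative to the mean" concrete is the coefficient of variation, the standard deviation divided by the mean. The text does not use this measure, so treat the sketch below as an added illustration. The means and standard deviations are the proc means values rounded to two decimals.

```python
# (mean, standard deviation) from the proc means output, rounded to 2 decimals.
summary_stats = {
    "Calcium": (624.05, 397.28),
    "Iron": (11.13, 5.98),
    "Protein": (65.80, 30.58),
    "Vitamin A": (839.64, 1633.54),
    "Vitamin C": (78.93, 73.60),
}

# Coefficient of variation = SD / mean (unitless, so nutrients are comparable).
cv = {name: sd / mean for name, (mean, sd) in summary_stats.items()}

for name, value in cv.items():
    print(f"{name:10s} CV = {value:.2f}")
```

The ratios for Vitamins A and C come out well above those for the other three nutrients, and for Vitamin A the standard deviation is nearly twice the mean.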
The variance-covariance matrix is also copied into the matrix below.
\[S = \left(\begin{array}{rrrrr}157829.4 & 940.1 & 6075.8 & 102411.1 & 6701.6 \\ 940.1 & 35.8 & 114.1 & 2383.2 & 137.7 \\ 6075.8 & 114.1 & 934.9 & 7330.1 & 477.2 \\ 102411.1 & 2383.2 & 7330.1 & 2668452.4 & 22063.3 \\ 6701.6 & 137.7 & 477.2 & 22063.3 & 5416.3 \end{array}\right)\]
Interpretation
 The sample variances are given by the diagonal elements of S. For example, the variance of iron intake is \(s_{2}^{2} = 35.8\) mg\(^{2}\).
 The covariances are given by the off-diagonal elements of S. For example, the covariance between calcium and iron intake is \(s_{12} = 940.1\). Because this covariance is positive, calcium intake tends to increase with increasing iron intake. The strength of this positive association can only be judged by comparing \(s_{12}\) to the product of the sample standard deviations for calcium and iron. This comparison is most readily accomplished by looking at the sample correlation between the two variables.
 Note that the covariances are all positive, indicating that the daily intake of each nutrient tends to increase with increased intake of the remaining nutrients.
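The comparison of a covariance to the product of the two standard deviations can be sketched in a few lines of Python. This is an illustrative check, not the course's SAS output. It divides each entry of S by the square roots of the matching diagonal entries, \(r_{jk} = s_{jk} / (s_{j} s_{k})\). Because it starts from the rounded entries of S printed above, the results may differ from the SAS correlations in the third decimal place.

```python
import math

labels = ["Calcium", "Iron", "Protein", "Vit. A", "Vit. C"]

# Sample variance-covariance matrix S, as printed above (rounded values).
S = [
    [157829.4, 940.1, 6075.8, 102411.1, 6701.6],
    [940.1, 35.8, 114.1, 2383.2, 137.7],
    [6075.8, 114.1, 934.9, 7330.1, 477.2],
    [102411.1, 2383.2, 7330.1, 2668452.4, 22063.3],
    [6701.6, 137.7, 477.2, 22063.3, 5416.3],
]

p = len(S)

# Standard deviations are the square roots of the diagonal entries of S.
sd = [math.sqrt(S[j][j]) for j in range(p)]

# Correlation matrix: r_jk = s_jk / (s_j * s_k).
R = [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]

for label, row in zip(labels, R):
    print(f"{label:8s}", "  ".join(f"{r:.3f}" for r in row))
```

For calcium and iron this gives \(940.1 / (397.3 \times 5.98) \approx 0.395\), a moderate positive association even though the raw covariance looks large.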
Sample Correlations
The sample correlations are included in the table below.
Variable  Calcium  Iron  Protein  Vit. A  Vit. C 

Calcium  1.000  0.395  0.500  0.158  0.229 
Iron  0.395  1.000  0.623  0.244  0.313 
Protein  0.500  0.623  1.000  0.147  0.212 
Vit. A  0.158  0.244  0.147  1.000  0.184 
Vit. C  0.229  0.313  0.212  0.184  1.000 
Here we can see that the correlation of each variable with itself is equal to one, and the off-diagonal elements give the correlation between each pair of variables.
Generally, we look for the strongest correlations first. The results above suggest that protein, iron, and calcium are all positively associated. Each of these three nutrients tends to increase with increasing values of the remaining two.
The coefficient of determination is another measure of association and is simply equal to the square of the correlation. For example, the coefficient of determination between protein and iron is the square of their sample correlation, or about 0.39.
\[r^2_{23} = 0.62337^2 = 0.38859\]
This says that about 39% of the variation in iron intake is explained by protein intake. Conversely, about 39% of the variation in protein intake is explained by iron intake. Both interpretations are equivalent.
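To see why the two readings coincide, it can help to write the coefficient of determination in terms of the entries of S. The following is a sketch that assumes "explained" refers to a simple linear regression of one variable on the other, which the text does not state explicitly. It uses the rounded covariance entries, so the last digits differ slightly from the value computed from the unrounded correlation. In a regression of iron on protein, the variance of the fitted values is \(s_{23}^{2}/s_{3}^{2}\), so the explained fraction of the iron variance is

\[\frac{s_{23}^{2}/s_{3}^{2}}{s_{2}^{2}} = \frac{s_{23}^{2}}{s_{2}^{2}\, s_{3}^{2}} = \frac{(114.1)^{2}}{(35.8)(934.9)} \approx 0.389\]

The middle expression is symmetric in the two variables, so regressing protein on iron gives the same fraction.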