1.4 - Example: Descriptive Statistics

Example 1-5: Women's Health Survey (Descriptive Statistics)

Let us take a look at an example. In 1985, the USDA commissioned a study of women’s nutrition. Nutrient intake was measured for a random sample of 737 women aged 25-50 years. The following variables were measured:

  • Calcium(mg)
  • Iron(mg)
  • Protein(g)
  • Vitamin A(μg)
  • Vitamin C(mg)

Download the data file: nutrient.txt

Using Technology

Using SAS

We will use a SAS program to carry out the calculations that we would like to see.

The lines of this program are saved in a simple text file with a .sas file extension. If SAS is installed on the machine to which you have downloaded this file, opening the file should launch SAS and open the program within the SAS application. Marking up a printout of the SAS program is also a good strategy for learning how this program is put together.

Download the SAS file: nutrient.sas

The video will walk you through the various parts of the code.

 

The first part of the SAS output (download below) contains the results of the Means Procedure (proc means). Because SAS output is usually a relatively long document, printing the pages and marking them up with notes is highly recommended, if not required!

Example: Nutrient Intake Data - Descriptive Statistics

The MEANS Procedure

Variable     N           Mean        Std Dev      Minimum        Maximum
Calcium    737    624.0492537    397.2775401    7.4400000        2866.44
Iron       737     11.1298996      5.9841905            0     58.6680000
Protein    737     65.8034410     30.5757564            0    251.0120000
A          737    839.3653460        1633.54            0       34434.27
C          737     78.9284464     73.5952721            0    433.3390000


Download the SAS Output file: nutrient2.lst

The first column of the Means Procedure table above gives the variable name. The second column reports the sample size. This is then followed by the sample means (third column) and the sample standard deviations (fourth column) for each variable. I have copied these values into the table below. I have also rounded these numbers a bit to make them easier to use for this example.
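For readers who want to see how these five summaries are defined, here is a minimal Python sketch. It is not part of the course's SAS or Minitab material. The function name `describe` and the list `toy` are hypothetical, and the numbers in `toy` are arbitrary placeholders rather than survey data. The standard deviation uses the n - 1 divisor, which is also the default in proc means.

```python
import statistics


def describe(values):
    """Return N, mean, standard deviation, minimum, and maximum for one variable."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        "Std Dev": statistics.stdev(values),  # sample standard deviation (n - 1 divisor)
        "Minimum": min(values),
        "Maximum": max(values),
    }


# Arbitrary placeholder numbers, NOT taken from the nutrient data.
toy = [1.0, 2.0, 3.0, 4.0, 5.0]
for label, value in describe(toy).items():
    print(f"{label:8s} {value}")
```

Applying the same function to each column of the nutrient data should give one row of the table above per variable, assuming the file is read into one list of numbers per nutrient.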

Using Minitab

Click on the graphic or the link below to walk through how to find descriptive statistics for the Women's Nutrition dataset in Minitab.

Video: Descriptive Statistics in Minitab


Analysis

 

Descriptive Statistics

A summary of the descriptive statistics is given here for ease of reference.

Variable     Mean        Standard Deviation
Calcium      624.0 mg     397.3 mg
Iron          11.1 mg       6.0 mg
Protein       65.8 g       30.6 g
Vitamin A    839.4 μg    1633.5 μg
Vitamin C     78.9 mg      73.6 mg

Notice that the standard deviations are large relative to their respective means, especially for vitamins A and C. This indicates high variability among women in nutrient intake. However, whether a standard deviation counts as large depends on the context of the application. Skill in interpreting a statistical analysis depends very much on the researcher's subject-matter knowledge.
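One way to make "large relative to the mean" concrete is the coefficient of variation, the standard deviation divided by the mean. The sketch below is an illustration added here, not a step in the original analysis. It uses the means and standard deviations reported in the MEANS Procedure output, and the names `summary` and `cv` are hypothetical.

```python
# (mean, standard deviation) pairs copied from the MEANS Procedure output.
summary = {
    "Calcium": (624.0492537, 397.2775401),
    "Iron": (11.1298996, 5.9841905),
    "Protein": (65.8034410, 30.5757564),
    "Vitamin A": (839.3653460, 1633.54),
    "Vitamin C": (78.9284464, 73.5952721),
}

# Coefficient of variation: standard deviation relative to the mean (unit-free).
cv = {name: sd / mean for name, (mean, sd) in summary.items()}

# List the variables from most to least variable relative to their means.
for name, value in sorted(cv.items(), key=lambda item: item[1], reverse=True):
    print(f"{name:10s} {value:.2f}")
```

Vitamin A is the only variable whose standard deviation exceeds its mean, and vitamin C comes close, which matches the remark above.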

The sample variance-covariance matrix S is given below, with the variables ordered as calcium, iron, protein, vitamin A, and vitamin C.

\(S = \left(\begin{array}{rrrrr}157829.4 & 940.1 & 6075.8 & 102411.1 & 6701.6 \\ 940.1 & 35.8 & 114.1 & 2383.2 & 137.7 \\ 6075.8 & 114.1 & 934.9 & 7330.1 & 477.2 \\ 102411.1 & 2383.2 & 7330.1 & 2668452.4 & 22063.3 \\ 6701.6 & 137.7 & 477.2 & 22063.3 & 5416.3 \end{array}\right)\)

 

Interpretation

  • The sample variances are given by the diagonal elements of S. For example, the variance of iron intake is \(s_{2}^{2} = 35.8\) mg\(^2\).
  • The covariances are given by the off-diagonal elements of S. For example, the covariance between calcium and iron intake is \(s_{12} = 940.1\).
  • Note that the covariances are all positive, indicating that the daily intake of each nutrient tends to increase with increased intake of the remaining nutrients.

Because the covariance \(s_{12}\) between calcium and iron is positive, we see that calcium intake tends to increase with increasing iron intake. The strength of this positive association can only be judged by comparing \(s_{12}\) to the product of the sample standard deviations for calcium and iron. This comparison is most readily accomplished by looking at the sample correlation between the two variables.
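As a worked illustration of comparing a covariance with the product of the standard deviations, the calcium and iron entries of S give the following. The values are taken from the rounded matrix, so the last digit is approximate.

\[r_{12} = \frac{s_{12}}{\sqrt{s_{1}^{2}}\sqrt{s_{2}^{2}}} = \frac{940.1}{\sqrt{157829.4 \times 35.8}} \approx \frac{940.1}{2377.0} \approx 0.395\]

A covariance of 940.1 looks large on its own, but it is only about 40% of the largest value it could take given the two standard deviations.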
 

Sample Correlations

The sample correlations are included in the table below.

           Calcium    Iron    Protein    Vit. A    Vit. C
Calcium      1.000   0.395      0.500     0.158     0.229
Iron         0.395   1.000      0.623     0.244     0.313
Protein      0.500   0.623      1.000     0.147     0.212
Vit. A       0.158   0.244      0.147     1.000     0.184
Vit. C       0.229   0.313      0.212     0.184     1.000

Here we can see that the correlation of each variable with itself is equal to one, and the off-diagonal elements give the correlations between pairs of variables.

Generally, we look for the strongest correlations first. The results above suggest that protein, iron, and calcium are all positively associated. Intake of each of these three nutrients tends to increase with increasing intake of the other two.
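The correlation table can be reproduced, up to rounding, from the variance-covariance matrix S by dividing each covariance by the product of the two corresponding standard deviations. The following Python sketch is an added illustration, not the course's SAS code. It uses the rounded entries of S given earlier, so agreement with the table should only be expected to about 0.001. The names `S`, `std`, and `R` are chosen here for convenience.

```python
import math

names = ["Calcium", "Iron", "Protein", "Vit. A", "Vit. C"]

# Rounded sample variance-covariance matrix S, as given in the text.
S = [
    [157829.4, 940.1, 6075.8, 102411.1, 6701.6],
    [940.1, 35.8, 114.1, 2383.2, 137.7],
    [6075.8, 114.1, 934.9, 7330.1, 477.2],
    [102411.1, 2383.2, 7330.1, 2668452.4, 22063.3],
    [6701.6, 137.7, 477.2, 22063.3, 5416.3],
]

p = len(S)

# Standard deviations are the square roots of the diagonal elements.
std = [math.sqrt(S[i][i]) for i in range(p)]

# r_ij = s_ij / (s_i * s_j)
R = [[S[i][j] / (std[i] * std[j]) for j in range(p)] for i in range(p)]

for name, row in zip(names, R):
    print(f"{name:8s}" + "".join(f"{r:8.3f}" for r in row))
```

Small differences in the third decimal place, for example between iron and protein, come from the rounding of S rather than from a different formula.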

The coefficient of determination is another measure of association and is simply equal to the square of the correlation. For example, in this case, the coefficient of determination between protein and iron is \((0.623)^2\) or about 0.388.

\[r^2_{23} = 0.62337^2 = 0.38859\]

This says that about 39% of the variation in iron intake is explained by protein intake. Or, conversely, about 39% of the variation in protein intake is explained by iron intake. Both interpretations are equivalent.
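To see why the two interpretations agree, consider a simple linear regression in each direction. The fraction of variance explained works out to \(s_{23}^2 / (s_2^2 s_3^2)\) either way. The sketch below is an added illustration using the rounded iron and protein entries of S, so its result (about 0.389) differs slightly from the 0.38859 obtained from the unrounded correlation. The helper `explained_fraction` is a hypothetical name.

```python
# Rounded entries of S for iron (variable 2) and protein (variable 3).
var_iron = 35.8
var_protein = 934.9
cov_iron_protein = 114.1


def explained_fraction(var_x, var_y, cov_xy):
    """Fraction of the variance of y explained by a least-squares line in x."""
    slope = cov_xy / var_x               # least-squares slope of y on x
    explained = slope ** 2 * var_x       # variance of the fitted values
    return explained / var_y


protein_on_iron = explained_fraction(var_iron, var_protein, cov_iron_protein)
iron_on_protein = explained_fraction(var_protein, var_iron, cov_iron_protein)
r_squared = cov_iron_protein ** 2 / (var_iron * var_protein)

print(f"protein explained by iron: {protein_on_iron:.3f}")
print(f"iron explained by protein: {iron_on_protein:.3f}")
print(f"squared correlation:       {r_squared:.3f}")
```

All three printed values coincide, which is the symmetry the paragraph above describes.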