1.4 - Example: Descriptive Statistics
Example 1-5: Women's Health Survey (Descriptive Statistics)
Let us take a look at an example. In 1985, the USDA commissioned a study of women’s nutrition. Nutrient intake was measured for a random sample of 737 women aged 25-50 years. The following variables were measured:
- Calcium(mg)
- Iron(mg)
- Protein(g)
- Vitamin A(μg)
- Vitamin C(mg)
Download the data file: nutrient.txt
Using Technology
Using SAS
We will use a SAS program to carry out the calculations that we would like to see.
The lines of this program are saved in a simple text file with a .sas file extension. If SAS is installed on the machine to which you have downloaded this file, opening the file should launch SAS and open the program within the SAS application. Marking up a printout of the SAS program is also a good strategy for learning how the program is put together.
Download the SAS file: nutrient.sas
The video will walk you through the various parts of the code.
The first part of the SAS output (download below) shows the results of the Means Procedure, proc means. Because SAS output is usually a relatively long document, printing these pages and marking them up with notes is highly recommended, if not required!
Variable | N | Mean | Std Dev | Minimum | Maximum |
---|---|---|---|---|---|
Calcium | 737 | 624.0492537 | 397.2775401 | 7.4400000 | 2866.44 |
Iron | 737 | 11.1298996 | 5.9841905 | 0 | 58.6680000 |
Protein | 737 | 65.8034410 | 30.5757564 | 0 | 251.0120000 |
A | 737 | 839.3653460 | 1633.54 | 0 | 34434.27 |
C | 737 | 78.9284464 | 73.5952721 | 0 | 433.3390000 |
Download the SAS Output file: nutrient2.lst
The first column of the Means Procedure table above gives the variable name. The second column reports the sample size. This is then followed by the sample means (third column) and the sample standard deviations (fourth column) for each variable. I have copied these values into the table below. I have also rounded these numbers a bit to make them easier to use for this example.
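For readers without SAS at hand, the quantities in each row of the Means Procedure table can be sketched in Python. This is only an illustration, not the course's SAS program: the function name `summarize` and the toy values are hypothetical and are not taken from nutrient.txt. The standard deviation uses the n - 1 divisor, which is the SAS default.

```python
import statistics


def summarize(values):
    """Return the five statistics reported per variable by proc means."""
    return {
        "N": len(values),
        "Mean": statistics.mean(values),
        # statistics.stdev is the sample standard deviation (divisor n - 1)
        "Std Dev": statistics.stdev(values),
        "Minimum": min(values),
        "Maximum": max(values),
    }


# Hypothetical toy intakes in mg; NOT values from the survey data
toy_iron_mg = [8.0, 10.0, 12.0, 14.0]

summary = summarize(toy_iron_mg)
for name, value in summary.items():
    print(f"{name:8s} {value:.4f}")
```

Applying the same function to each column of the real data file would reproduce one table row per nutrient, assuming the columns have first been read into lists of numbers.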
Using Minitab
Click on the graphic or the link below to walk through how to find descriptive statistics for the Women's Nutrition dataset in Minitab.
Video: Descriptive Statistics in Minitab
Analysis
Descriptive Statistics
A summary of the descriptive statistics is given here for ease of reference.
Variable | Mean | Standard Deviation |
---|---|---|
Calcium | 624.0 mg | 397.3 mg |
Iron | 11.1 mg | 6.0 mg |
Protein | 65.8 g | 30.6 g |
Vitamin A | 839.4 μg | 1633.5 μg |
Vitamin C | 78.9 mg | 73.6 mg |
Notice that the standard deviations are large relative to their respective means, especially for Vitamins A and C. This indicates high variability among women in nutrient intake. However, whether a standard deviation is relatively large depends on the context of the application. Skill in interpreting the statistical analysis depends very much on the researcher's subject matter knowledge.
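One way to make "large relative to the mean" concrete is the ratio of the standard deviation to the mean, often called the coefficient of variation. The sketch below is an illustration rather than part of the original analysis. It uses the means and standard deviations from the Means Procedure output, rounded to two decimals, and the dictionary names are assumptions made for this example.

```python
# (mean, standard deviation) from the proc means output, rounded
stats = {
    "Calcium": (624.05, 397.28),
    "Iron": (11.13, 5.98),
    "Protein": (65.80, 30.58),
    "Vitamin A": (839.37, 1633.54),
    "Vitamin C": (78.93, 73.60),
}

# Coefficient of variation: standard deviation divided by the mean (unitless)
cv = {name: sd / mean for name, (mean, sd) in stats.items()}

# Print nutrients from most to least variable relative to their mean
for name in sorted(cv, key=cv.get, reverse=True):
    print(f"{name:10s} {cv[name]:.2f}")
```

Vitamin A has a standard deviation of almost twice its mean, and Vitamin C has one nearly equal to its mean, while the other three nutrients have ratios between roughly 0.46 and 0.64.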
The sample variance-covariance matrix, S, is given below.
\(S = \left(\begin{array}{rrrrr}157829.4 & 940.1 & 6075.8 & 102411.1 & 6701.6 \\ 940.1 & 35.8 & 114.1 & 2383.2 & 137.7 \\ 6075.8 & 114.1 & 934.9 & 7330.1 & 477.2 \\ 102411.1 & 2383.2 & 7330.1 & 2668452.4 & 22063.3 \\ 6701.6 & 137.7 & 477.2 & 22063.3 & 5416.3 \end{array}\right)\)
Interpretation
- The sample variances are given by the diagonal elements of S. For example, the variance of iron intake is \(s_{2}^{2} = 35.8\) mg\(^2\).
- The covariances are given by the off-diagonal elements of S. For example, the covariance between calcium and iron intake is \(s_{12} = 940.1\).
- Note that the covariances are all positive, indicating that the daily intake of each nutrient tends to increase with increased intake of the remaining nutrients.
Because the covariance \(s_{12}\) is positive, we see that calcium intake tends to increase with increasing iron intake. The strength of this positive association can only be judged by comparing \(s_{12}\) to the product of the sample standard deviations for calcium and iron. This comparison is most readily accomplished by looking at the sample correlation between the two variables.
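The comparison of a covariance to the product of two standard deviations can be sketched in Python as follows. This is an illustration using the rounded entries of S shown above, not the course's SAS program. Each correlation is computed as \(r_{jk} = s_{jk} / (s_j s_k)\), where \(s_j\) is the square root of the j-th diagonal element. The variable names are assumptions made for this example.

```python
import math

names = ["Calcium", "Iron", "Protein", "Vitamin A", "Vitamin C"]

# Sample variance-covariance matrix S, entries as rounded in the text
S = [
    [157829.4, 940.1, 6075.8, 102411.1, 6701.6],
    [940.1, 35.8, 114.1, 2383.2, 137.7],
    [6075.8, 114.1, 934.9, 7330.1, 477.2],
    [102411.1, 2383.2, 7330.1, 2668452.4, 22063.3],
    [6701.6, 137.7, 477.2, 22063.3, 5416.3],
]

p = len(S)

# Standard deviations are the square roots of the diagonal elements
sd = [math.sqrt(S[j][j]) for j in range(p)]

# Correlation: covariance divided by the product of standard deviations
R = [[S[j][k] / (sd[j] * sd[k]) for k in range(p)] for j in range(p)]

print(f"s_12 = {S[0][1]}, s_1 * s_2 = {sd[0] * sd[1]:.1f}")
print(f"r_12 = {R[0][1]:.3f}")
```

Because the entries of S are rounded, correlations computed this way may differ in the third decimal place from those computed on the raw data.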
Sample Correlations
The sample correlations are included in the table below.
Variable | Calcium | Iron | Protein | Vit. A | Vit. C |
---|---|---|---|---|---|
Calcium | 1.000 | 0.395 | 0.500 | 0.158 | 0.229 |
Iron | 0.395 | 1.000 | 0.623 | 0.244 | 0.313 |
Protein | 0.500 | 0.623 | 1.000 | 0.147 | 0.212 |
Vit. A | 0.158 | 0.244 | 0.147 | 1.000 | 0.184 |
Vit. C | 0.229 | 0.313 | 0.212 | 0.184 | 1.000 |
Here we can see that the correlation of each variable with itself is equal to one, and the off-diagonal elements give the correlations between each pair of variables.
Generally, we look for the strongest correlations first. The results above suggest that protein, iron, and calcium are all positively associated. The intake of each of these three nutrients tends to increase with increasing values of the remaining two.
The coefficient of determination is another measure of association and is simply equal to the square of the correlation. For example, in this case, the coefficient of determination between protein and iron is \((0.623)^2\) or about 0.388.
\[r^2_{23} = 0.62337^2 = 0.38859\]
This says that about 39% of the variation in iron intake is explained by protein intake. Or, conversely, about 39% of the variation in protein intake is explained by iron intake. Both interpretations are equivalent.
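The same calculation can be repeated for every pair of nutrients. The short Python sketch below is an illustration only: it squares the three-decimal correlations from the table above, so its results can differ slightly in the last digit from values based on unrounded correlations. The dictionary names are assumptions made for this example.

```python
# Off-diagonal sample correlations, copied from the correlation table
r = {
    ("Calcium", "Iron"): 0.395,
    ("Calcium", "Protein"): 0.500,
    ("Calcium", "Vit. A"): 0.158,
    ("Calcium", "Vit. C"): 0.229,
    ("Iron", "Protein"): 0.623,
    ("Iron", "Vit. A"): 0.244,
    ("Iron", "Vit. C"): 0.313,
    ("Protein", "Vit. A"): 0.147,
    ("Protein", "Vit. C"): 0.212,
    ("Vit. A", "Vit. C"): 0.184,
}

# Coefficient of determination: the square of the correlation
r_squared = {pair: value ** 2 for pair, value in r.items()}

# List pairs from the largest to the smallest share of explained variation
ranked = sorted(r_squared, key=r_squared.get, reverse=True)
for first, second in ranked:
    print(f"{first:8s} {second:8s} {r_squared[(first, second)]:.3f}")
```

Iron and protein share the most variation, followed by calcium and protein at 25%. Every pair involving Vitamin A or Vitamin C has a coefficient of determination below 10%.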