Example 1-7: Women's Health Survey (Generalized Variance)
Find and interpret the generalized variance for the Women's Health Survey data.
Using Technology
The generalized variance for the Women's Health Survey data can be calculated using the SAS program below.
Download the data file here: nutrient.csv
options ls=78; /*This sets the line size (maximum output width) to 78 characters.*/
title "Example: Nutrient Intake Data - Generalized Variance"; /*This sets a title that will
appear on each page of the output until it's changed.*/
data nutrient; /*This defines a data set called 'nutrient'.*/
infile "D:\Statistics\STAT 505\data\nutrient.csv" firstobs=2 delimiter=','; /*SAS will look in this path for the
nutrient.csv file.*/
input id calcium iron protein a c; /*This is where we provide names for the variables
in order of the columns in the data set. If any were categorical (not the case here),
we would need to put a '$' character after its name.*/
run;
proc iml; /*The iml procedure allows for many general calculations to be made, including
matrix operations.*/
start genvar; /*This defines a SAS module that can be called to compute the
generalized variance. The lines of code below are executed when 'genvar' is called,
and both the sample covariance matrix and the generalized variance are printed.*/
one=j(nrow(x),1,1); /*This defines a column vector of 1s. The size is determined by
'x', which is a variable that is defined outside the module below.*/
ident=i(nrow(x)); /*This creates an identity matrix with the same number of rows
as x.*/
s=x`*(ident-one*one`/nrow(x))*x/(nrow(x)-1.0); /*This is the sample covariance
matrix, which is an unbiased estimate of the population covariance matrix.*/
genvar=det(s); /*The generalized variance is the determinant of the sample
covariance matrix.*/
print s genvar; /*This is the statement that prints both the sample covariance
matrix and the generalized variance.*/
finish; /*This ends the genvar module definition. The module hasn't run yet and
won't be called until we define the matrix 'x' below.*/
use nutrient; /*This makes the variables from the 'nutrient' data set available
for use in this iml environment.*/
read all var{calcium iron protein a c} into x; /*This creates a matrix x
whose columns are the variables specified. This matrix is what will be used in the
'genvar' module defined above.*/
run genvar; /*This statement calls the 'genvar' module, which we defined above.*/
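For readers without SAS, the calculation that the 'genvar' module performs can be sketched in plain Python. This is only an illustration under stated assumptions: the function names `sample_covariance` and `determinant` and the three-row toy data set are hypothetical and are not part of the course files. Centering each column by its mean is equivalent to multiplying by the matrix (ident-one*one`/nrow(x)) used in the SAS code.

```python
# Sketch of the 'genvar' calculation in plain Python (no external libraries).
# Function names and toy data are hypothetical, chosen for illustration only.

def sample_covariance(x):
    """Return S = X'(I - 11'/n)X / (n - 1) for a list of data rows x."""
    n, p = len(x), len(x[0])
    means = [sum(row[j] for row in x) / n for j in range(p)]
    # Subtracting column means has the same effect as applying (I - 11'/n).
    centered = [[row[j] - means[j] for j in range(p)] for row in x]
    return [[sum(r[j] * r[k] for r in centered) / (n - 1) for k in range(p)]
            for j in range(p)]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

# Toy data: three observations on two variables.
toy = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
s = sample_covariance(toy)
genvar = determinant(s)
print(s)       # sample covariance matrix
print(genvar)  # generalized variance
```

For the toy data the covariance matrix has variances 4 and 13 and covariance 7, so the generalized variance is 4(13) - 7(7) = 3.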
Generalized Variance using Minitab
- Download the ‘Determat.mac’ macro file and save it to your computer.
- File > Run Script, and then choose ‘Minitab Macro’ for type. Then choose ‘OK’.
- Stat > Basic Statistics > Covariance
- Highlight and select C3, C4, and C6 and choose ‘Select’ to move these three variables into the window on the right. Only these variables are chosen for this particular example because if all six variables are used, the value of the generalized variance is too large to be displayed.
- Check the box for ‘Store matrix’.
- Select ‘OK’. No results are displayed at this point.
- Data > Display Data
- Highlight and select M1 and click ‘Select’ to move it into the window on the right.
- Select ‘OK’ to display the sample covariance matrix.
- View > Command Line/History to show the command line window on the right side.
- In the command line window, type ‘%Determat M1’ without quotes.
- Select ‘Run’ near the lower-right corner of the command line window. The generalized variance is displayed in the data display area.
Analysis
The output from the programs reports the sample variance/covariance matrix.
S | | | | | GENVAR
---|---|---|---|---|---
157829.44 | 940.08944 | 6075.8163 | 102411.13 | 6701.616 | 2.83E19
940.08944 | 35.810536 | 114.05803 | 2383.1534 | 137.67199 |
6075.8163 | 114.05803 | 934.87688 | 7330.0515 | 477.19978 |
102411.13 | 2383.1543 | 7330.0515 | 2668452.4 | 22063.249 |
6701.616 | 137.67199 | 477.19978 | 22063.249 | 5416.2641 |
You should compare this output with the sample variance/covariance matrix output obtained from the corr procedure from our last program, nutrient2. You will see that we have the exact same numbers that were presented before. The generalized variance is that single entry in the far upper right-hand corner. Here we see that the generalized variance is:
\[|S| = 2.83 \times 10^{19}\]
Interpretation
The larger the generalized variance, the more dispersed the data are. The volume of space occupied by the cloud of data points is proportional to the square root of the generalized variance.
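One way to make this proportionality concrete is the standard formula for the volume of an ellipsoid. The symbols \(p\) (number of variables), \(c\) (a chosen statistical distance), and \(\bar{\mathbf{x}}\) (the sample mean vector) are introduced here for illustration. The points lying within distance \(c\) of the sample mean fill an ellipsoid whose volume is

\[\text{Volume}\left\{\mathbf{x} : (\mathbf{x}-\bar{\mathbf{x}})' S^{-1} (\mathbf{x}-\bar{\mathbf{x}}) \le c^2\right\} = \frac{2\pi^{p/2}}{p\,\Gamma(p/2)}\; c^{p}\, \sqrt{|S|}\]

For a fixed \(p\) and \(c\), everything in front of \(\sqrt{|S|}\) is a constant, so two data sets on the same variables can be compared through \(\sqrt{|S|}\) alone.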
In this example...
\[\sqrt{|S|} = \sqrt{2.83 \times 10^{19}} \approx 5.32 \times 10^9\]
This represents a very large volume of space. Again, the interpretation of this particular number depends largely on subject matter knowledge. In this case, we cannot say whether this is a particularly large number unless we know more about women's nutrition.
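The determinant and its square root can be rechecked from the printed covariance matrix alone. The following is a minimal sketch in plain Python, assuming the matrix entries are typed in exactly as they appear in the output above; the helper name `determinant` is hypothetical. Because the printed entries are rounded, the result should land close to, but not exactly on, the 2.83E19 that SAS reports.

```python
import math

# Rows copied from the printed output for S, in the order
# calcium, iron, protein, a, c.
s = [
    [157829.44, 940.08944, 6075.8163, 102411.13, 6701.616],
    [940.08944, 35.810536, 114.05803, 2383.1534, 137.67199],
    [6075.8163, 114.05803, 934.87688, 7330.0515, 477.19978],
    [102411.13, 2383.1543, 7330.0515, 2668452.4, 22063.249],
    [6701.616, 137.67199, 477.19978, 22063.249, 5416.2641],
]

def determinant(m):
    """Determinant by Gaussian elimination with partial pivoting."""
    a = [row[:] for row in m]  # work on a copy
    size = len(a)
    det = 1.0
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(a[r][col]))
        if a[pivot][col] == 0.0:
            return 0.0
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det

genvar = determinant(s)   # generalized variance |S|
root = math.sqrt(genvar)  # proportional to the volume of the data cloud
print(f"|S| = {genvar:.3e}")
print(f"sqrt(|S|) = {root:.3e}")
```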