1.6 - Example: Generalized Variance


Example 1-7: Women's Health Survey (Generalized Variance)

Find and interpret the generalized variance for the Women's Health Survey data.

Using Technology

The generalized variance for the Women's Health Survey data can be calculated using the SAS program below.

Download the data file here: nutrient.csv

 


options ls=78;   /*This sets the line size (the maximum number of characters per output line) to 78.*/
title "Example: Nutrient Intake Data - Generalized Variance";   /*This sets a title that will 
appear on each page of the output until it's changed.*/
data nutrient;   /*This defines a data set called 'nutrient'.*/
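  infile "D:\Statistics\STAT 505\data\nutrient.csv" firstobs=2 delimiter=',';   /*SAS will look in this path for the 
  nutrient.csv file; change it to wherever you saved the file. firstobs=2 starts reading at the second line, skipping the header row.*/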
  input id calcium iron protein a c;   /*This is where we provide names for the variables 
  in order of the columns in the data set. If any were categorical (not the case here), 
  we would need to put a '$' character after its name.*/
  run;   
proc iml;   /*The iml procedure allows for many general calculations to be made, including 
matrix operations.*/
  start genvar;   /*This defines a SAS module that can be called to compute the 
  generalized variance. The lines of code below are executed when 'genvar' is called, 
  and both the sample covariance matrix and the generalized variance are printed.*/
    one=j(nrow(x),1,1);   /*This defines a column vector of 1s. The size is determined by 
    'x', which is a variable that is defined outside the module below.*/
    ident=i(nrow(x));   /*This creates an identity matrix with the same number of rows 
    as x.*/
    s=x`*(ident-one*one`/nrow(x))*x/(nrow(x)-1.0);   /*This is the sample covariance 
    matrix, which is an unbiased estimate of the population covariance matrix.*/
    genvar=det(s);   /*The generalized variance is the determinant of the sample 
    covariance matrix.*/
    print s genvar;   /*This is the statement that prints both the sample covariance 
    matrix and the generalized variance.*/
  finish;   /*This ends the genvar module definition. The module has not run yet; it is 
  executed by the 'run genvar' statement below, after the matrix x has been read in.*/
  use nutrient;   /*This makes the variables from the 'nutrient' data set available 
  for use in this iml environment.*/
  read all var{calcium iron protein a c} into x;   /*This reads the listed variables into 
  the matrix x, with one row per observation and one column per variable. This matrix is 
  what the 'genvar' module defined above uses.*/
  run genvar;   /*This statement calls the 'genvar' module, which we defined above.*/
quit;   /*This ends the iml procedure.*/
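The line that defines s in the genvar module uses the centering matrix I - 11'/n. Expanding it shows that it reproduces the familiar deviation form of the sample covariance matrix. In the derivation below, n is the number of observations, \(\mathbf{1}\) is the column of ones (one in the code), X is the data matrix (x in the code), and \(\mathbf{x}_i'\) is its i-th row. The middle step uses the fact that I - 11'/n is symmetric and idempotent:

\[
\begin{aligned}
\frac{1}{n}\mathbf{1}\mathbf{1}'X &= \mathbf{1}\bar{\mathbf{x}}', \qquad \bar{\mathbf{x}} = \frac{1}{n}X'\mathbf{1},\\
S = \frac{1}{n-1}X'\left(I - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)X
&= \frac{1}{n-1}\left(X - \mathbf{1}\bar{\mathbf{x}}'\right)'\left(X - \mathbf{1}\bar{\mathbf{x}}'\right)
= \frac{1}{n-1}\sum_{i=1}^{n}\left(\mathbf{x}_i - \bar{\mathbf{x}}\right)\left(\mathbf{x}_i - \bar{\mathbf{x}}\right)'.
\end{aligned}
\]

The generalized variance returned by det(s) is then \(|S|\), the determinant of this matrix.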

Generalized Variance using Minitab

  1. Download the ‘Determat.mac’ macro file and save it to your computer.
  2. File > Run Script, and then choose ‘Minitab Macro’ for type. Then choose ‘OK’.
  3. Stat > Basic Statistics > Covariance
    1. Highlight and select C3, C4, and C6 (iron, protein, and vitamin C) and choose ‘Select’ to move these three variables into the window on the right. Only these three variables are used in this example because, with all five nutrient variables, the generalized variance is too large to be displayed.
    2. Check the box for ‘Store matrix’.
    3. Select ‘OK’. No results are displayed at this point.
  4. Data > Display Data
    1. Highlight and select M1 and click ‘Select’ to move it into the window on the right.
    2. Select ‘OK’ to display the sample covariance matrix.
  5. View > Command Line/History to show the command line window on the right side.
    1. In the command line window, type ‘%Determat M1’ without quotes.
    2. Select ‘Run’ near the lower-right corner of the command line window. The generalized variance is displayed in the data display area.
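For three variables, the generalized variance is the determinant of the 3 × 3 sample covariance matrix stored in M1. Writing that determinant in terms of the entries \(s_{jk}\) helps show which quantities drive it. Here 1, 2, and 3 index the three selected variables in the order chosen, and the formula uses the symmetry \(s_{jk} = s_{kj}\):

\[
|S| = s_{11}s_{22}s_{33} + 2\,s_{12}s_{13}s_{23} - s_{11}s_{23}^2 - s_{22}s_{13}^2 - s_{33}s_{12}^2
\]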

Analysis

The output from the SAS program reports the sample variance-covariance matrix, S, followed by the generalized variance, GENVAR.

Example: Nutrient Intake Data - Generalized Variance

                              S                                    GENVAR

157829.44   940.08944   6075.8163   102411.13    6701.616         2.83E19
940.08944   35.810536   114.05803   2383.1534   137.67199
6075.8163   114.05803   934.87688   7330.0515   477.19978
102411.13   2383.1534   7330.0515   2668452.4   22063.249
 6701.616   137.67199   477.19978   22063.249   5416.2641

You should compare this output with the sample variance-covariance matrix produced by the corr procedure in our last program, nutrient2. The numbers are exactly the same. The generalized variance is the single entry to the right of the matrix, in the upper right-hand corner of the output. Here we see that the generalized variance is:

\[|S| = 2.83 \times 10^{19}\]

Interpretation

In terms of interpreting the generalized variance, the larger the generalized variance, the more dispersed the data are. Note that the volume of space occupied by the cloud of data points is proportional to the square root of the generalized variance.
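Both points can be seen from standard matrix identities. If \(\lambda_1, \ldots, \lambda_p\) are the eigenvalues of S, the generalized variance is their product. The ellipsoid that summarizes the data cloud, for any fixed constant c, has a volume equal to a constant depending only on p and c, multiplied by \(|S|^{1/2}\). With two variables, the determinant also shows how correlation shrinks the generalized variance:

\[
|S| = \prod_{j=1}^{p}\lambda_j, \qquad
\operatorname{Vol}\left\{\mathbf{x} : (\mathbf{x}-\bar{\mathbf{x}})'S^{-1}(\mathbf{x}-\bar{\mathbf{x}}) \le c^2\right\} = \frac{\pi^{p/2}c^{p}}{\Gamma(p/2+1)}\,|S|^{1/2}
\]

\[
p = 2:\qquad |S| = s_{11}s_{22} - s_{12}^2 = s_{11}s_{22}\left(1 - r_{12}^2\right)
\]

So two variables with large individual variances can still have a small generalized variance if they are highly correlated (\(r_{12}\) close to \(\pm 1\)), because the data cloud then collapses toward a line.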

In this example,

\[\sqrt{|S|} = \sqrt{2.83 \times 10^{19}} = \sqrt{28.3} \times 10^{9} \approx 5.32 \times 10^9\]

This represents a very large volume of space. Again, the interpretation of this particular number depends largely on subject matter knowledge. In this case, we cannot say whether this is a particularly large number unless we know more about women's nutrition.

