The following considers a test for equality of the population mean vectors when the variance-covariance matrices are not equal.
Here we will consider the modified Hotelling's T-square test statistic given in the expression below:
\(T^2 = \mathbf{(\bar{x}_1-\bar{x}_2)}'\left\{\dfrac{1}{n_1}\mathbf{S}_1+\dfrac{1}{n_2}\mathbf{S}_2\right\}^{-1}\mathbf{(\bar{x}_1-\bar{x}_2)}\)
Again, this is a function of the differences between the sample means for the two populations. Instead of being a function of the pooled variance-covariance matrix, we can see that the modified test statistic is written as a function of the sample variance-covariance matrix, \(\mathbf{S}_{1}\), for the first population and the sample variance-covariance matrix, \(\mathbf{S}_{2}\), for the second population. It is also a function of the sample sizes \(n_{1}\) and \(n_{2}\).
For large samples, that is, when both \(n_1\) and \(n_2\) are large, \(T^{2}\) is approximately chi-square distributed with \(p\) degrees of freedom. We will reject \(H_{0}\colon \boldsymbol{\mu}_{1} = \boldsymbol{\mu}_{2}\) at level \(\alpha\) if \(T^{2}\) exceeds the critical value from the chi-square table with \(p\) degrees of freedom evaluated at level \(\alpha\):
\(T^2 > \chi^2_{p, \alpha}\)
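As an illustrative sketch (not part of the SAS analysis below), the modified statistic and its large-sample chi-square p-value can be computed directly from the two data matrices. The function name `modified_t2` and the simulated data are hypothetical; any two \((n_i \times p)\) samples on the same variables would do.

```python
import numpy as np
from scipy import stats

def modified_t2(x1, x2):
    """Modified two-sample Hotelling T^2 with unpooled covariances.

    x1, x2 : (n_i, p) data matrices for the two samples.
    Returns T^2 and the large-sample chi-square p-value with p d.f.
    """
    n1, p = x1.shape
    n2 = x2.shape[0]
    d = x1.mean(axis=0) - x2.mean(axis=0)   # difference in sample mean vectors
    s1 = np.cov(x1, rowvar=False)           # sample covariance S_1
    s2 = np.cov(x2, rowvar=False)           # sample covariance S_2
    st = s1 / n1 + s2 / n2                  # S_T = S_1/n_1 + S_2/n_2
    t2 = float(d @ np.linalg.solve(st, d))  # T^2 = d' S_T^{-1} d
    p_value = stats.chi2.sf(t2, df=p)       # upper-tail chi-square probability
    return t2, p_value

# Toy example with simulated data (not the Swiss Bank Notes data)
rng = np.random.default_rng(0)
x1 = rng.normal(0.0, 1.0, size=(100, 3))
x2 = rng.normal(0.5, 2.0, size=(120, 3))
t2, pval = modified_t2(x1, x2)
```

Note that, unlike the pooled version, no common covariance matrix is estimated; each \(\mathbf{S}_i\) is weighted by its own sample size.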
For small samples, we can calculate an F transformation as before using the formula below.
\(F = \dfrac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2 \; \overset{\cdot}{\sim}\; F_{p,\nu}\)
This formula is a function of the sample sizes \(n_{1}\) and \(n_{2}\) and the number of variables \(p\). Under the null hypothesis, this statistic is approximately F-distributed with \(p\) and \(\nu\) degrees of freedom, where \(1/\nu\) is given by the formula below:
\( \dfrac{1}{\nu} = \sum_{i=1}^{2}\frac{1}{n_i-1} \left\{ \dfrac{\mathbf{(\bar{x}_1-\bar{x}_2)}'\mathbf{S}_T^{-1}(\dfrac{1}{n_i}\mathbf{S}_i)\mathbf{S}_T^{-1}\mathbf{(\bar{x}_1-\bar{x}_2)}}{T^2} \right\} ^2 \)
This involves summing over the two samples of banknotes, a function of the number of observations of each sample, the difference in the sample mean vectors, the sample variance-covariance matrix for each of the individual samples, as well as a new matrix \(\mathbf{S}_{T}\) which is given by the expression below:
\(\mathbf{S_T} = \dfrac{1}{n_1}\mathbf{S_1} + \dfrac{1}{n_2}\mathbf{S}_2\)
We will reject \(H_{0}\colon \boldsymbol{\mu}_{1} = \boldsymbol{\mu}_{2}\) at level \(\alpha\) if the F-value exceeds the critical value from the F-table with \(p\) and \(\nu\) degrees of freedom evaluated at level \(\alpha\):
\(F > F_{p,\nu, \alpha}\)
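The small-sample procedure above can also be sketched numerically. The following is a minimal illustration, assuming simulated data; the function name `modified_f_test` is hypothetical, and the approximate degrees of freedom \(\nu\) follow the \(1/\nu\) formula given earlier.

```python
import numpy as np
from scipy import stats

def modified_f_test(x1, x2):
    """Small-sample F approximation for the modified Hotelling T^2.

    Returns (F, nu, p_value), where
      1/nu = sum_i (1/(n_i-1)) * [ d' S_T^{-1} (S_i/n_i) S_T^{-1} d / T^2 ]^2.
    """
    n1, p = x1.shape
    n2 = x2.shape[0]
    d = x1.mean(axis=0) - x2.mean(axis=0)
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)
    st = s1 / n1 + s2 / n2                  # S_T = S_1/n_1 + S_2/n_2
    st_inv = np.linalg.inv(st)
    t2 = float(d @ st_inv @ d)
    # F transformation of T^2
    f = (n1 + n2 - p - 1) * t2 / (p * (n1 + n2 - 2))
    # Approximate denominator degrees of freedom nu
    inv_nu = 0.0
    for si, ni in ((s1, n1), (s2, n2)):
        q = float(d @ st_inv @ (si / ni) @ st_inv @ d) / t2
        inv_nu += q**2 / (ni - 1)
    nu = 1.0 / inv_nu
    p_value = stats.f.sf(f, p, nu)
    return f, nu, p_value

# Toy example with simulated data (not the Swiss Bank Notes data)
rng = np.random.default_rng(1)
x1 = rng.normal(0.0, 1.0, size=(30, 4))
x2 = rng.normal(1.0, 2.0, size=(40, 4))
f, nu, pval = modified_f_test(x1, x2)
```

A useful sanity check on \(\nu\): it always falls between \(\min(n_1, n_2) - 1\) and \(n_1 + n_2 - 2\).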
A reference for this particular test is Seber, G. A. F. (1984). Multivariate Observations. New York: Wiley.
This modified version of Hotelling's T-square test can be carried out on the Swiss Bank Notes data using the SAS program below:
Download the SAS program here: swiss16.sas
View the video explanation of the SAS code.
options ls=78;
title "2-Sample Hotellings T2 - Swiss Bank Notes (unequal variances)";
data swiss;
infile "D:\Statistics\STAT 505\data\swiss3.csv" firstobs=2 delimiter=',';
input type $ length left right bottom top diag;
run;
/* The iml code below defines and executes the 'hotel2m' module
* for calculating the two-sample Hotelling T2 test statistic,
* where here the sample covariances are not pooled.
* The commands between 'start' and 'finish' define the
* calculations of the module for two input vectors 'x1' and 'x2',
* which have the same variables but correspond to two separate groups.
* Note that s1 and s2 are not pooled in these calculations, and the
* resulting degrees of freedom are considerably more involved.
* The 'use' statement makes the 'swiss' data set available, from
* which all the variables are taken. The variables are then read
* separately into the vectors 'x1' and 'x2' for each group, and
 * finally the 'hotel2m' module is called.
*/
proc iml;
start hotel2m;
n1=nrow(x1);
n2=nrow(x2);
k=ncol(x1);
one1=j(n1,1,1);
one2=j(n2,1,1);
ident1=i(n1);
ident2=i(n2);
ybar1=x1`*one1/n1;
s1=x1`*(ident1-one1*one1`/n1)*x1/(n1-1.0);
print n1 ybar1;
print s1;
ybar2=x2`*one2/n2;
s2=x2`*(ident2-one2*one2`/n2)*x2/(n2-1.0);
st=s1/n1+s2/n2;
print n2 ybar2;
print s2;
t2=(ybar1-ybar2)`*inv(st)*(ybar1-ybar2);
df1=k;
p=1-probchi(t2,df1);
print t2 df1 p;
f=(n1+n2-k-1)*t2/k/(n1+n2-2);
temp=((ybar1-ybar2)`*inv(st)*(s1/n1)*inv(st)*(ybar1-ybar2)/t2)**2/(n1-1);
temp=temp+((ybar1-ybar2)`*inv(st)*(s2/n2)*inv(st)*(ybar1-ybar2)/t2)**2/(n2-1);
df2=1/temp;
p=1-probf(f,df1,df2);
print f df1 df2 p;
finish;
use swiss;
read all var{length left right bottom top diag} where (type="real") into x1;
read all var{length left right bottom top diag} where (type="fake") into x2;
run hotel2m;
The output file can be downloaded here: swiss16.lst
At this time Minitab does not support this procedure.
Analysis
As before, we are given the sample sizes for each population, the sample mean vector for each population, followed by the sample variance-covariance matrix for each population.
In the large-sample approximation, we find that \(T^{2} = 2412.45\) with 6 degrees of freedom (because we have 6 variables) and a p-value that is close to 0.
- When \(n_{1}\) = \(n_{2}\), the modified values for \(T^{2}\) and F are identical to the original unmodified values obtained under the assumption of homogeneous variance-covariance matrices.
- Using the large-sample approximation, our conclusions are the same as before. We find that the mean dimensions of counterfeit notes do not match the mean dimensions of genuine Swiss banknotes \(\left(T^2 = 2412.45;\ \text{d.f.} = 6;\ p < 0.0001\right)\).
- Under the small-sample approximation, we also find that the mean dimensions of counterfeit notes do not match the mean dimensions of genuine Swiss banknotes \(\left(F = 391.92;\ \text{d.f.} = 6, 193;\ p < 0.0001\right)\).
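The first bullet above can be verified numerically: when \(n_1 = n_2 = n\), \(\mathbf{S}_T = (\mathbf{S}_1 + \mathbf{S}_2)/n\) equals \(\left(\frac{1}{n_1} + \frac{1}{n_2}\right)\mathbf{S}_p\), where \(\mathbf{S}_p\) is the pooled covariance matrix, so the modified and original statistics coincide. A minimal sketch with simulated (not banknote) data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 50, 3                        # equal sample sizes
x1 = rng.normal(0.0, 1.0, size=(n, p))
x2 = rng.normal(0.3, 1.5, size=(n, p))

d = x1.mean(axis=0) - x2.mean(axis=0)
s1 = np.cov(x1, rowvar=False)
s2 = np.cov(x2, rowvar=False)

# Modified statistic: unpooled covariances
t2_mod = float(d @ np.linalg.solve(s1 / n + s2 / n, d))

# Original statistic: pooled covariance S_p = ((n-1)S_1 + (n-1)S_2)/(2n - 2)
sp = ((n - 1) * s1 + (n - 1) * s2) / (2 * n - 2)
t2_pool = float(d @ np.linalg.solve((1 / n + 1 / n) * sp, d))
```

With equal sample sizes the two values agree to machine precision; with \(n_1 \neq n_2\) they generally differ.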