The following considers a test for equality of the population mean vectors when the variance-covariance matrices are not equal.
Here we will consider the modified Hotelling's T-square test statistic given in the expression below:
\(T^2 = \mathbf{(\bar{x}_1-\bar{x}_2)}'\left\{\dfrac{1}{n_1}\mathbf{S}_1+\dfrac{1}{n_2}\mathbf{S}_2\right\}^{-1}\mathbf{(\bar{x}_1-\bar{x}_2)}\)
Again, this is a function of the differences between the sample means for the two populations. Instead of being a function of the pooled variance-covariance matrix, we can see that the modified test statistic is written as a function of the sample variance-covariance matrix, \(\mathbf{S}_{1}\), for the first population and the sample variance-covariance matrix, \(\mathbf{S}_{2}\), for the second population. It is also a function of the sample sizes \(n_{1}\) and \(n_{2}\).
For large samples, that is if both samples are large, \(T^{2}\) is approximately chi-square distributed with p d.f. We will reject \(H_{0}\) : \(\boldsymbol{\mu}_{1}\) = \(\boldsymbol{\mu}_{2}\) at level \(α\) if \(T^{2}\) exceeds the critical value from the chi-square table with p d.f. evaluated at level \(α\).
\(T^2 > \chi^2_{p, \alpha}\)
For small samples, we can calculate an F transformation as before using the formula below.
\(F = \dfrac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2\textbf{ } \overset{\cdot}{\sim}\textbf{ } F_{p,\nu}\)
This formula is a function of sample sizes \(n_{1}\) and \(n_{2}\), and the number of variables p. Under the null hypothesis this will be F-distributed with p and approximately ν degrees of freedom, where 1 divided by ν is given by the formula below:
\( \dfrac{1}{\nu} = \sum_{i=1}^{2}\frac{1}{n_i-1} \left\{ \dfrac{\mathbf{(\bar{x}_1-\bar{x}_2)}'\mathbf{S}_T^{-1}(\dfrac{1}{n_i}\mathbf{S}_i)\mathbf{S}_T^{-1}\mathbf{(\bar{x}_1-\bar{x}_2)}}{T^2} \right\} ^2 \)
This involves summing over the two samples of bank notes, a function of the number of observations of each sample, the difference in the sample mean vectors, the sample variance-covariance matrix for each of the individual samples, as well as a new matrix \(\mathbf{S}_{T}\) which is given by the expression below:
\(\mathbf{S_T} = \dfrac{1}{n_1}\mathbf{S_1} + \dfrac{1}{n_2}\mathbf{S}_2\)
We will reject \(H_{0}\) \colon \(\mu_{1}\) = \(\mu_{2}\) at level \(α\) if the F-value exceeds the critical value from the F-table with p and ν degrees of freedom evaluated at level \(α\).
\(F > F_{p,\nu, \alpha}\)
A reference for this particular test is given in: Seber, G.A.F. 1984. Multivariate Observations. Wiley, New York.
Using SAS
This modified version of Hotelling's T-square test can be carried out on the Swiss Bank Notes data using the SAS program below:
Download the SAS program here: swiss16.sas
View the video explanation of the SAS code.The output file can be downloaded here: swiss16.lst
Using Minitab
At this time Minitab does not support this procedure.
Analysis
As before, we are given the sample sizes for each population, the sample mean vector for each population, followed by the sample variance-covariance matrix for each population.
In the large sample approximation, we find that T-square is 2412.45 with 6 degrees of freedom, (because we have 6 variables), and a p-value that is close to 0.
- When \(n_{1}\) = \(n_{2}\), the modified values for \(T^{2}\) and F are identical to the original unmodified values obtained under the assumption of homogeneous variance-covariance matrices.
- Using the large-sample approximation, our conclusions are the same as before. We find that mean dimensions of counterfeit notes do not match the mean dimensions of genuine Swiss bank notes. \(\left( T ^ { 2 } = 2412.45 ; \mathrm { d.f. } = 6 ; p < 0.0001 \right)\).
- Under the small-sample approximation, we also find that mean dimensions of counterfeit notes do not match the mean dimensions of genuine Swiss bank notes. \(( F = 391.92 ; \mathrm { d } . \mathrm { f } . = 6,193 ; p < 0.0001 )\).