Now we are ready to define the two-sample Hotelling's T-square test statistic. As the expression below shows, it is built from the difference between the two sample mean vectors. It also involves the pooled variance-covariance matrix multiplied by the sum of the inverses of the sample sizes; the resulting matrix is then inverted.
\(T^2 = (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)^T\left\{\mathbf{S}_p\left(\frac{1}{n_1}+\frac{1}{n_2}\right)\right\}^{-1} (\bar{\mathbf{x}}_1 - \bar{\mathbf{x}}_2)\)
For large samples, this test statistic will be approximately chi-square distributed with \(p\) degrees of freedom. However, as before, this approximation does not take into account the variation due to estimating the variance-covariance matrix. So, as before, we will transform this Hotelling's T-square statistic into an F-statistic using the following expression.
Note! The scaling factor in this transformation is a function of the sample sizes of the two populations and the number of variables measured, \(p\).
\(F = \dfrac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2 \sim F_{p, n_1+n_2-p-1}\)
Under the null hypothesis, \(H_{o}\colon \mu_{1} = \mu_{2}\), this F-statistic will be F-distributed with \(p\) and \(n_{1} + n_{2} - p - 1\) degrees of freedom. We would reject \(H_{o}\) at level \(\alpha\) if it exceeds the critical value from the F-table evaluated at \(\alpha\).
\(F > F_{p, n_1+n_2-p-1, \alpha}\)
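The computation above can be sketched in a few lines of Python with NumPy. This is only an illustrative sketch, not the course's SAS or Minitab program: the function name `two_sample_hotelling` is hypothetical, the data matrices are assumed to have one observation per row and at least two variables, and the pooled variance-covariance matrix is assumed to be the usual weighted average \(\{(n_1-1)\mathbf{S}_1 + (n_2-1)\mathbf{S}_2\}/(n_1+n_2-2)\). The toy data at the end are made up purely to exercise the function.

```python
import numpy as np

def two_sample_hotelling(x1, x2):
    """Return (T2, F, df1, df2) for two samples stored with rows as observations."""
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2 = x2.shape[0]

    # Difference in the sample mean vectors
    diff = x1.mean(axis=0) - x2.mean(axis=0)

    # Sample variance-covariance matrices (divisor n - 1) and their pooled version
    s1 = np.cov(x1, rowvar=False)
    s2 = np.cov(x2, rowvar=False)
    sp = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

    # T^2 = diff' {Sp (1/n1 + 1/n2)}^{-1} diff, using a linear solve instead of an explicit inverse
    t2 = diff @ np.linalg.solve(sp * (1.0 / n1 + 1.0 / n2), diff)

    # Transform to an F-statistic with p and n1 + n2 - p - 1 degrees of freedom
    df1, df2 = p, n1 + n2 - p - 1
    f = df2 / (p * (n1 + n2 - 2)) * t2
    return t2, f, df1, df2

# Made-up toy data: the second sample is the first shifted by one unit in the first variable
x1 = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
x2 = x1 + np.array([1.0, 0.0])
t2, f, df1, df2 = two_sample_hotelling(x1, x2)
print(t2, f, df1, df2)  # T2 is 6 and F is 2.5 (up to floating-point rounding), with 2 and 5 d.f.
```

The null hypothesis would then be rejected at level \(\alpha\) when the returned F value exceeds the upper \(\alpha\) critical value of the F-distribution with `df1` and `df2` degrees of freedom.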
Example 7-13: Swiss Bank Notes (Two-Sample Hotelling's)
Using SAS
The two-sample Hotelling's \(T^{2}\) test can be carried out on the Swiss Bank Notes data using the SAS program shown below:
Data file: swiss3.txt
Download the SAS Program: swiss10.sas
Download the output: swiss10.lst.
View the video below to see how to compute the Two Sample Hotelling's \(T^2\) using the SAS statistical software application.
At the top of the first output page, you see that N1 is equal to 100, indicating that we have 100 bank notes in the first sample. In this case, these are the 100 real or genuine notes.
Using Minitab
View the video below to see how to compute the Two Sample Hotelling's \(T^2\) using the Minitab statistical software application.
Analysis
The sample mean vectors are copied into the table below:
Variable | Mean (Genuine) | Mean (Counterfeit) |
Length | 214.969 | 214.823 |
Left Width | 129.943 | 130.300 |
Right Width | 129.720 | 130.193 |
Bottom Margin | 8.305 | 10.530 |
Top Margin | 10.168 | 11.133 |
Diagonal | 141.517 | 139.450 |
The sample variance-covariance matrix for the real or genuine notes appears below:
\(S_1 = \left(\begin{array}{rrrrrr}0.150& 0.058& 0.057 &0.057&0.014&0.005\\0.058&0.133&0.086&0.057&0.049&-0.043\\0.057&0.086&0.126&0.058&0.031&-0.024\\0.057&0.057&0.058&0.413&-0.263&-0.000\\0.014&0.049&0.031&-0.263&0.421&-0.075\\0.005&-0.043&-0.024&-0.000&-0.075&0.200\end{array}\right)\)
The sample variance-covariance matrix for the second sample of notes, the counterfeit notes, is given below:
\(S_2 = \left(\begin{array}{rrrrrr}0.124&0.032&0.024&-0.101&0.019&0.012\\0.032&0.065&0.047&-0.024&-0.012&-0.005\\0.024&0.047&0.089&-0.019&0.000&0.034\\-0.101&-0.024&-0.019&1.281&-0.490&0.238\\ 0.019&-0.012&0.000&-0.490&0.404&-0.022\\0.012&-0.005&0.034&0.238&-0.022&0.311\end{array}\right)\)
This is followed by the pooled variance-covariance matrix for the two samples.
\(S_p = \left(\begin{array}{rrrrrr}0.137&0.045&0.041&-0.022&0.017&0.009\\0.045&0.099&0.066&0.016&0.019&-0.024\\0.041&0.066&0.108&0.020&0.015&0.005\\-0.022&0.016&0.020&0.847&-0.377&0.119\\0.017&0.019&0.015&-0.377&0.413&-0.049\\0.009&-0.024&0.005&0.119&-0.049&0.256\end{array}\right)\)
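As a quick check, the pooled matrix can be reproduced from \(S_1\) and \(S_2\) as printed above. The sketch below assumes the usual weighted-average pooling formula and takes \(n_1 = n_2 = 100\) (the second sample size is inferred from the 193 denominator degrees of freedom reported for this example). Because the inputs are rounded to three decimals, agreement with the printed \(S_p\) is only expected to within rounding.

```python
import numpy as np

n1, n2 = 100, 100  # n2 is inferred from the reported degrees of freedom (assumption)

# S1: genuine notes, S2: counterfeit notes, copied from the matrices above
s1 = np.array([
    [0.150,  0.058,  0.057,  0.057,  0.014,  0.005],
    [0.058,  0.133,  0.086,  0.057,  0.049, -0.043],
    [0.057,  0.086,  0.126,  0.058,  0.031, -0.024],
    [0.057,  0.057,  0.058,  0.413, -0.263, -0.000],
    [0.014,  0.049,  0.031, -0.263,  0.421, -0.075],
    [0.005, -0.043, -0.024, -0.000, -0.075,  0.200],
])
s2 = np.array([
    [ 0.124,  0.032,  0.024, -0.101,  0.019,  0.012],
    [ 0.032,  0.065,  0.047, -0.024, -0.012, -0.005],
    [ 0.024,  0.047,  0.089, -0.019,  0.000,  0.034],
    [-0.101, -0.024, -0.019,  1.281, -0.490,  0.238],
    [ 0.019, -0.012,  0.000, -0.490,  0.404, -0.022],
    [ 0.012, -0.005,  0.034,  0.238, -0.022,  0.311],
])

# Pooled variance-covariance matrix: weighted average of S1 and S2
sp = ((n1 - 1) * s1 + (n2 - 1) * s2) / (n1 + n2 - 2)

# Sp as printed above, for comparison
sp_printed = np.array([
    [ 0.137,  0.045,  0.041, -0.022,  0.017,  0.009],
    [ 0.045,  0.099,  0.066,  0.016,  0.019, -0.024],
    [ 0.041,  0.066,  0.108,  0.020,  0.015,  0.005],
    [-0.022,  0.016,  0.020,  0.847, -0.377,  0.119],
    [ 0.017,  0.019,  0.015, -0.377,  0.413, -0.049],
    [ 0.009, -0.024,  0.005,  0.119, -0.049,  0.256],
])

# Largest entrywise discrepancy; it should be no larger than rounding error
max_abs_diff = np.max(np.abs(sp - sp_printed))
print(np.round(sp, 4))
print(max_abs_diff)
```

With equal sample sizes the pooled matrix is simply the entrywise average of the two sample matrices, which is why each entry of \(S_p\) lies halfway between the corresponding entries of \(S_1\) and \(S_2\).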
The two-sample Hotelling's \(T^{2}\) statistic is 2412.45. The F-value is about 391.92 with 6 and 193 degrees of freedom. The p-value is close to 0 and so we will write this as \(< 0.0001\).
In this case we can reject the null hypothesis that the mean vector for the counterfeit notes equals the mean vector for the genuine notes, reporting the evidence as usual: (\(T^{2} = 2412.45\); \(F = 391.92\); \(d. f. = 6, 193\); \(p< 0.0001\))
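The reported F-value and degrees of freedom follow directly from the transformation formula. A minimal arithmetic check, assuming \(n_1 = n_2 = 100\) and \(p = 6\) as in this example:

```python
# Values taken from the example: T^2 as reported, two samples of 100 notes, 6 measurements
t2 = 2412.45
n1 = n2 = 100
p = 6

# Degrees of freedom of the F-statistic
df1 = p
df2 = n1 + n2 - p - 1

# F = (n1 + n2 - p - 1) / (p (n1 + n2 - 2)) * T^2
f = df2 / (p * (n1 + n2 - 2)) * t2

print(df1, df2, round(f, 2))  # prints: 6 193 391.92
```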
Conclusion
The counterfeit notes can be distinguished from the genuine notes on at least one of the measurements.
After concluding that the counterfeit notes can be distinguished from the genuine notes, the next step in our analysis is to determine on which variables they differ.