7.1.15 - The Two-Sample Hotelling's T-Square Test Statistic

Now we are ready to define the two-sample Hotelling's T-square test statistic. As the expression below shows, it involves the difference between the two sample mean vectors. It also involves the pooled variance-covariance matrix multiplied by the sum of the inverses of the sample sizes; the resulting matrix is then inverted.

\(T^2 = \mathbf{(\bar{x}_1 - \bar{x}_2)}^T\{\mathbf{S}_p(\frac{1}{n_1}+\frac{1}{n_2})\}^{-1} \mathbf{(\bar{x}_1 - \bar{x}_2)}\)
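As a rough illustration of the expression above, here is a minimal, dependency-free Python sketch. The function names `solve` and `hotelling_t2` and the toy inputs are hypothetical and chosen only for illustration; they are not part of the course's SAS or Minitab programs. Rather than inverting the matrix explicitly, the sketch solves the equivalent linear system, which gives the same quadratic form.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] as floats so the inputs are not modified.
    m = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate the entries below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back-substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def hotelling_t2(mean1, mean2, s_pooled, n1, n2):
    """Two-sample T^2 = d' {S_p (1/n1 + 1/n2)}^{-1} d, where d = mean1 - mean2."""
    d = [a - b for a, b in zip(mean1, mean2)]
    scale = 1.0 / n1 + 1.0 / n2
    scaled = [[scale * v for v in row] for row in s_pooled]
    w = solve(scaled, d)  # w = {S_p (1/n1 + 1/n2)}^{-1} d
    return sum(di * wi for di, wi in zip(d, w))


# Toy (made-up) inputs: p = 2 variables, identity pooled matrix, n1 = n2 = 2.
# Here 1/n1 + 1/n2 = 1, so T^2 reduces to the squared length of d = (3, 4).
t2_toy = hotelling_t2([3.0, 4.0], [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], 2, 2)
print(t2_toy)  # 25.0
```

In practice a linear algebra library would normally replace the hand-written solver; it is written out here only to keep the sketch self-contained.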

For large samples, this test statistic will be approximately chi-square distributed with \(p\) degrees of freedom. However, as before, this approximation does not take into account the variation due to estimating the variance-covariance matrix. So, as before, we transform the Hotelling's T-square statistic into an F-statistic using the following expression.

Note! The F-statistic is a function of the two sample sizes, \(n_1\) and \(n_2\), and of the number of variables measured, \(p\).

\(F = \dfrac{n_1+n_2-p-1}{p(n_1+n_2-2)}T^2 \sim F_{p, n_1+n_2-p-1}\)

Under the null hypothesis, \(H_{o}\colon \mu_{1} = \mu_{2}\), this F-statistic will be F-distributed with \(p\) and \(n_{1} + n_{2} - p - 1\) degrees of freedom. We reject \(H_{o}\) at level \(α\) if the F-statistic exceeds the critical value from the F-table with these degrees of freedom, evaluated at \(α\).

\(F > F_{p, n_1+n_2-p-1, \alpha}\)

Example 7-13: Swiss Bank Notes (Two-Sample Hotelling's)

Using SAS

The two-sample Hotelling's \(T^{2}\) test can be carried out on the Swiss Bank Notes data with the SAS program shown below:

Data file:  swiss3.txt

Download the SAS Program: swiss10.sas

Download the output: swiss10.lst.

View the video below to see how to compute the Two Sample Hotelling's \(T^2\) using the SAS statistical software application.

At the top of the first output page you see that N1 is equal to 100, indicating that we have 100 bank notes in the first sample. In this case, these are the 100 real or genuine notes.

Using Minitab

View the video below to see how to compute the Two Sample Hotelling's \(T^2\) using the Minitab statistical software application.


Analysis

The sample mean vectors are copied into the table below:

                   Sample means
Variable           Genuine     Counterfeit
Length             214.969     214.823
Left Width         129.943     130.300
Right Width        129.720     130.193
Bottom Margin        8.305      10.530
Top Margin          10.168      11.133
Diagonal           141.517     139.450

The sample variance-covariance matrix for the real or genuine notes appears below:

\(S_1 = \left(\begin{array}{rrrrrr}0.150& 0.058& 0.057 &0.057&0.014&0.005\\0.058&0.133&0.086&0.057&0.049&-0.043\\0.057&0.086&0.126&0.058&0.031&-0.024\\0.057&0.057&0.058&0.413&-0.263&-0.000\\0.014&0.049&0.031&-0.263&0.421&-0.075\\0.005&-0.043&-0.024&-0.000&-0.075&0.200\end{array}\right)\)

The sample variance-covariance matrix for the second sample of notes, the counterfeit notes, is given below:

\(S_2 = \left(\begin{array}{rrrrrr}0.124&0.032&0.024&-0.101&0.019&0.012\\0.032&0.065&0.047&-0.024&-0.012&-0.005\\0.024&0.047&0.089&-0.019&0.000&0.034\\-0.101&-0.024&-0.019&1.281&-0.490&0.238\\ 0.019&-0.012&0.000&-0.490&0.404&-0.022\\0.012&-0.005&0.034&0.238&-0.022&0.311\end{array}\right)\)

This is followed by the pooled variance-covariance matrix for the two samples.

\(S_p = \left(\begin{array}{rrrrrr}0.137&0.045&0.041&-0.022&0.017&0.009\\0.045&0.099&0.066&0.016&0.019&-0.024\\0.041&0.066&0.108&0.020&0.015&0.005\\-0.022&0.016&0.020&0.847&-0.377&0.119\\0.017&0.019&0.015&-0.377&0.413&-0.049\\0.009&-0.024&0.005&0.119&-0.049&0.256\end{array}\right)\)
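The pooled matrix can be checked against the two sample matrices. The sketch below assumes the usual pooling formula, \(S_p = \{(n_1-1)S_1 + (n_2-1)S_2\}/(n_1+n_2-2)\), and \(n_1 = n_2 = 100\) (the second sample size is inferred from the 193 denominator degrees of freedom reported in the output). The function name `pooled_covariance` is hypothetical. Because the printed matrices are rounded to three decimals, agreement is only expected to within rounding error.

```python
def pooled_covariance(s1, s2, n1, n2):
    """Weighted average of two covariance matrices with weights n1 - 1 and n2 - 1."""
    w1, w2 = n1 - 1, n2 - 1
    return [[(w1 * a + w2 * b) / (w1 + w2) for a, b in zip(r1, r2)]
            for r1, r2 in zip(s1, s2)]


# Rounded matrices copied from the output above (genuine, counterfeit, pooled).
S1 = [
    [0.150, 0.058, 0.057, 0.057, 0.014, 0.005],
    [0.058, 0.133, 0.086, 0.057, 0.049, -0.043],
    [0.057, 0.086, 0.126, 0.058, 0.031, -0.024],
    [0.057, 0.057, 0.058, 0.413, -0.263, -0.000],
    [0.014, 0.049, 0.031, -0.263, 0.421, -0.075],
    [0.005, -0.043, -0.024, -0.000, -0.075, 0.200],
]
S2 = [
    [0.124, 0.032, 0.024, -0.101, 0.019, 0.012],
    [0.032, 0.065, 0.047, -0.024, -0.012, -0.005],
    [0.024, 0.047, 0.089, -0.019, 0.000, 0.034],
    [-0.101, -0.024, -0.019, 1.281, -0.490, 0.238],
    [0.019, -0.012, 0.000, -0.490, 0.404, -0.022],
    [0.012, -0.005, 0.034, 0.238, -0.022, 0.311],
]
SP_REPORTED = [
    [0.137, 0.045, 0.041, -0.022, 0.017, 0.009],
    [0.045, 0.099, 0.066, 0.016, 0.019, -0.024],
    [0.041, 0.066, 0.108, 0.020, 0.015, 0.005],
    [-0.022, 0.016, 0.020, 0.847, -0.377, 0.119],
    [0.017, 0.019, 0.015, -0.377, 0.413, -0.049],
    [0.009, -0.024, 0.005, 0.119, -0.049, 0.256],
]

pooled = pooled_covariance(S1, S2, 100, 100)

# Largest absolute gap between the recomputed and the reported pooled entries.
max_diff = max(abs(a - b)
               for row_a, row_b in zip(pooled, SP_REPORTED)
               for a, b in zip(row_a, row_b))
print(f"largest entrywise difference: {max_diff:.4f}")
```

With equal sample sizes the weights are equal, so each pooled entry is simply the average of the corresponding entries of \(S_1\) and \(S_2\); for example, \((0.150 + 0.124)/2 = 0.137\).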

The two-sample Hotelling's \(T^{2}\) statistic is 2412.45. The F-value is about 391.92 with 6 and 193 degrees of freedom.  The p-value is close to 0 and so we will write this as \(< 0.0001\).

In this case we can reject the null hypothesis that the mean vector for the counterfeit notes equals the mean vector for the genuine notes. As usual, we report the supporting evidence: (\(T^{2} = 2412.45\); \(F = 391.92\); \(d. f. = 6, 193\); \(p< 0.0001\))
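The reported F-value and degrees of freedom can be reproduced from \(T^{2}\) with the conversion formula given earlier. This is a small arithmetic sketch; the function name `t2_to_f` is hypothetical, and \(n_2 = 100\) is inferred from the reported 193 denominator degrees of freedom.

```python
def t2_to_f(t2, n1, n2, p):
    """Convert a two-sample Hotelling's T^2 into an F-statistic with its degrees of freedom."""
    df1 = p
    df2 = n1 + n2 - p - 1
    f_stat = df2 / (p * (n1 + n2 - 2)) * t2
    return f_stat, df1, df2


# Values from the Swiss Bank Notes output: T^2 = 2412.45, p = 6 variables.
f_stat, df1, df2 = t2_to_f(2412.45, 100, 100, 6)
print(f"F = {f_stat:.2f} on {df1} and {df2} degrees of freedom")
# F = 391.92 on 6 and 193 degrees of freedom
```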

 

Conclusion

The counterfeit notes can be distinguished from the genuine notes on at least one of the measurements.

Having concluded that the counterfeit notes can be distinguished from the genuine notes, the next step in our analysis is to determine on which variables they differ.