13.3. Test for Relationship Between Canonical Variate Pairs

Let's first determine if there is any relationship between the two sets of variables at all. Perhaps the two sets of variables are completely unrelated to one another and independent!

To test for independence between the Sales Performance and the Test Score variables, first consider a multivariate multiple regression model in which we predict the Sales Performance variables from the Test Score variables. In the general case, we have p multiple regressions, each predicting one of the variables in the first group (X variables) from the q variables in the second group (Y variables).

\begin{align} X_1 & =  \beta_{10} + \beta_{11}Y_1 +\beta_{12}Y_2 + \dots +\beta_{1q}Y_q + \epsilon_1 \\ X_2 & =  \beta_{20}+ \beta_{21}Y_1 + \beta_{22}Y_2 + \dots +\beta_{2q}Y_q + \epsilon_2 \\  &  \vdots \\ X_p & =  \beta_{p0} + \beta_{p1}Y_1 + \beta_{p2}Y_2 + \dots + \beta_{pq}Y_q + \epsilon_p \end{align}

In our example, we have multiple regressions predicting the p = 3 sales variables from the q = 4 test score variables. We wish to test the null hypothesis that these regression coefficients (except for the intercepts) are all equal to zero. This would be equivalent to the null hypothesis that the first set of variables is independent of the second set of variables.

\(H_0\colon \beta_{ij} = 0;\)  \( i = 1,2, \dots, p; j = 1,2, \dots, q\)

This test is carried out using Wilks lambda. The results are found on page 1 of the output of the SAS Program.

Test of H0: The canonical correlations in the current row and all that follow are zero

     Likelihood Ratio   Approximate F Value   Num DF   Den DF   Pr > F
1    0.00214847         87.39                 12       114.06   <.0001
2    0.19524127         18.53                 6        88       <.0001
3    0.85284669         3.88                  2        45       0.0278

SAS reports Wilks lambda \(\Lambda = 0.00215;\ F = 87.39;\ d.f. = 12, 114.06;\ p < 0.0001\). Wilks lambda is computed from a ratio of the determinants of two variance-covariance matrices. A small value of Wilks lambda corresponds to a large value of the F statistic and a small p-value, in which case we reject the null hypothesis. In our example, we reject the null hypothesis that there is no relationship between the two sets of variables and conclude that the two sets of variables are dependent. Note that the above null hypothesis is equivalent to the null hypothesis that all p canonical variate pairs are uncorrelated, or

\(H_0\colon \rho^*_1 = \rho^*_2 = \dots = \rho^*_p = 0 \)

Because Wilks lambda is significant and the canonical correlations are ordered from largest to smallest, we can conclude that at least \(\rho^*_1 \ne 0\).
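The link between Wilks lambda and the canonical correlations can be made explicit with a standard identity that is not printed in the SAS output. Assuming \(p \le q\), as in this example, and writing \(\rho^*_i\) for the sample canonical correlations:

\(\Lambda = \prod_{i=1}^{p} \left(1 - \rho^{*2}_i\right)\)

If every canonical correlation were zero, each factor would equal 1 and so would \(\Lambda\). Each nonzero canonical correlation pulls the product toward zero, which is why a small value such as \(\Lambda = 0.00215\) is evidence against the null hypothesis.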

We may also wish to test the hypothesis that the second or the third canonical variate pairs are correlated. We can do this in successive tests. Next, test whether the second and third canonical variate pairs are correlated:

\(H_0\colon \rho^*_2 = \rho^*_3 = 0\)

We can look again at the SAS output above. In the second row for the likelihood ratio test statistic we find \(L' = 0.19524;\ F = 18.53;\ d.f. = 6, 88;\ p < 0.0001\). From this test we can conclude that the second canonical variate pair is correlated, \(\rho^*_2 \ne 0\).

Finally, we can test the significance of the third canonical variate pair.

\(H_0\colon \rho^*_3 = 0\)

The third row of the SAS output contains the likelihood ratio test statistic \(L' = 0.8528;\ F = 3.88;\ d.f. = 2, 45;\ p = 0.0278\). This is also significant, and so we conclude that the third canonical variate pair is correlated.
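The F values and degrees of freedom in the three rows can be checked by hand. The sketch below uses Rao's F approximation, which is an assumption about how the output was produced: the source does not name the approximation, but this formula reproduces the reported values. The sample size `n = 50` is also an assumption, chosen because it is consistent with the reported denominator degrees of freedom. The function name `rao_f` is hypothetical.

```python
import math

def rao_f(lam, p, q, t):
    """Rao's F approximation for a Wilks-type likelihood ratio `lam`
    testing p * q parameters; `t` is the sample-size correction term."""
    df1 = p * q
    denom = p * p + q * q - 5
    # By convention s = 1 when the formula is undefined (e.g. p = 1, q = 2).
    s = math.sqrt((df1 * df1 - 4) / denom) if denom > 0 else 1.0
    df2 = t * s - df1 / 2 + 1
    root = lam ** (1 / s)
    f = (1 - root) / root * df2 / df1
    return f, df1, df2

n = 50          # ASSUMED sample size, not stated in this section
p, q = 3, 4     # sales variables, test score variables
t = (n - 1) - (p + q + 1) / 2

# Likelihood ratios copied from the three rows of the SAS output.
ratios = [0.00214847, 0.19524127, 0.85284669]

# Row k tests the pairs k, k+1, ..., so both dimensions shrink by k - 1.
results = [rao_f(lam, p - k, q - k, t) for k, lam in enumerate(ratios)]

for row, (f, df1, df2) in enumerate(results, start=1):
    print(f"row {row}: F = {f:.2f}, d.f. = {df1}, {df2:.2f}")
```

Each successive row drops one variable from each set's effective dimension, so the numerator degrees of freedom fall from 3 × 4 = 12 to 2 × 3 = 6 to 1 × 2 = 2, matching the Num DF column.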

All three canonical variate pairs are significantly correlated and dependent on one another. This suggests that we may summarize all three pairs. In practice, these tests are carried out successively until you find a non-significant result. Once a non-significant result is found, you stop. If this happens with the first canonical variate pair, then there is not sufficient evidence of any relationship between the two sets of variables and the analysis may stop.

If the first pair shows significance, then you move on to the second canonical variate pair. If this second pair is not significantly correlated, then you stop. If it is significant, you continue to the third pair, proceeding in this iterative manner through the pairs of canonical variates until you find a non-significant result.
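The stopping rule described above can be sketched as a short loop. This is a minimal illustration, not part of the SAS output: the function name `count_significant_pairs` is hypothetical, the significance level of 0.05 is an assumption (the text does not state one), and the reported `<.0001` p-values are entered as their upper bound of 0.0001.

```python
def count_significant_pairs(p_values, alpha=0.05):
    """Walk through the successive tests in order and stop at the first
    non-significant result; return how many pairs were retained."""
    count = 0
    for p_value in p_values:
        if p_value >= alpha:
            break  # stop: this pair and all later ones are not retained
        count += 1
    return count

# p-values from the SAS output; "<.0001" is entered as 0.0001 (upper bound).
reported = [0.0001, 0.0001, 0.0278]
n_pairs = count_significant_pairs(reported)
print(n_pairs)
```

Because the loop breaks at the first non-significant test, a later small p-value is never examined. This matches the procedure in the text, where testing stops as soon as one result fails to reach significance.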