## Example 7-9: Spouse Data (Paired Hotelling's)

Download the text file containing the data here: spouse.csv

The Spouse Data may be analyzed using the SAS program as shown below:

Download the SAS Program: spouse.sas

Explore the code below to see how to compute the Paired Hotelling's \(T^2\) using the SAS statistical software application.


```
options ls=78;
title "Paired Hotelling's T-Square";
/* The differences 'd1' through 'd4' are defined
* and added to the data set.
*/
data spouse;
infile "D:\Statistics\STAT 505\data\spouse.csv" firstobs=2 delimiter=',';
input h1 h2 h3 h4 w1 w2 w3 w4;
d1=h1-w1;
d2=h2-w2;
d3=h3-w3;
d4=h4-w4;
run;
proc print data=spouse;
run;
/* The iml code below defines and executes the 'hotel' module
* for calculating the one-sample Hotelling T2 test statistic.
* The calculations are based on the differences, which is why
* there is a single null vector consisting of only 0s.
* The commands between 'start' and 'finish' define the
* calculations of the module for an input vector 'x'.
* The 'use' statement makes the 'spouse' data set available, from
* which the difference variables are taken. The difference variables
* are then read into the vector 'x' before the 'hotel' module is called.
*/
proc iml;
start hotel;
mu0={0, 0, 0, 0};
one=j(nrow(x),1,1);
ident=i(nrow(x));
ybar=x`*one/nrow(x);
s=x`*(ident-one*one`/nrow(x))*x/(nrow(x)-1.0);
print mu0 ybar;
print s;
t2=nrow(x)*(ybar-mu0)`*inv(s)*(ybar-mu0);
f=(nrow(x)-ncol(x))*t2/ncol(x)/(nrow(x)-1);
df1=ncol(x);
df2=nrow(x)-ncol(x);
p=1-probf(f,df1,df2);
print t2 f df1 df2 p;
finish;
use spouse;
read all var{d1 d2 d3 d4} into x;
run hotel;
```
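If you want to see the arithmetic of the `hotel` module outside SAS, here is a minimal sketch in plain Python. It is a hypothetical port and not part of the course files. `hotelling_t2` and `solve` are illustrative names, and the data are assumed to be a list of rows, with one row of differences per couple. Instead of forming `inv(s)`, it solves the linear system \(\mathbf{S}\mathbf{w} = \bar{\mathbf{y}} - \boldsymbol{\mu}_0\), which gives the same quadratic form. The p-value step (`probf`) is left out here.

```python
def solve(a, b):
    """Solve the linear system a * w = b by Gaussian elimination with partial pivoting."""
    m = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    aug = [[float(v) for v in row] + [float(bi)] for row, bi in zip(a, b)]
    for col in range(m):
        piv = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * m
    for r in range(m - 1, -1, -1):
        tail = sum(aug[r][c] * w[c] for c in range(r + 1, m))
        w[r] = (aug[r][m] - tail) / aug[r][r]
    return w


def hotelling_t2(x, mu0=None):
    """One-sample Hotelling's T^2 for the rows of x (n observations by p variables).

    For the paired test, x holds the husband-minus-wife differences and
    mu0 defaults to the zero vector, as in the SAS 'hotel' module.
    Returns (t2, f, df1, df2).
    """
    n, p = len(x), len(x[0])
    if mu0 is None:
        mu0 = [0.0] * p
    # Sample mean vector (ybar in the IML code).
    ybar = [sum(row[j] for row in x) / n for j in range(p)]
    # Sample covariance matrix with divisor n - 1 (s in the IML code).
    s = [[sum((row[j] - ybar[j]) * (row[k] - ybar[k]) for row in x) / (n - 1)
          for k in range(p)]
         for j in range(p)]
    dev = [ybar[j] - mu0[j] for j in range(p)]
    w = solve(s, dev)  # w = S^{-1} (ybar - mu0)
    t2 = n * sum(dj * wj for dj, wj in zip(dev, w))
    df1, df2 = p, n - p
    f = df2 * t2 / (df1 * (n - 1))  # same as (n-p)*t2/p/(n-1) in the IML code
    return t2, f, df1, df2


# Toy check on made-up numbers (not the spouse data): with a single variable,
# T^2 equals the squared one-sample t statistic, which is 15 here (up to rounding).
print(hotelling_t2([[1.0], [2.0], [3.0], [4.0]]))
```

Called on the 30 × 4 matrix of differences d1 through d4 (with `mu0` left at zero), it should reproduce the SAS values of `t2`, `f`, `df1`, and `df2` up to floating-point rounding.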

The output is available for download as spouse.lst.

The first page of the output simply lists the raw data. All of the responses are numbers between 1 and 5. If you look at them closely, you can see that they are mostly 4's and 5's, with a few 3's. I don't think we see a 1 in there at all.

You can also see the columns for d1 through d4, and you should be able to confirm that each is indeed the husband's response minus the wife's response to the corresponding question.
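To check the difference columns outside SAS, a small Python sketch like the following rebuilds them from the CSV file. It assumes, as the SAS `input` statement with `firstobs=2` does, that spouse.csv has one header row followed by the columns h1 through h4 and then w1 through w4, in that order. `paired_differences` is a hypothetical helper name, not something provided with the course.

```python
import csv
import io


def paired_differences(csv_text, p=4):
    """Return one list [d1, ..., dp] per couple, where dj = hj - wj.

    Assumes a single header row and the column order h1..hp, w1..wp,
    matching the SAS INPUT statement.
    """
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row, like firstobs=2
    diffs = []
    for row in reader:
        if not row:
            continue  # ignore blank lines
        values = [float(v) for v in row]
        h, w = values[:p], values[p:2 * p]
        diffs.append([hj - wj for hj, wj in zip(h, w)])
    return diffs


# Two made-up rows for illustration only (not the real spouse data).
example = "h1,h2,h3,h4,w1,w2,w3,w4\n5,4,4,5,4,4,5,5\n3,5,4,4,4,5,4,3\n"
print(paired_differences(example))
```

With the real file, something like `paired_differences(open("spouse.csv").read())` would return one row of differences per couple.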

The second page of the output gives the results of the IML procedure. First, it gives the hypothesized value of the population mean vector under the null hypothesis, which in this case is just a column of zeros. The sample means of the differences are given in the next column. So the mean difference between the husbands' and wives' responses to the first question is 0.0666667, which is rounded to 0.0667 in the table below. The mean differences for the next three questions follow.

Following the sample mean vector is the sample variance-covariance matrix of the differences. The diagonal elements give the sample variances of the differences for each question. For example, the sample variance for the first question is 0.8229885, which we have rounded to 0.8230 and copied into the table below as well. The second diagonal element gives the sample variance for the second question, and so on.

The Hotelling's *T*-square statistic, its *F*-value, the two degrees of freedom, and the *p*-value are given at the bottom of the output page.
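For reference, the quantities computed by the `hotel` module can be written out as follows. Here \(\mathbf{X}\) is the \(n \times p\) matrix of differences read into `x` (with \(n = 30\) couples and \(p = 4\) questions), \(\mathbf{y}_i\) is the vector of differences for couple \(i\), \(\mathbf{1}\) is a column of ones, and \(\boldsymbol{\mu}_0 = \mathbf{0}\) for the paired test:

\(\bar{\mathbf{y}} = \frac{1}{n}\mathbf{X}'\mathbf{1}, \qquad \mathbf{S} = \frac{1}{n-1}\mathbf{X}'\left(\mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X} = \frac{1}{n-1}\sum_{i=1}^{n}(\mathbf{y}_i - \bar{\mathbf{y}})(\mathbf{y}_i - \bar{\mathbf{y}})'\)

\(T^2 = n(\bar{\mathbf{y}} - \boldsymbol{\mu}_0)'\mathbf{S}^{-1}(\bar{\mathbf{y}} - \boldsymbol{\mu}_0), \qquad F = \frac{n-p}{p(n-1)}\,T^2\)

Assuming the differences are multivariate normal, under the null hypothesis \(F\) follows an \(F\) distribution with \(p\) and \(n-p\) degrees of freedom. This is why the output reports `df1` \(= p\) and `df2` \(= n - p\). The centering matrix \(\mathbf{I}_n - \frac{1}{n}\mathbf{1}\mathbf{1}'\) in the IML statement for `s` simply subtracts the column means before the cross-products are formed.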

### Computing the paired Hotelling's \(T^2\) using Minitab

To calculate the paired Hotelling's \(T^2\) statistic:

1. **Open** the 'spouse' data set in a new worksheet.
2. **Name the columns** h1, h2, h3, h4, w1, w2, w3, and w4, from left to right.
3. **Name new columns** diff1, diff2, diff3, and diff4.
4. **Calc > Calculator**
    - **Highlight and select** diff1 to move it to the Store result window.
    - In the Expression window, **enter 'h1' - 'w1'**.
    - Choose '**OK**'. The first difference is created in the worksheet.
5. **Repeat step 4** for each of the other 3 differences.
6. **Stat > Basic Statistics > Store Descriptive Statistics**
    - **Highlight and select** all 4 difference variables (diff1 through diff4) to move them to the Variables window.
    - Under Statistics, **choose 'Mean'**, and then '**OK**'.
    - Choose '**OK**' again. The 4 means are displayed in new columns in the worksheet.
7. **Data > Transpose Columns**
    - **Highlight and select** the 4 columns containing the means from the step above.
    - **Choose 'After last column in use'**, then '**OK**'. The means are displayed in a single column in the worksheet.
8. **Name the new column of numeric means 'means'** for the remaining steps.
9. **Data > Copy > Columns to Matrix**
    - **Highlight and select 'means'** for the 'Copy from columns' window.
    - **Enter 'M1'** in the 'In current worksheet' window, then choose '**OK**'.
10. **Calc > Matrices > Transpose**
    - **Highlight and select 'M1'** in the 'Transpose from' window and **enter 'M2'** in the 'Store result' window.
    - Choose '**OK**'.
11. **Stat > Basic Statistics > Covariance**
    - **Highlight and select** all 4 difference variables (diff1 through diff4) to move them to the Variables window.
    - **Check the box to Store matrix**, then choose '**OK**'. This stores the covariance matrix as a new matrix 'M3'.
12. **Calc > Matrices > Invert**
    - **Highlight and select 'M3'** to move it to the 'Invert from' window.
    - **Enter 'M4'** in the 'Store result in' window and choose '**OK**'. This stores the inverted covariance matrix as a new matrix 'M4'.
13. **Calc > Matrices > Arithmetic**
    - **Choose Multiply** and **enter M2 and M4**, respectively, in the two windows.
    - **Enter 'M5'** as the name in the Store result window, then choose '**OK**'.
14. **Calc > Matrices > Arithmetic**
    - **Choose Multiply** and **enter M5 and M1**, respectively, in the two windows.
    - **Enter 'M6'** as the name in the Store result window, then choose '**OK**'. The answer, 0.4376, is displayed in the results window.
15. **Calc > Calculator**
    - **Enter C16** (or any unused column name) in the Store result window.
    - In the Expression window, **enter 0.4376 * 30** (the answer from step 14 times the sample size of 30 couples).
    - Choose '**OK**'. The value of the \(T^2\) statistic is displayed in the worksheet under 'C16'.
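The Minitab matrix steps build the quadratic form \(\bar{\mathbf{y}}'\mathbf{S}^{-1}\bar{\mathbf{y}}\) (stored in M6) and then multiply it by the sample size. As a rough cross-check, the Python sketch below mirrors the matrices M1 through M6. It uses the rounded means and covariance matrix reported in the Analysis section, not the raw data, so it is only a sketch under that assumption. Because of the rounding, it gives roughly 0.437 and a \(T^2\) of roughly 13.1, rather than exactly 0.4376 and 13.13. The helpers `invert` and `matmul` are illustrative names.

```python
def invert(a):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        pivot = aug[col][col]
        aug[col] = [v / pivot for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * pc for v, pc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Rounded summary statistics from the Analysis section (not the raw data).
n = 30
m1 = [[0.0667], [-0.1333], [-0.3000], [-0.1333]]  # column of mean differences
m2 = [list(row) for row in zip(*m1)]               # transpose: 1 x 4 row vector
m3 = [[0.8230, 0.0782, -0.0138, -0.0598],          # covariance matrix S
      [0.0782, 0.8092, -0.2138, -0.1563],
      [-0.0138, -0.2138, 0.5621, 0.5103],
      [-0.0598, -0.1563, 0.5103, 0.6023]]
m4 = invert(m3)                                    # S^{-1}
m5 = matmul(m2, m4)                                # ybar' S^{-1}
m6 = matmul(m5, m1)                                # ybar' S^{-1} ybar, a 1 x 1 matrix
t2 = n * m6[0][0]
print(round(m6[0][0], 4), round(t2, 2))
```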

### Analysis

The sample variance-covariance matrix from the output of the SAS program:

\(\mathbf{S} =\left(\begin{array}{rrrr}0.8230 & 0.0782 & -0.0138 & -0.0598 \\ 0.0782 & 0.8092 & -0.2138 & -0.1563 \\ -0.0138 & -0.2138 & 0.5621 & 0.5103 \\ -0.0598 & -0.1563 & 0.5103 & 0.6023 \end{array}\right)\)

Sample means and variances of the differences in responses between the husbands and wives.

Question | Mean \((\bar {y})\) | Variance \((s_{Y}^{2})\) |
---|---|---|
1 | 0.0667 | 0.8230 |
2 | -0.1333 | 0.8092 |
3 | -0.3000 | 0.5621 |
4 | -0.1333 | 0.6023 |

Here we have \(T^{2}\) = 13.13, with a corresponding *F*-value of 2.94 on 4 and 26 degrees of freedom. The 4 corresponds to the number of questions asked of each couple, and the 26 is the sample size of 30 couples minus the 4 questions. The *p*-value for the test is 0.039.
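The conversion from \(T^2\) to *F* and the *p*-value can also be reproduced without SAS. The sketch below is a stand-in for `1-probf(f,df1,df2)` and is not the SAS implementation. It evaluates the upper tail of the *F* distribution through the regularized incomplete beta function, using the standard continued-fraction expansion (modified Lentz method). The function names are hypothetical. Starting from the reported \(T^2\) = 13.13 with \(n = 30\) and \(p = 4\), it should give an *F* near 2.94 and a *p*-value near 0.039.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    # Use the continued fraction where it converges quickly; otherwise
    # apply the symmetry I_x(a, b) = 1 - I_{1-x}(b, a).
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _betacf(a, b, x) / a
    return 1.0 - front * _betacf(b, a, 1.0 - x) / b


def f_upper_tail(f, df1, df2):
    """P(F > f) for F ~ F(df1, df2), the analogue of 1 - probf(f, df1, df2)."""
    if f <= 0.0:
        return 1.0
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f))


def t2_to_f(t2, n, p):
    """Convert a one-sample Hotelling T^2 to F on p and n - p degrees of freedom."""
    return (n - p) * t2 / (p * (n - 1))


t2, n, p = 13.13, 30, 4  # values reported in the text
f = t2_to_f(t2, n, p)
p_value = f_upper_tail(f, p, n - p)
print(f"F = {f:.2f}, p-value = {p_value:.4f}")
```

Starting from the rounded \(T^2\) rather than the full-precision value can shift the last printed digit of the *p*-value slightly.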

Because this *p*-value is less than 0.05, we reject the null hypothesis that the population mean vector of the differences between husband and wife responses is equal to zero.

### Conclusion

Husbands do not respond to the questions in the same way as their wives (\(T^{2}\)= 13.13; *F* = 2.94; *d.f.* = 4, 26; *p* = 0.0394).

This indicates that husbands respond differently from their wives to at least one of the questions. It could be just one question or more than one.

The next step is to assess on which questions the husbands and wives differ in their responses. This step of the analysis will involve the computation of confidence intervals.