7.1.9 - Example: Spouse Data

Example 7-9: Spouse Data (Paired Hotelling's)

Download the text file containing the data here: spouse.csv

The Spouse Data may be analyzed using the SAS program as shown below:

Download the SAS Program: spouse.sas

Explore the code below to see how to compute the Paired Hotelling's \(T^2\) using the SAS statistical software application.

options ls=78;
title "Paired Hotelling's T-Square";

 /* The differences 'd1' through 'd4' are defined
  * and added to the data set.
  */

data spouse;
  infile "D:\Statistics\STAT 505\data\spouse.csv" firstobs=2 delimiter=',';
  input h1 h2 h3 h4 w1 w2 w3 w4;
  d1=h1-w1;
  d2=h2-w2;
  d3=h3-w3;
  d4=h4-w4;
  run;

proc print data=spouse;
  run;

 /* The iml code below defines and executes the 'hotel' module 
  * for calculating the one-sample Hotelling T2 test statistic. 
  * The calculations are based on the differences, which is why 
  * there is a single null vector consisting of only 0s.
  * The commands between 'start' and 'finish' define the 
  * calculations of the module, which operate on the data matrix 'x'.
  * The 'use' statement makes the 'spouse' data set available, from 
  * which the difference variables are taken. The difference variables 
  * are then read into the matrix 'x' before the 'hotel' module is called.
  */

proc iml;
  start hotel;
    mu0={0, 0, 0, 0};
    one=j(nrow(x),1,1);
    ident=i(nrow(x));
    ybar=x`*one/nrow(x);
    s=x`*(ident-one*one`/nrow(x))*x/(nrow(x)-1.0);
    print mu0 ybar;
    print s;
    t2=nrow(x)*(ybar-mu0)`*inv(s)*(ybar-mu0);
    f=(nrow(x)-ncol(x))*t2/ncol(x)/(nrow(x)-1);
    df1=ncol(x);
    df2=nrow(x)-ncol(x);
    p=1-probf(f,df1,df2);
    print t2 f df1 df2 p;
  finish;
  use spouse;
  read all var{d1 d2 d3 d4} into x;
  run hotel;

The downloadable output is given by spouse.lst.

The first page of the output just gives a list of the raw data. You can see that all of the data are numbers between 1 and 5. If you look at them closely, you can see that they are mostly 4's and 5's, with a few 3's. I don't think we see a 1 in there at all.

You can also see the columns for d1 through d4, and you should be able to confirm that they are indeed equal to the differences between the husbands' and wives' responses to the four questions.

The second page of the output gives the results of the iml procedure. First, it gives the hypothesized values of the population mean under the null hypothesis. In this case, it is just a column of zeros. The sample means of the differences are given in the next column. So the mean of the differences between the husbands' and wives' responses to the first question is 0.0666667. This is also copied into the table below. The mean differences for the next three questions follow.

Following the sample mean vector is the sample variance-covariance matrix. The diagonal elements give the sample variances for each of the questions. So, for example, the sample variance for the first question is 0.8229885, which we have rounded off to 0.8230 and copied into the table below as well. The second diagonal element gives the sample variance for the second question, and so on.
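The means and variances reported by the iml module can be cross-checked with standard procedures. The following is a minimal sketch, assuming the 'spouse' data set created by the data step above is still available in the session; it is not part of the original spouse.sas program.

```sas
/* Cross-check of the IML output (not part of spouse.sas).
   Assumes the 'spouse' data set with d1-d4 already exists. */

/* Sample size, mean, and variance of each difference variable */
proc means data=spouse n mean var maxdec=4;
  var d1 d2 d3 d4;
run;

/* Sample variance-covariance matrix of the differences;
   'nocorr' and 'nosimple' suppress the correlation and
   descriptive-statistics tables so only the covariances print */
proc corr data=spouse cov nocorr nosimple;
  var d1 d2 d3 d4;
run;
```

The means and variances from proc means should agree with the 'ybar' column and the diagonal of 's' in the iml output, and the covariance matrix from proc corr should agree with 's' as a whole.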

Hotelling's T-square statistic, the corresponding F-value, its degrees of freedom, and the p-value are given at the bottom of the output page.
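For reference, the quantities computed inside the 'hotel' module can be written in matrix notation as follows. Here \(\mathbf{X}\) is the matrix of differences read into 'x', \(\mathbf{1}\) is the column of ones stored in 'one', \(\mathbf{I}\) is the identity matrix stored in 'ident', and \(n\) stands for nrow(x); the symbol \(n\) is shorthand introduced here, not a name used in the program.

\(\bar{\mathbf{y}} = \dfrac{1}{n}\mathbf{X}'\mathbf{1}\)

\(\mathbf{S} = \dfrac{1}{n-1}\,\mathbf{X}'\left(\mathbf{I} - \dfrac{1}{n}\mathbf{1}\mathbf{1}'\right)\mathbf{X}\)

\(T^{2} = n\,(\bar{\mathbf{y}} - \boldsymbol{\mu}_0)'\,\mathbf{S}^{-1}\,(\bar{\mathbf{y}} - \boldsymbol{\mu}_0)\)

Because \(\boldsymbol{\mu}_0\) is a vector of zeros in this example, the last expression reduces to \(T^{2} = n\,\bar{\mathbf{y}}'\,\mathbf{S}^{-1}\,\bar{\mathbf{y}}\).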

Computing the paired Hotelling's \(T^2\)

To calculate the paired Hotelling's \(T^2\) statistic:

Analysis

The sample variance-covariance matrix from the output of the SAS program:

\(\mathbf{S} =\left(\begin{array}{rrrr}0.8230 & 0.0782 & -0.0138 & -0.0598 \\ 0.0782 & 0.8092 & -0.2138 & -0.1563 \\ -0.0138 & -0.2138 & 0.5621 & 0.5103 \\ -0.0598 & -0.1563 & 0.5103 & 0.6023 \end{array}\right)\)

Sample means and variances of the differences in responses between the husbands and wives.

Question	Mean \((\bar {y})\)	Variance \((s_{Y}^{2})\)
1	0.0667	0.8230
2	-0.1333	0.8092
3	-0.3000	0.5621
4	-0.1333	0.6023

Here we have \(T^{2}\) = 13.13 with a corresponding F-value of 2.94, with 4 and 26 degrees of freedom. The 4 corresponds to the number of questions asked of each couple. The 26 comes from subtracting the 4 questions from the sample size of 30 couples. The p-value for the test is 0.039.
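As a quick check of the arithmetic, the F-value can be recovered from \(T^{2}\) using the same scaling as the line that computes 'f' in the SAS module. The symbols \(n = 30\) (couples) and \(p = 4\) (questions) are shorthand assumed here for nrow(x) and ncol(x).

\(F = \dfrac{n-p}{p\,(n-1)}\,T^{2} = \dfrac{30-4}{4 \times 29} \times 13.13 = \dfrac{26}{116} \times 13.13 \approx 2.94\)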

Based on this test, we can reject the null hypothesis that the mean differences between husband and wife responses are all equal to zero.
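The decision can also be checked directly from the reported summary values, without rerunning the iml module. The data step below is a sketch under stated assumptions: the significance level 'alpha' of 0.05 is an assumed value that is not stated in the example, and the variable names are chosen here for illustration.

```sas
/* Decision check from summary values (illustrative sketch).
   n, p, and t2 are taken from the example; alpha is assumed. */
data _null_;
  n = 30;          /* number of couples                      */
  p = 4;           /* number of questions                    */
  t2 = 13.13;      /* reported Hotelling's T-square          */
  alpha = 0.05;    /* ASSUMED significance level             */
  f = (n - p) * t2 / (p * (n - 1));    /* F-value            */
  pval = 1 - probf(f, p, n - p);       /* upper-tail p-value */
  fcrit = finv(1 - alpha, p, n - p);   /* F critical value   */
  reject = (f > fcrit);                /* 1 = reject the null */
  put f= pval= fcrit= reject=;         /* written to the log */
run;
```

Comparing 'f' with 'fcrit' and comparing 'pval' with 'alpha' are equivalent ways of reaching the same decision.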

Conclusion

Husbands do not respond to the questions in the same way as their wives (\(T^{2}\) = 13.13; F = 2.94; d.f. = 4, 26; p = 0.0394).

This indicates that husbands respond differently from their wives on at least one of the questions. It could be one question or more than one question.

The next step is to assess on which questions the husbands and wives differ in their responses. This step of the analysis will involve the computation of confidence intervals.