11.1 - Modeling Ordinal Data with Log-linear Models

In Lesson 4, we described ways to perform significance tests for independence and conditional independence and to measure (linear) associations with ordinal categorical variables. For example, we focused on the CMH statistic and correlation measures for testing independence and linear association in the example comparing job security and happiness from the General Social Survey data. We concluded that those who agree more strongly with the statement "job security is good" have greater odds of reporting higher levels of happiness.

                  Not too happy   Pretty happy   Very happy
Not at all true        15              25             5
Not too true           21              47            21
Somewhat true          64             248           100
Very true              73             474           311

 Can we answer the same questions (and more) via log-linear models?

Modeling Ordinal Data in 2-way Tables

Log-linear models for contingency tables, by default, treat all variables as nominal. If there is an ordering of the categories of the variables, it is not taken into account. That means we could rearrange the rows and/or columns of a table and get the same fitted odds ratios as we would with the original ordering of the rows and/or columns.

To model ordinal data with log-linear models, we can apply some of the general ideas we saw in the analysis of ordinal data earlier in the course. That is, we typically

  • assign scores to the levels of our categorical variables, and
  • include an additional association parameter in the log-linear model, defined through these scores, to model the dependence between the two variables.

Linear by Linear Association Model

This is the most common log-linear model when we have ordinal data.

Objective

Model the log cell counts while accounting for the ordering of the categories of the discrete variables.

Suppose we assign numeric scores for the categories of the row variable, \(u_1\le u_2\le \cdots\le u_I\), and for the categories of the column variable, \(v_1\le \cdots\le v_J\). Then, we can model the association between the two variables.

Model Structure

\(\log(\mu_{ij})=\lambda+\lambda_i^X+\lambda_j^Y+\beta u_i v_j\)

where \(\lambda_i^X\) and \(\lambda_j^Y\) are the usual main-effect terms for the row variable \(X\) and the column variable \(Y\).

For each row \(i\), the log fitted values are, apart from the column main effects, a linear function of the column scores \(v_j\); for each column \(j\), they are likewise a linear function of the row scores \(u_i\).
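To see this, fix row \(i\) and collect terms:

\(\log(\mu_{ij})=(\lambda+\lambda_i^X)+\lambda_j^Y+(\beta u_i)\,v_j\)

so the association part changes linearly in \(v_j\) with slope \(\beta u_i\); symmetrically, within column \(j\) it changes linearly in \(u_i\) with slope \(\beta v_j\).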

Parameter estimates and interpretation

This model has only one more parameter than the independence model (the slope \(\beta\)) and falls between the independence and saturated models in complexity. We are trying to say something about the "linear association" between these two variables based on the scores that we have assigned.

  • If \(\beta > 0\), then the two variables are positively associated (i.e., happiness tends to go up as job security goes up).
  • If \(\beta<0\), then the two variables are negatively associated.
  • The odds ratio for any \(2 \times 2\) sub-table is a direct function of the row and column scores and \(\beta\); on the log scale,

\begin{align}
\log\theta(ij,i'j') &= \log(\mu_{ij})+\log(\mu_{i'j'})-\log(\mu_{i'j})-\log(\mu_{ij'})\\
&= \beta(u_i-u_{i'})(v_j-v_{j'})
\end{align}
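For instance, anticipating the fitted slope \(\hat{\beta}=0.4179\) from the example below (with integer scores \(u_i=1,\ldots,4\) and \(v_j=1,2,3\)), the estimated odds ratio for the sub-table formed from the extreme rows and columns is

\(\hat{\theta}=\exp(0.4179\,(4-1)(3-1))=\exp(2.507)\approx 12.3\)

That is, the estimated odds of being very happy rather than not too happy are about 12 times greater for those answering "very true" than for those answering "not at all true."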

Model Fit

We can use the likelihood-ratio (deviance) statistic \(G^2\) to assess fit, as with any other log-linear model.
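Recall that the deviance compares the observed counts \(n_{ij}\) with the fitted values \(\hat{\mu}_{ij}\):

\(G^2=2\sum_{i,j} n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right)\)

with degrees of freedom equal to the number of cells minus the number of free parameters in the model.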

For our example (see ordinalLoglin.sas and the resulting output), we create two new numerical variables as seen below in the datalines statement:

    data gss;
    input jobsecok $ x happy $ y count;
    datalines;
      Notatall 1  Nottoo 1    15
        Nottoo 2  Nottoo 1    21
      Somewhat 3  Nottoo 1    64
          Very 4  Nottoo 1    73
      Notatall 1  Pretty 2    25
        Nottoo 2  Pretty 2    47
      Somewhat 3  Pretty 2   248
          Very 4  Pretty 2   474
      Notatall 1    Very 3     5
        Nottoo 2    Very 3    21
      Somewhat 3    Very 3   100
          Very 4    Very 3   311
    ;
    proc genmod data=gss order=data;
    class jobsecok happy;
    model count=jobsecok happy/link=log dist=poisson lrci type3 obstats;
    title 'Independence Model';
    run;
    proc genmod data=gss order=data;
    class jobsecok happy;
    model count=jobsecok happy x*y/link=log dist=poisson lrci type3 obstats;
    title 'Linear by Linear Association Model';
    run;

We observe \(G^2=3.6075\) with five degrees of freedom, which indicates that the linear-by-linear association model fits well and significantly better than the independence model (\(G^2=57.2404\) with six degrees of freedom). Note that these tests are relative to the saturated model, which treats both variables as entirely nominal and allows for any type of association between them, not just linear. For these variables, which have four and three categories, respectively, \((4-1)(3-1)=6\) parameters would be needed to describe their nominal association. The linear-by-linear association model replaces them all with a single slope parameter, and the independence model removes them entirely.
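The degrees of freedom bookkeeping makes this explicit:

\begin{align}
\text{df}_{\text{independence}} &= (I-1)(J-1) = (4-1)(3-1) = 6\\
\text{df}_{\text{linear-by-linear}} &= (I-1)(J-1) - 1 = 5
\end{align}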

Next, since we're satisfied with the fit of the linear-by-linear association model, we look at the test for the significance of the linear association. This tests whether the slope parameter can be dropped and therefore involves a single degree of freedom. We can use either a likelihood ratio test or the Wald test reported in the summary of parameter estimates; they should be very similar. Moreover, we expect these results to agree with those of the CMH test we performed in Lesson 4. The LRT statistic is the difference of the two deviances:

\(G^2=57.2404-3.6075=53.633\)

The squared Wald statistic for \(\beta\) is \(z^2=7.131^2=50.85\), and the CMH statistic from Lesson 4 is 47.97. Each is referred to a chi-square distribution with one degree of freedom, and all give strong evidence of a linear association between job security and happiness, based on the numeric scores we assigned to the levels of these variables.

 
Criteria For Assessing Goodness Of Fit

Criterion                  DF   Value       Value/DF
Deviance                    5   3.6075      0.7215
Scaled Deviance             5   3.6075      0.7215
Pearson Chi-Square          5   3.7879      0.7576
Scaled Pearson X2           5   3.7879      0.7576
Log Likelihood                  6144.8702
Full Log Likelihood             -36.8426
AIC (smaller is better)         87.6851
AICC (smaller is better)        115.6851
BIC (smaller is better)         91.0795
 
Algorithm converged.
 
Analysis Of Maximum Likelihood Parameter Estimates

Parameter           DF  Estimate  Std Error  LR 95% Conf Limits  Wald Chi-Sq  Pr > ChiSq
Intercept            1   0.7240    0.6874   -0.6394   2.0571        1.11       0.2922
jobsecok Notatall    1  -0.4067    0.3643   -1.1283   0.3012        1.25       0.2642
jobsecok Nottoo      1  -0.5030    0.2636   -1.0205   0.0137        3.64       0.0564
jobsecok Somewhat    1   0.1822    0.1421   -0.0943   0.4630        1.64       0.1997
jobsecok Very        0   0.0000    0.0000    0.0000   0.0000         .            .
happy Nottoo         1   1.9504    0.4096    1.1528   2.7597       22.67       <.0001
happy Pretty         1   2.0866    0.2219    1.6577   2.5281       88.41       <.0001
happy Very           0   0.0000    0.0000    0.0000   0.0000         .            .
x*y                  1   0.4179    0.0586    0.3041   0.5339       50.85       <.0001
Scale                0   1.0000    0.0000    1.0000   1.0000

Note: The scale parameter was held fixed.

For our example (see ordinalLoglin.R and the resulting output), we create two new numerical variables as seen below:

    jobsecok = factor(rep(c("Not at all", "Not too", "Somewhat", "Very"), times=3))
    happy = factor(rep(c("Not too", "Pretty", "Very"), each=4))
    x = rep(c(1,2,3,4), times=3)
    y = rep(c(1,2,3), each=4)
    count = c(15,21,64,73,25,47,248,474,5,21,100,311)
    
    # setting references to last category (same as SAS)
    jobsecok = relevel(jobsecok, ref='Very')
    happy = relevel(happy, ref='Very')
    tbl.all = data.frame(jobsecok, x, happy, y, count)
    
    # independence Model
    model.ind = glm(count ~ jobsecok+happy, family=poisson(link=log), data=tbl.all)
    summary(model.ind)
    
    # linear by linear association model
    model.lin = glm(count ~ jobsecok+happy+x:y, family=poisson(link=log), data=tbl.all)
    summary(model.lin)
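
To reproduce the likelihood ratio test shown in the output below, we can also compare the two models directly with the lrtest function from the lmtest package (assuming the package is installed):

    # likelihood ratio test: independence vs. linear-by-linear (1 df)
    library(lmtest)
    lrtest(model.ind, model.lin)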
    
The R output below shows the same goodness of fit and parameter estimates as the SAS results above, along with the likelihood ratio test comparing the two models:

    > summary(model.lin)
    ...
    Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
    (Intercept)         0.72400    0.68739   1.053   0.2922    
    jobsecokNot at all -0.40671    0.36428  -1.116   0.2642    
    jobsecokNot too    -0.50296    0.26364  -1.908   0.0564 .  
    jobsecokSomewhat    0.18222    0.14208   1.282   0.1997    
    happyNot too        1.95036    0.40959   4.762 1.92e-06 ***
    happyPretty         2.08661    0.22192   9.403  < 2e-16 ***
    x:y                 0.41785    0.05859   7.131 9.95e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for poisson family taken to be 1)
    
        Null deviance: 1729.1636  on 11  degrees of freedom
    Residual deviance:    3.6075  on  5  degrees of freedom
    AIC: 87.685
    
    > lrtest(model.ind, model.lin)
    Likelihood ratio test
    
    Model 1: count ~ jobsecok + happy
    Model 2: count ~ jobsecok + happy + x:y
      #Df  LogLik Df  Chisq Pr(>Chisq)    
    1   6 -63.659                         
    2   7 -36.843  1 53.633  2.417e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
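As a quick check on the fit, here is a minimal sketch (reusing tbl.all and model.lin from above) that lists the observed counts alongside the fitted values from the linear-by-linear model:

    # observed counts vs. fitted values under the linear-by-linear model
    cbind(tbl.all, fitted = round(fitted(model.lin), 1))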
    

Choice of Scores

There are many options for assigning the score values, and these can be equally or unequally spaced. The most common choice is consecutive integers; that is, \(u_1=1\), \(u_2=2\), etc. (which is what we used in the example above). The model with such scores is a special case of the linear-by-linear association model known as the Uniform Association Model, so called because the odds ratio for any two adjacent rows and any two adjacent columns is the same and equal to

\(\theta=\exp(\beta(u_i-u_{i-1})(v_j-v_{j-1}))=\exp(\beta)\)

In other words, the Local Odds Ratio equals \(\exp(\beta)\) and is the same for adjacent rows and columns.
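With the fitted slope from our example, the estimated local odds ratio is \(\exp(0.4179)\approx 1.52\): every \(2\times 2\) sub-table formed from adjacent rows and adjacent columns has an estimated odds ratio of about 1.52. A minimal sketch, reusing model.lin from the R code above:

    # estimated local (uniform) odds ratio and a Wald 95% confidence interval
    b  <- coef(model.lin)["x:y"]
    se <- sqrt(vcov(model.lin)["x:y", "x:y"])
    exp(b)                                  # about 1.52
    exp(b + c(-1, 1) * qnorm(0.975) * se)   # about (1.35, 1.70)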

Also, sets of scores with the same spacing between them will lead to the same goodness-of-fit statistics, fitted counts, odds ratios, and \(\hat{\beta}\). For example,

\(v_1=1, v_2=2, v_3=3, v_4=4\) and \(v_1=8, v_2=9, v_3=10, v_4=11\)

will yield the same results. More generally, any linear transformation of the scores leaves the goodness-of-fit statistics, fitted counts, and odds ratios unchanged, although rescaling the spacing rescales \(\hat{\beta}\) correspondingly (see the sketch below). Ultimately, the choice of scores should reflect the nature of the data and the context of the questions involved.
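To see this invariance numerically, here is a minimal sketch (reusing tbl.all from the R code above) that shifts the row scores by a constant and refits the model; the deviance and \(\hat{\beta}\) are unchanged:

    # shifted row scores with the same spacing: 8, 9, 10, 11
    tbl.all$x2 <- tbl.all$x + 7
    model.lin2 <- glm(count ~ jobsecok + happy + x2:y,
                      family = poisson(link = log), data = tbl.all)
    deviance(model.lin2)        # 3.6075, same as model.lin
    coef(model.lin2)["x2:y"]    # 0.41785, identical slope estimate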