11.1 - Modeling Ordinal Data with Log-linear Models

In Lesson 4, we described ways to perform significance tests for independence and conditional independence and to measure (linear) associations with ordinal categorical variables. For example, we focused on the CMH statistic and correlation measures for testing independence and linear association in the example comparing job security and happiness from the General Social Survey data. We concluded that those who agree more strongly with the statement "job security is good" have greater odds of reporting higher levels of happiness.

                  Not too happy   Pretty happy   Very happy
Not at all true        15              25             5
Not too true           21              47            21
Somewhat true          64             248           100
Very true              73             474           311

 Can we answer the same questions (and more) via log-linear models?

Modeling Ordinal Data in 2-way Tables

Log-linear models for contingency tables, by default, treat all variables as nominal. If there is an ordering of the categories of the variables, it is not taken into account. That means we could rearrange the rows and/or columns of a table and get the same fitted odds ratios as we would with the original ordering of the rows and/or columns.

To model ordinal data with log-linear models, we can apply some of the general ideas we saw in the analysis of ordinal data earlier in the course. That is, we typically

  • assign scores to the levels of our categorical variables, and
  • include an additional association parameter in the log-linear model, defined through these scores, to model the dependence between the two variables.

Linear by Linear Association Model

This is the most common log-linear model when we have ordinal data.

Objective

Model the log cell counts while accounting for the ordering of the categories of the discrete variables.

Suppose we assign numeric scores for the categories of the row variable, \(u_1\le u_2\le \cdots\le u_I\), and for the categories of the column variable, \(v_1\le \cdots\le v_J\). Then, we can model the association between the two variables.

Model Structure

\(\log(\mu_{ij})=\lambda+\lambda_i^X+\lambda_j^Y+\beta u_i v_j\)

where \(\lambda_i^X\) and \(\lambda_j^Y\) are the usual main-effect terms for the row variable \(X\) and the column variable \(Y\).

For each row \(i\), the log fitted values are, apart from the column main effects, a linear function of the column scores \(v_j\); for each column \(j\), they are likewise a linear function of the row scores \(u_i\).
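To see this, fix row \(i\) and collect terms:

\(\log(\mu_{ij})=(\lambda+\lambda_i^X)+\lambda_j^Y+(\beta u_i)\,v_j\)

so the association part changes linearly in \(v_j\) with slope \(\beta u_i\); symmetrically, within column \(j\) it changes linearly in \(u_i\) with slope \(\beta v_j\).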

Parameter estimates and interpretation

This model has only one more parameter than the independence model (the slope \(\beta\)) and falls between the independence and saturated models in complexity. We are trying to say something about the "linear association" between these two variables based on the scores that we have assigned.

  • If \(\beta > 0\), then the two variables are positively associated (i.e., happiness tends to go up as job security goes up).
  • If \(\beta<0\), then the two variables are negatively associated.
  • The odds ratio for any \(2 \times 2\) sub-table is a direct function of the row and column scores and \(\beta\); on the log scale,

\begin{align}
\log\theta(ij,i'j') &= \log(\mu_{ij})+\log(\mu_{i'j'})-\log(\mu_{i'j})-\log(\mu_{ij'})\\
&= \beta(u_i-u_{i'})(v_j-v_{j'})
\end{align}
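For instance, anticipating the fitted slope \(\hat{\beta}=0.4179\) from the example below (with integer scores \(u_i=1,\ldots,4\) and \(v_j=1,2,3\)), the estimated odds ratio for the sub-table formed from the extreme rows and columns is

\(\hat{\theta}=\exp(0.4179\,(4-1)(3-1))=\exp(2.507)\approx 12.3\)

That is, the estimated odds of being very happy rather than not too happy are about 12 times greater for those answering "very true" than for those answering "not at all true."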

Model Fit

We can use the likelihood-ratio (deviance) statistic \(G^2\) to assess fit, as with any other log-linear model.
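Recall that the deviance compares the observed counts \(n_{ij}\) with the fitted values \(\hat{\mu}_{ij}\):

\(G^2=2\sum_{i,j} n_{ij}\log\left(\frac{n_{ij}}{\hat{\mu}_{ij}}\right)\)

with degrees of freedom equal to the number of cells minus the number of free parameters in the model.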

For our example (see ordinalLoglin.sas and the resulting output), we create two new numerical variables as seen below in the datalines statement:

    data gss;
    input jobsecok $ x happy $ y count;
    datalines;
      Notatall 1  Nottoo 1    15
        Nottoo 2  Nottoo 1    21
      Somewhat 3  Nottoo 1    64
          Very 4  Nottoo 1    73
      Notatall 1  Pretty 2    25
        Nottoo 2  Pretty 2    47
      Somewhat 3  Pretty 2   248
          Very 4  Pretty 2   474
      Notatall 1    Very 3     5
        Nottoo 2    Very 3    21
      Somewhat 3    Very 3   100
          Very 4    Very 3   311
    ;
    proc genmod data=gss order=data;
    class jobsecok happy;
    model count=jobsecok happy/link=log dist=poisson lrci type3 obstats;
    title 'Independence Model';
    run;
    proc genmod data=gss order=data;
    class jobsecok happy;
    model count=jobsecok happy x*y/link=log dist=poisson lrci type3 obstats;
    title 'Linear by Linear Association Model';
    run;

We observe \(G^2=3.6075\) with five degrees of freedom, which indicates that the linear-by-linear association model fits well and significantly better than the independence model (\(G^2=57.2404\) with six degrees of freedom). Note that these tests are relative to the saturated model, which treats both variables as entirely nominal and allows for any type of association between them, not just linear. For these variables, which have four and three categories, respectively, \((4-1)(3-1)=6\) parameters would be needed to describe their nominal association. The linear-by-linear association model replaces them all with a single slope parameter, and the independence model removes them entirely.
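The degrees of freedom bookkeeping makes this explicit:

\begin{align}
\text{df}_{\text{independence}} &= (I-1)(J-1) = (4-1)(3-1) = 6\\
\text{df}_{\text{linear-by-linear}} &= (I-1)(J-1) - 1 = 5
\end{align}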

Next, since we're satisfied with the fit of the linear-by-linear association model, we look at the test for the significance of the linear association. This tests whether the slope parameter can be dropped and therefore involves a single degree of freedom. We can use either a likelihood ratio test or the Wald test reported in the summary of parameter estimates; they should be very similar. Moreover, we expect these results to agree with those of the CMH test we performed in Lesson 4. The LRT statistic is the difference of the two deviances:

\(G^2=57.2404-3.6075=53.633\)

The squared Wald statistic for \(\beta\) is \(z^2=7.131^2=50.85\), and the CMH statistic from Lesson 4 is 47.97. Each is referred to a chi-square distribution with one degree of freedom, and all give strong evidence of a linear association between job security and happiness, based on the numeric scores we assigned to the levels of these variables.

 
Criteria For Assessing Goodness Of Fit

Criterion                  DF   Value       Value/DF
Deviance                    5   3.6075      0.7215
Scaled Deviance             5   3.6075      0.7215
Pearson Chi-Square          5   3.7879      0.7576
Scaled Pearson X2           5   3.7879      0.7576
Log Likelihood                  6144.8702
Full Log Likelihood             -36.8426
AIC (smaller is better)         87.6851
AICC (smaller is better)        115.6851
BIC (smaller is better)         91.0795
 
Algorithm converged.
 
Analysis Of Maximum Likelihood Parameter Estimates

Parameter           DF  Estimate  Std Error  LR 95% Conf Limits  Wald Chi-Sq  Pr > ChiSq
Intercept            1   0.7240    0.6874   -0.6394   2.0571        1.11       0.2922
jobsecok Notatall    1  -0.4067    0.3643   -1.1283   0.3012        1.25       0.2642
jobsecok Nottoo      1  -0.5030    0.2636   -1.0205   0.0137        3.64       0.0564
jobsecok Somewhat    1   0.1822    0.1421   -0.0943   0.4630        1.64       0.1997
jobsecok Very        0   0.0000    0.0000    0.0000   0.0000         .            .
happy Nottoo         1   1.9504    0.4096    1.1528   2.7597       22.67       <.0001
happy Pretty         1   2.0866    0.2219    1.6577   2.5281       88.41       <.0001
happy Very           0   0.0000    0.0000    0.0000   0.0000         .            .
x*y                  1   0.4179    0.0586    0.3041   0.5339       50.85       <.0001
Scale                0   1.0000    0.0000    1.0000   1.0000

Note: The scale parameter was held fixed.

For our example (see ordinalLoglin.R and the resulting output), we create two new numerical variables as seen below:

    jobsecok = factor(rep(c("Not at all", "Not too", "Somewhat", "Very"), times=3))
    happy = factor(rep(c("Not too", "Pretty", "Very"), each=4))
    x = rep(c(1,2,3,4), times=3)
    y = rep(c(1,2,3), each=4)
    count = c(15,21,64,73,25,47,248,474,5,21,100,311)
    
    # setting references to last category (same as SAS)
    jobsecok = relevel(jobsecok, ref='Very')
    happy = relevel(happy, ref='Very')
    tbl.all = data.frame(jobsecok, x, happy, y, count)
    
    # independence Model
    model.ind = glm(count ~ jobsecok+happy, family=poisson(link=log), data=tbl.all)
    summary(model.ind)
    
    # linear by linear association model
    model.lin = glm(count ~ jobsecok+happy+x:y, family=poisson(link=log), data=tbl.all)
    summary(model.lin)
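
To reproduce the likelihood ratio test shown in the output below, we can also compare the two models directly with the lrtest function from the lmtest package (assuming the package is installed):

    # likelihood ratio test: independence vs. linear-by-linear (1 df)
    library(lmtest)
    lrtest(model.ind, model.lin)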
    
The R output below shows the same goodness of fit and parameter estimates as the SAS results above, along with the likelihood ratio test comparing the two models:

    > summary(model.lin)
    ...
    Coefficients:
                       Estimate Std. Error z value Pr(>|z|)    
    (Intercept)         0.72400    0.68739   1.053   0.2922    
    jobsecokNot at all -0.40671    0.36428  -1.116   0.2642    
    jobsecokNot too    -0.50296    0.26364  -1.908   0.0564 .  
    jobsecokSomewhat    0.18222    0.14208   1.282   0.1997    
    happyNot too        1.95036    0.40959   4.762 1.92e-06 ***
    happyPretty         2.08661    0.22192   9.403  < 2e-16 ***
    x:y                 0.41785    0.05859   7.131 9.95e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
    
    (Dispersion parameter for poisson family taken to be 1)
    
        Null deviance: 1729.1636  on 11  degrees of freedom
    Residual deviance:    3.6075  on  5  degrees of freedom
    AIC: 87.685
    
    > lrtest(model.ind, model.lin)
    Likelihood ratio test
    
    Model 1: count ~ jobsecok + happy
    Model 2: count ~ jobsecok + happy + x:y
      #Df  LogLik Df  Chisq Pr(>Chisq)    
    1   6 -63.659                         
    2   7 -36.843  1 53.633  2.417e-13 ***
    ---
    Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
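As a quick check on the fit, here is a minimal sketch (reusing tbl.all and model.lin from above) that lists the observed counts alongside the fitted values from the linear-by-linear model:

    # observed counts vs. fitted values under the linear-by-linear model
    cbind(tbl.all, fitted = round(fitted(model.lin), 1))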
    

Choice of Scores

There are many options for assigning the score values, and these can be equally or unequally spaced. The most common choice is consecutive integers; that is, \(u_1=1\), \(u_2=2\), etc. (which is what we used in the example above). The model with such scores is a special case of the linear-by-linear association model known as the Uniform Association Model, so called because the odds ratio for any two adjacent rows and any two adjacent columns is the same and equal to

\(\theta=\exp(\beta(u_i-u_{i-1})(v_j-v_{j-1}))=\exp(\beta)\)

In other words, the Local Odds Ratio equals \(\exp(\beta)\) and is the same for adjacent rows and columns.
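With the fitted slope from our example, the estimated local odds ratio is \(\exp(0.4179)\approx 1.52\): every \(2\times 2\) sub-table formed from adjacent rows and adjacent columns has an estimated odds ratio of about 1.52. A minimal sketch, reusing model.lin from the R code above:

    # estimated local (uniform) odds ratio and a Wald 95% confidence interval
    b  <- coef(model.lin)["x:y"]
    se <- sqrt(vcov(model.lin)["x:y", "x:y"])
    exp(b)                                  # about 1.52
    exp(b + c(-1, 1) * qnorm(0.975) * se)   # about (1.35, 1.70)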

Also, sets of scores with the same spacing between them will lead to the same goodness-of-fit statistics, fitted counts, odds ratios, and \(\hat{\beta}\). For example,

\(v_1=1, v_2=2, v_3=3, v_4=4\) and \(v_1=8, v_2=9, v_3=10, v_4=11\)

will yield the same results. More generally, any linear transformation of the scores leaves the goodness-of-fit statistics, fitted counts, and odds ratios unchanged, although rescaling the spacing rescales \(\hat{\beta}\) correspondingly (see the sketch below). Ultimately, the choice of scores should reflect the nature of the data and the context of the questions involved.
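To see this invariance numerically, here is a minimal sketch (reusing tbl.all from the R code above) that shifts the row scores by a constant and refits the model; the deviance and \(\hat{\beta}\) are unchanged:

    # shifted row scores with the same spacing: 8, 9, 10, 11
    tbl.all$x2 <- tbl.all$x + 7
    model.lin2 <- glm(count ~ jobsecok + happy + x2:y,
                      family = poisson(link = log), data = tbl.all)
    deviance(model.lin2)        # 3.6075, same as model.lin
    coef(model.lin2)["x2:y"]    # 0.41785, identical slope estimate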