Generalized Linear Mixed Models

We have looked at the theory and practice of modeling longitudinal data using generalized estimating equations(GEE).GEE methods are "semiparametric" because they do not rely on a fully specified probability model. With GEE, the estimates are efficient if the working covariance assumptions are correct. If the working covariance assumptions are wrong, the estimated coefficients are still approximately unbiased, and SE’s from the sandwich (empirical)method are reasonable if the sample is large. The philosophy of GEE is to treat the covariance structure as a nuisance.

An alternative to GEE is the class of generalized linear mixed models(GLMM). These are fully parametric and model the within-subject covariance structure more explicitly.

GLMM is a further extension of GLMs that permits random effects as well as fixed effects in the linear predictor.

Fix Effect vs Random Effect

Fix effects are parameters that describe a factor’s effects. They apply to all categories of interest, e.g. genders, treatments. By contrast, random effects apply to a sample.

For a study with repeated measurements of subjects, for example a cluster is a set of observations for a particular subject, and the model contains a random effect term for each subject.

Recall that in OLS:

Fixed effect

\(Y = \alpha + \beta X\)

Random effect

\(Y_i = \alpha_i + \beta X_i\)

Random Intercept Model

This is the most common random effect model. Let \(u_i\) denote the random effect for cluster i, \(y_{it}\) denote observation t in cluster i, \(x_{it}\) be the value of the explanatory variable for that observation.

With link function g(·), the GLMM has the form"

\(g(\mu_{it} )= u_i + \beta x_{it}, \;\;\;\;\;\; i =1, ... ,n,t =1,...,T\)

The random effect \(u_i\) is a random variable, \(u_i \sim N(\alpha,\sigma^2)\), where \(\alpha\) and \(\sigma^2\) are unknown. \(\sigma^2\) is referred to as a variance component.
Each cluster has its own effect.
This model implies a non-negative correlation between observations within a cluster.
Conditional on \(u_i\), this is a GLM.

Why not use a fixed effect for each cluster? If a study has a large number of clusters, a fixed effect will have a lot of parameters. With \(u_i\) as random effects, it only add a single parameter \(\sigma\).

Using this model, we can get the mean of particular clusters. \(\sigma^2\) reflects the variability among clusters. Sometimes the variability might be caused by not including certain explanatory variables that are associated with the response.

Logistic-Normal Models

Suppose observation t in cluster i has binary response \(y_{it}\), consider the logistic regression model with random effect \(u_i\)

\(logit(P(Y_{it} = 1)) = u_i + x_{it}\)

where \(u_i \sim N(\alpha,\sigma^2)\). This is a special case of the GLMM above with logit link function. It is called a logistic-normal model, since the random effect \(u_i\) has a normal distribution.

Clusters with a relatively large positive \(u_i\) have a relatively large \(P(Y_{it} =1)\) for each t.

If \(x_{it}=0\) for t = 1 and\(x_{it}=1\) for t =2, this is the case of matched pairs. When is small relative to \(u_i\), we expect to see a strong agreement among clusters.

What will \(u_i\) be if there is a high proportion of (\(y_{i1}=0\), \(y_{i2}=0\))? What about \(\sigma\)?

Example: Sacrifices for the Environment

Below is a 2 × 2 table from the General Social Survey. Subjects were asked whether, to help the environment, they were willing to (1) raise taxes (2) accept a cut in living standards.

	Cut living standards
Raise tax	Yes	No	Total
Yes	227	132	359
No	107	678	785
Total	334	810	1144

We will fit the following GLMM model:

\(logit[P(Y_{i1} =1)] = u_i + \alpha +\beta , \)

\(logit[P(Y_{i2} =1)] = u_i + \alpha \)

where \(Y_{i1}\) is the response about higher taxes for subject i, \(Y_{i2}\) is the response about lower standard of living for subject i, \(u_i \sim N(0,\sigma^2)\). This is a reparameterization of the model above to separate the mean and variance of the random effect.

The ML fit of this model yields \(\beta =0.210 (SE = 0.130)\) with \(\alpha =2.85\). For a given subject, the estimated odds of a "yes" response on accepting higher taxes equal exp(0.210) =1.23 times the odds of a "yes" response on accepting a lower standard of living.

Conditional Models and Marginal Models

Conditional models: The parameters in GLMMs have conditional(cluster specific) interpretations given random effect. They are equivalent to studying the association in partial tables conditional on each cluster.

Marginal models: The effects in marginal models are averaged over all clusters(i.e. population-averaged). They apply to collapsed tables, summarized over the subjects. GEE is a marginal model.

Example: Sacrifices for the Environment, cont’d

Marginal models:

\(logit[P(Y_1=1)] = \alpha +\beta\)

\(logit[P(Y_2=1)] = \alpha \)

where \(Y_{i1}\) is the response about higher taxes for a randomly selected subject, \(Y_{i2}\) is the response about lower standard of living for a randomly selected subject.

The log odds ratio,

\(\hat{\beta}=log[(359 \times 810)/(785 \times 334)] =0.104\).

This is smaller than the estimated effect(\(\hat{\beta}=0.210\)) for the conditional model.

Compare the estimates from conditional models and marginal models:

When the link function is nonlinear, such as the logit, the population-averaged effects of marginal models usually are smaller than cluster-specific parameters.
The larger \(\sigma\) is, the larger the difference between the effects of the two models is.
Usually, the two approaches provide similar substantive interpretations and conclusions.

Pros and Cons of the two types of models

With marginal models, ML is sometimes possible but the GEE approach is computationally simpler. Conditional models provide full likelihood and usually are more computationally intensive.
Conditional models provide more information, such as cluster-specific effects, which is not available in marginal models.
Conditional models are joint models and require more model assumptions.
Conditional models are preferred, when one wants to estimate cluster specific effects(within subject effect) or their variability, or the data generating procedure.
Marginal models may be adequate if the main interest is to estimate "between cluster" effects.

Applications of Random Effects Models

Small-area Estimation of Binomial Probabilities

Small-area estimation refers to estimation of parameters for many areas when each may have relatively few observations.

The fixed effect model

\(logit(\pi_i)= \beta_i, \;\;\;\;\; i =1,...,n\)

treats the areas as levels of a single factor.

Random effects models treat each area as a cluster

\(logit(\pi_i)= u_i + \alpha \)

where \(u_i \sim N(0,\sigma^2). The fixed effect model is a saturated model with n parameters. The ML estimates of \(\pi_i\) are just sample proportion, \(p_i = y_i /T_i\), where \(y_i\) is the observed successes in area i and \(T_i\) is the total counts in area i. For small \(T_i\), sample proportions may poorly estimate \(\pi_i\).

Random effects model has two parameters, instead of n parameters. It can provide improved estimates by borrow strength from the whole, i.e. using all areas instead of just any given one.

QUESTION: What does it mean when \(\sigma= 0\)?

Example: Basketball Free Throw

Player	\(n_i\)	\(p_i\)	\(\hat{\pi}_i\)
Yao	13	0.769	0.730
Fryw	10	0.900	0.761
Curry	11	0.545	0.663
	(\cdots\)
Ostertag	6	0.167	0.608
Brown	4	1.000	0.748

How does \(p_i\) differ from \(\pi_i\)?

Modeling Repeated Responses

Example: Support (1 = yes, 2 = no) for legalizing abortion in 3 situations by gender. Three situations:(1) family can’t afford, (2) woman does not want to marry the man, (3) any reason.

Gender	sequence of responses
Gender	(1,1,1)	(1,1,2)	(2,1,1)	(\cdots\)	(2,2,2)
Male	342	26	6	(\cdots\)	356
Female	440	25	14	(\cdots\)	457

Random effect model

\(logit[P(Y_{it}=1)] = u_i+ \beta_t+\gamma_{xi} \)

where\(x_i =1\) for females and 0 for males and \(u_i \sim N(0,\sigma^2),

How to interpret \(\gamma\)?
How to interpret \(\beta\)?

Marginal model

\(logit[P(Y_t=1)] = \beta_t+\gamma_{x} \)

Random Effect Models for Multinomial Responses

GLMMs extend directly from binary outcomes to multiple-category outcomes. When responses are ordinal, it is often adequate to use the same random effect term for each logit. With cumulative logits, this is the proportional odds structure for fixed effects. For GLMMs, it’s possible to have both random intercepts and random slopes.

Example: Below is the result of a randomized double-blind clinical trail comparing a drug with placebo in treating insomnia patients. The response is thepatient’s reported time(in minutes) to fall asleep after going to bed. Patients responded before and following a 2 week treatment period.

Treatment	Time to falling asleep
	Initial	Follow-up
	Initial	< 20	20-30	30-60	> 60
Active	< 20	7	4	1	0
	20-30	11	5	2	2
	30-60	13	23	3	1
	> 60	9	17	13	8
Placebo	< 20	7	4	2	1
	20-30	14	5	1	0
	30-60	6	9	18	2
	> 60	4	11	14	22

Let (y_t\) = time to fall asleep at occasion t (0=initial, 1=follow-up), and let x =treatment (1=active, 0=placebo). Let (y_{it}\) denote the response for subject i at occasion t.

The Marginal Model (GEE)

\(logit[P(Y_t \ge j)]= \alpha_j+ \beta_1x + \beta_2t + \beta_3(t \times x) \)

The random-intercept model

\(logit[P(Y_{it} \ge j)]= \u_i+\alpha_j+ \beta_1x + \beta_2t + \beta_3(t \times x) \)

The random effect is assumed to be the same for each cumulative probability.

	GEE	GLMM
\(hat{\beta}_1\)	0.034(0.238)	0.058(0.366)
\(hat{\beta}_2\)	1.038(0.168)	1.602(0.283)
\(hat{\beta}_4\)	0.708(0.244)	1.081(0.380)

At the initial observation, the odds ratio between placebo and treatment groups (exp(\(\beta_1\))) is small (GEE: 1.035;GLMM: 1.060). This reflects the results of randomization.
\(\hat{\beta}_3\) and SE indicate considerable evidence of interaction in both GEE and GLMM.
The estimates and standard errors are about 50% larger for GLMM than for the marginal model. This reflects the relatively large heterogeneity. The random effects have estimated standard deviation \(\hat{\sigma}=1.90\). This shows a strong association between the responses at the two occasions.

The Transitional Model:

Another way is to model the follow-up response (\(Y_2\)), conditional on \(y_1\) rather than marginal distributions of (\(Y_1, Y_2\)). In this model, the initial response \(y_1\) plays the role of an explanatory variable.

\(logit[P(Y_2 \ge j)]= \alpha_j+ \beta_1x + \beta_2y_1 \)

It can be fit using software for cummulative logit models. \(\hat{\beta}_1\) = 0.885 with SE = 0.246. For any given value for the initial response, the estimated odds of falling asleep by a particular time for the active treatment are exp(0.885) =2.4 times those for the placebo group.

Bivariate Random Effects and Association Heterogeneity

In addition to random intercepts, sometimes it’s sensible to have both intercept and slope to be random.

Example: This dataset consists of 41 studies that compared a new surgery and an old surgery. The response was adverse event (Yes = 1, No = 0). Below lists three studies.

Study	Treatment	adverse event		sample odds ratio	fitted odds ratio
Study	Treatment	Yes	No	sample odds ratio	fitted odds ratio
1	new	7	8	0.159	0.147
1	old	11	2
5	new	3	9	inf	2.59
5	old	0	12
6	new	4	3	0.0	0.126
6	old	4	0

In addition to the difference among studies, the surgery effect may also vary in different studies.

So we consider a logistic-normal model permitting treatment-by-study interaction:

\(logit[P(Y_{i1}=1)]= u_i+\alpha+ (\beta + v_i) \)

\(logit[P(Y_{i2}=1)]= u_i+\alpha \)

where \(u_i\) is a random intercept and \(v_i\) is a random slope. We assume(\(u_i\),\(v_i\)) have a bivariate normal distribution with mean 0, variance \(\sigma_u^2\) and \(\sigma_v^2\) for \(u_i\) and \(v_i\), respectively, and a correlation \(\rho\) between ui and \(v_i\), i =1, ... , 41.

Log odds ratio between treatment and response equals \(\beta + v_i\) in study i.
\(\beta\) is the mean study-specific log odds ratio and \(\sigma_v\) describes variability in the log odds ratios across studies.
The larger \(\sigma_v\) is, the larger the SE of \(\beta\) is. This is because the higher the variation is across studies, the more difficult to estimate \(\beta\) accurately.
This approach is very useful when the sample size in each stratum is small, since it smoothes out the variation in the sample log odds ratio.