12.1 - Introduction to Generalized Estimating Equations

Printer-friendly versionPrinter-friendly version

In Lesson 4 we introduced an idea of dependent samples, i.e., repeated measures on two variables or two points in time, matched data and square tables. We described the ways to perform significance tests for models of marginal homogeneity, symmetry, and agreement. In Lessons 10 and 11, we learned how to answer the same questions (and more) via log-linear models.

In this lesson we will introduce models for repeated categorical response data, and thus generalize models for matched pairs.

We have learned so far to model the count data as various generalized linear models with a key assumption of independence among the response. GEE approach is an extension of GLMs. It provides a semi-parametric approach to longitudinal analysis of categorical response; it can be also used for continuous measurements.

Generalized Estimating Equations (GEE)

We will focus only on basic ideas of GEE; for more details see cited references at the beginning of the lecture. GEE's were first introduced by Liang and Zeger (1986); see also Diggle, Liang and Zeger, (1994). The very crux of GEE is instead of attempting to model the within-subject covariance structure, to treat it as a nuisance and simply model the mean response.

In this framework, the covariance structure doesn't need to be specified correctly for us to get reasonable estimates of regression coefficients and standard errors.


Fit a model to repeated categorical responses, that is correlated and clustered responses, by GEE methodology.


  • A response variable Y can be either continuous or categorical. We will focus on categorical Y = (Yij) response for each subject i, measured at different occasions (e.g., time points), j = 1, 2, ... , ni). Each yi can be, for example, a binomial or multinomial response.
  • X = (X1, X2, ... , Xk) be a set of explanatory variables which can be discrete, continuous, or a combination. Xi is ni × k matrix of covariates.


Its form is like GLM, but full specification of the joint distribution not required, and thus no likelihood function:

\(g(\mu_i)=x_i^T \beta\)

Random component:

Any distribution of the response that we can use for GLM, e.g., binomial, multinomial, normal, etc.

Systematic component:

A linear predictor of any combination of continuous and discrete variables.

Link function:

Can be any gi), e.g., identity, log, logit, etc.

Covariance structure:

Correlated data are modeled using the same link function and linear predictor setup (systematic component) as in the case of independent responses. The random component is described by the same variance functions as in the independence case, but the covariance structure of the correlated responses must also be specified and modeled now!


  • The responses are Y1 ,Y2 , ... , Yn are correlated or clustered, i.e., cases are NOT independent.
  • Covariates can be the power terms or some other nonlinear transformations of the original independent variables, can have interaction terms.
  • The homogeneity of variance does NOT need to be satisfied
  • Errors are correlated
  • It uses quasi-likelhood estimation rather than maximum likelihood estimation (MLE) or ordinary least squares(OLS) to estimate the parameters, but at times these will coincide.
  • Covariance specification.  These are typically four or more correlation structures that we assume apriori.  Here are four correlation structures:

Independence - (correlation between time points is independent)

\[ \begin{bmatrix} 1 & 0 & 0  \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]

Exchangable (or Compund Symmetry)

\[ \begin{bmatrix} 1 & \rho & \rho  \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix}\]

AutoRegressive Order 1 (AR 1)

\[ \begin{bmatrix} 1 & \rho & \rho^2  \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix}\]


\[ \begin{bmatrix} 1 & \rho_{12} & \rho_{13}  \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{bmatrix}\]

\(\rho_{ij} =corr(Y_{ij}, Y_{ik})\) for the ith subject at times j and k.

Parameter Estimation:

The quasi-likelihood estimators are estimates of quasi-likelihood equations which are called generalized estimating equations. A quasi-likelihood estimate of β arise from maximization of normality-based loglikelihood without assuming that the response is normally distributed. Recall, that we briefly discussed quasi-likelihood when we introduced overdispersion in Lesson 6.

In general, there are no closed-form solutions, so the GEE estimates are obtained by using an iterative algorithm, that is iterative quasi-scoring procedure.

GEE estimates of model parameters are valid even if the covariance is mis-specified (because they depend on the first moment, e.g., mean). However, if the correlation structure is mis-specified, the standard errors are not good, and some adjustments based on the data(empirical adjustment) are needed to get more appropriate standard errors. Agresti (2013) points out that a chosen model in practice is never exactly correct, but choosing carefully a working correlation(covariance structure) can help with efficiency of the estimates.

For more technical details see Agresti (2013) 4.7, and 11.3, and Agresti (2007), Ch. 9. Also, the end of this lecture has a bit more technical addendum based on Dr. Schafer's notes.

Interpretation of Parameter Estimates:

The interpretation will depend on the chosen link function. For example if it is logit, exp(β0 ) = the odds that the characteristic is present in an observation i when Xi = 0, but if it is identity, exp(β0 ) = is the value of the response when Xi = 0.


Wald statistics based confidence intervals and hypothesis testing for parameters; recall they rely on asymptotic normality of estimator and their estimated covariance matrix.

Model Fit:

  • We don't test for the model fit of the GEE, because this is really an estimating procedure; there is no likelihood function!
  • We look at the empirical estimates of the standard errors and the covariance.
  • Compare the empirical estimates with the model-based estimates
  • For model based output, we can still use overall goodness-of-fit statistics: Pearson chi-square statistic, X2 , Deviance, G2 , Likelihood ratio test, and statistic, ΔG2 , Hosmer-Lemeshow test and statistic, and Residual analysis: Pearson, deviance, adjusted residuals, etc...
  • Overdispersion


  • Computationally more simple than MLE for categorical data
  • Does not require multivariate distribution
  • Consistent estimation even with mis-specified correlation structure


  • There is no likelihood function since the GEE does not specify completely the joint distribution; thus some do not consider it a model but just a method of estimation.
  • Likelihood-based methods are NOT available for testing fit, comparing models, and conducting inferences about parameters.
  • Empirical based standard errors underestimate the true ones, unless very large sample size.
  • Empirical estimators more variable than the parametric ones.

First we examine the method of "independence estimating equations," which incorrectly assumes that the observations within a subject are independent.

Independence Estimating Equations (IEE)

The data for a single subject i measured at occasions j = 1, 2, ... , ni:

yi = (yi1, yi2 , ... , yi , ni )T

yij = discrete or continuous response

Xi = ni × p matrix of covariates

Let us suppose that the mean response


is related to the covariates by a link function,

\(g(\mu_i)=X_i \beta\)

and let Δi be the diagonal matrix of variances

\(\Delta_i=\text{Diag}[\text{Var}(y_{ij})]= \begin{bmatrix}
Var_{i1} &\cdots  & \cdots &\vdots \\
 \vdots & Var_{i2} & \cdots & \vdots\\
 \vdots& \cdots & \ddots  & \vdots\\
 \vdots& \cdots & \cdots & Var_{ij}

In terms of the correlation structure, this would be:

\[ \begin{bmatrix} 1 & 0 & 0  \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\]

Unless the observations within a subject are independent,

\(\Delta_i \neq \text{Cov}(y_i)\)

But if Δi were correct, we could stack up the yi's and estimate β using techniques for generalized linear models.

How bad is it to pretend that Δi is correct?   Let \(\hat{\beta}\) be the estimate that assumes observations within a subject are independent (e.g., as found in ordinary linear regression, logistic regression, etc.)

  • If Δi is correct, then \(\hat{\beta}\) is asymptotically unbiased and efficient.  Recall that unbiased \(E(\hat{\beta})=\beta\), efficient means it has the smallest variance of all other possible estimations.
  • If Δi is not correct, then \(\hat{\beta}\) is still asymptotically unbiased but no longer efficient
    • The 'naive' standard error for \(\hat{\beta}\), obtained from the naive estimate of \(\text{Cov}(\hat{\beta})\)
    • \(\hat{\sigma}^2 \left[X^T \hat{W} X \right]^{-1}\)

      may be very misleading (here, X is the matrix of stacked Xi's and \(\hat{W}\) is the diagonal matrix of final weights, if any)

    • consistent standard errors for \(\hat{\beta}\) are still possible using the sandwich estimator (sometimes called the 'robust' or 'empirical' estimator)

Sandwich Estimator

The sandwich estimator was first proposed by Huber (1967) and White(1980); Liang and Zeger (1986) applied it to longitudinal data

\(\left[X^T \hat{W} X \right]^{-1} \left[\sum\limits_i X_i^T (y_i-\hat{\mu}_i)(y_i-\hat{\mu}_i)^T X_i \right]\left[X^T \hat{W} X \right]^{-1}\)

  • provides a good estimate of \(\text{Cov}(\hat{\beta})\) in large samples (several hundred subjects or more) regardless of the true form of Cov(yi)
  • in smaller samples it could be rather noisy, so 95% intervals obtained by ± 2 SE's could suffer from under-coverage

When within-subject correlations are not strong, Zeger (1988) suggests that the use of IEE with the sandwich estimator is highly efficient.