12.1 - Introduction to Generalized Estimating Equations

The idea behind GEEs is to produce reasonable estimates of model parameters, along with standard errors, without specifying a likelihood function in its entirety, which can be quite difficult with a multivariate categorical response. We have to account for the correlation among the multiple responses that arise from a single subject, but we can largely estimate these correlations from the data without having to accurately specify the form of the correlation structure.

In the General Social Survey (GSS) of 2018, respondents answered questions regarding their confidence in various institutions in the U.S., including education, medicine, and the scientific community. Among these three institutions, are there differences in the proportions of people having "a great deal" of confidence? Responses are clustered by subject because each subject provided answers to all three questions.

A GLM for Clustered Responses

The response vector for subject \(i\) is \(Y_i=(Y_{i1},\ldots,Y_{in_i})\), where \(Y_{ij}\) is the categorical response for subject \(i\) (\(1,\dots,n\)) and measurement \(j\) (\(1,\ldots,n_i\)). With expected value \(E(Y_{i})=\mu_i'=(\mu_{i1},\ldots,\mu_{in_i})\) and covariate/predictor vector \(x_{ij}\), we have the general model expression:

\(g(\mu_{ij})=x_{ij}'\beta\)

In our example above, \(Y_{ij}\) is binomial with mean \(\mu_{ij}=\pi_{ij}\), and the logit link would be used for \(g\). If the institution indicators, say \(Med_{ij}=1\) for medicine and \(Sci_{ij}=1\) for scientific community (each would be 0 otherwise), are added, along with a covariate for age, we have \(x_{ij}'=(1,Med_{ij},Sci_{ij},Age_i)\) and

\(\log\left(\dfrac{\pi_{ij}}{1-\pi_{ij}}\right)=\beta_0+Med_{ij}\beta_1+Sci_{ij}\beta_2+Age_i\beta_3\)

Note that the age of the subject does not vary within subject and so does not require a \(j\) subscript. Although subjects are assumed to be independently collected, the multiple responses within a subject are allowed to have a correlation structure. Here are some potential structures we may consider. In each matrix below (in general, these would be \(n_i\times n_i\)), the value in the \(j^{th}\) row and \(k^{th}\) column represents the correlation between the \(j^{th}\) and \(k^{th}\) measurement for the same subject.

Independence - (no correlation within subjects): \( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\)

Exchangeable or Compound Symmetry: \( \begin{bmatrix} 1 & \rho & \rho \\ \rho & 1 & \rho \\ \rho & \rho & 1 \end{bmatrix}\)

Auto-Regressive Order 1 - (correlation depends on spatial "distance" between measurements): \( \begin{bmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{bmatrix}\)

Unstructured: \( \begin{bmatrix} 1 & \rho_{12} & \rho_{13} \\ \rho_{12} & 1 & \rho_{23} \\ \rho_{13} & \rho_{23} & 1 \end{bmatrix}\)

Maximum-likelihood estimation of these parameters from the would be identical to that for the familiar logistic model, except that the correlation structure would be incorporated into the likelihood for responses from the same subject. The GEE alternative avoids this.

Interpretation of Parameter Estimates:

The interpretation will depend on the chosen link function. For example, if it is logit, \(\exp\left(\beta_0\right) =\) the odds that the characteristic is present in an observation \(i\) when \(x_i = 0\), but if it is the identity, \(\exp\left(\beta_0\right) =\) is the value of the response when \(x_i = 0\).

Model Fit:

We don't test for the model fit of the GEE, because this is really an estimating procedure; there is no likelihood function.
We look at the empirical estimates of the standard errors and the covariance.
Compare the empirical estimates with the model-based estimates

Advantages:

Computationally more simple than MLE for categorical data
Does not require multivariate distribution
Consistent estimation even with mis-specified correlation structure

Limitations:

There is no likelihood function since the GEE does not specify completely the joint distribution; thus some do not consider it a model but just a method of estimation.
Likelihood-based methods are NOT available for testing fit, comparing models, and conducting inferences about parameters.
Empirical based standard errors underestimate the true ones, unless very large sample size.
Empirical estimators more variable than the parametric ones.

First we examine the method of "independence estimating equations," which incorrectly assumes that the observations within a subject are independent.

GEE Approach to Estimation

Starting with \(E(y_{i})=\mu_{i}\), the vector of means for subject \(i\) connected with the predictors via \(g(\mu_i)=x_i'\beta)\), we let \(\Delta_i\) be the diagonal matrix of variances

\(\Delta_i=\text{Diag}[\text{Var}(y_{ij})]= \begin{bmatrix}
Var_{i1} &\cdots & \cdots &\vdots \\
\vdots & Var_{i2} & \cdots & \vdots\\
\vdots& \cdots & \ddots & \vdots\\
\vdots& \cdots & \cdots & Var_{ij}
\end{bmatrix}\).

In terms of the correlation structure, this would be:

\( \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}\)

Unless the observations within a subject are independent,

\(\Delta_i \neq \text{Cov}(y_i)\)

But if \(\Delta_i\) were correct, we could stack up the \(y_i\)'s and estimate \(\beta\) using techniques for generalized linear models.

How bad is it to pretend that \(\Delta_i\) is correct? Let \(\hat{\beta}\) be the estimate that assumes observations within a subject are independent (e.g., as found in ordinary linear regression, logistic regression, etc.)

If \(\Delta_i\) is correct, then \(\hat{\beta}\) is asymptotically unbiased and efficient. Recall that unbiased \(E(\hat{\beta})=\beta\), efficient means it has the smallest variance of all other possible estimations.
If Δ_i is not correct, then \(\hat{\beta}\) is still asymptotically unbiased but no longer efficient
\(\hat{\sigma}^2 \left[X^T \hat{W} X \right]^{-1}\)

may be very misleading (here, X is the matrix of stacked X_i's and \(\hat{W}\) is the diagonal matrix of final weights, if any)
- The 'naive' standard error for \(\hat{\beta}\), obtained from the naive estimate of \(\text{Cov}(\hat{\beta})\)
- consistent standard errors for \(\hat{\beta}\) are still possible using the sandwich estimator (sometimes called the 'robust' or 'empirical' estimator)

Sandwich Estimator

The sandwich estimator was first proposed by Huber (1967) and White(1980); Liang and Zeger (1986) applied it to longitudinal data

\(\left[X^T \hat{W} X \right]^{-1} \left[\sum\limits_i X_i^T (y_i-\hat{\mu}_i)(y_i-\hat{\mu}_i)^T X_i \right]\left[X^T \hat{W} X \right]^{-1}\)

provides a good estimate of \(\text{Cov}(\hat{\beta})\) in large samples (several hundred subjects or more) regardless of the true form of \(\text{Cov}\left(y_i\right)\)
in smaller samples it could be rather noisy, so 95% intervals obtained by ± 2 SE's could suffer from under-coverage

When within-subject correlations are not strong, Zeger (1988) suggests that the use of IEE with the sandwich estimator is highly efficient.

^[1]	Link
↥	Has Tooltip/Popover
	Toggleable Visibility