6.2.1  Fitting the Model in SAS
6.2.1  Fitting the Model in SASThere are different ways to do this depending on the format of the data. As before, for details, you need to refer to SAS or R help. Here are some general guidelines to keep in mind. Please note that we make a distinction about the way the data are entered into SAS (or R).
If data come in a tabular form, i.e., the response pattern is with counts (as seen in the previous example).
model y/n = x1 x2 /link=logit dist=binomial;
model y/n = x1 x2;[enter code here]
 PROC GENMOD: We need a variable that specifies the number of cases that equals marginal frequency counts or number of trials (e.g. n), and the number of events (y)
 PROC LOGISTIC: We do need a variable that specifies the number of cases that equals marginal frequency counts
If data come in a matrix form, i.e., subject \(\times\) variables matrix with one line for each subject, like a database
model y/n = x1 x2 / link = logit dist = binomial;
model y = x1 x2;
 PROC GENMOD: We need a variable that specifies the number of cases that equals marginal frequency counts or number of trials (e.g. n), and the number of events (y)
 PROC LOGISTIC: We do NOT need a variable that specifies the number of cases that equals marginal frequency counts
For both GENMOD and LOGISTIC, as before, include interaction terms with *, and make sure to include all lowerorder terms.
We will follow both the SAS output through to explain the different parts of model fitting. The outputs from R will be essentially the same.
Example 61: Student Smoking
Let's begin with collapsed \(2\times2\) table:
Student smokes  Student does not smoke  

1–2 parents smoke  816  3203 
Neither parent smokes  188  1168 
Let's look at one part of smoke.sas:
data smoke;
input s $ y n ;
cards;
smoke 816 4019
nosmoke 188 1356
;
proc logistic data=smoke descending;
class s (ref=first) / param=ref;
model y/n = s /scale=none;
run;
In the data step, the dollar sign \$ as before indicates that S is a characterstring variable.
In the logistic step, the statement: descending
insures that you are modeling a probability of an "event" which takes value 1, otherwise by default SAS models the probability of "nonevent."
class S (ref=first) / param=ref;
This code says that S should be coded as a categorical variable using the first category as the reference or zero group. (The first category is "nosmoke," because it comes before "smoke" in alphabetical order). You can vary this order by additional options provided by class and/or by entering data in a different order. See SAS online help for details, and the rest of smoke.sas for more options.
model y/n = s /scale = none
Because we have grouped data (i.e. multiple trials per line of the data set), the model statement uses the "event/trial" syntax, in which y/n appears on the lefthand side of the equal sign. The predictors go on the righthand side, separated by spaces if there are more than one. An intercept is added automatically by default.
scale=none
This option serves two purposes. One, it will give us the overall goodnessoffit test statistics, deviance \(G^2\) and Person \(X^2\). It also enables us to specify a value for a dispersion parameter in order to correct for overor underdispersion. In this case, we are NOT controlling for either. Overdispersion is an important concept with discrete data. In the context of logistic regression, overdispersion occurs when there are discrepancies between the observed responses \(y_i\) and their predicted values \(\hat{\mu}_i = n_{i}\hat{\pi}_i\) and these values are larger than what the binomial model would predict.
If \(y_i \sim Binomial(n_i, \pi_i)\), the mean is \(\mu_i = n_i \pi_i\) and the variance is \(\dfrac{\mu_i(n_i − \mu_i)}{n_i}\). Overdispersion means that the data show evidence that the variance of the response \(y_i\) is greater than \(\dfrac{\mu_i(n_i − \mu_i)}{n_i}\).
If overdispersion is present in a dataset, the estimated standard errors and test statistics for individual parameters and the overall goodnessoffit will be distorted and adjustments should be made. We will look at this briefly later when we look into continuous predictors.
Now let's review some of the output from the program smoke.sas that uses PROC LOGISTIC
Model Information
Model Information  

Data Set  WORK.SMOKE 
Response Variable (Events)  y 
Response Variable (Trials)  n 
Model  binary logit 
Optimization Technique  Fisher's scoring 
This section, as before, tells us which dataset we are manipulating, the labels of the response and explanatory variables and what type of model we are fitting (e.g. binary logit), and the type of scoring algorithm for parameter estimation. Fisher scoring is a variant of the NewtonRaphson method for ML estimation. In logistic regression they are equivalent.
Response Profile
Response Profile  

Ordered Value 
Binary Outcome  Total Frequency 
1  Event  1004 
2  Nonevent  4371 
This information tells us how many observations were "successful", e.g., how many "event" observations take value 1, that is how many children smoke, 816 + 188 = 1004 and "nonevent", i.e., how many do NOT smoke, (4019 − 816) + (1356 − 188) = 4371. Recall that the model is estimating the probability of the "event".
Class Level Information Design
Class Level Information  

Class  Value  Design Variables 
s  nosmoke  0 
smoke  1 
From an explanatory categorical (i.e, class) variable S with 2 levels (0,1), we created one dummy variable (e.g. design variable):
\(X_1 = 1\) ("smoke") if parent smoking = at least one,
\(X_1 = 0\) ("nosmoke") if parent smoking = neither
parent smoking = nosmoke is equal 0 and is the baseline.
Model Convergence Status
Model Convergence Status 

Convergence criterion (GCONV=1E8) satisfied. 
Since we are using an iterative procedure to fit the model, that is to find the ML estimates, we need some indication if the algorithm converged.
Overall goodnessoffit testing
Test: \(H_0 \colon\) current model vs. \(H_A \colon\) saturated model
Deviance and Pearson GoodnessofFit Statistics  

Criterion  Value  DF  Value/DF  Pr > ChiSq 
Deviance  0.0000  0  .  . 
Pearson  0.0000  0  .  . 
The goodnessoffit statistics \(X^2\) and \(G^2\) from this model are both zero because the model is saturated. However, suppose that we fit the interceptonly model. This is accomplished by removing the predictor from the model statement, like this:
model y/n = / scale=none;
The goodnessoffit statistics are shown below.
Deviance and Pearson GoodnessofFit Statistics  

Criterion  Value  DF  Value/DF  Pr > ChiSq 
Deviance  29.1207  1  29.1207  <.0001 
Pearson  27.6766  1  27.6766  <.0001 
The Pearson statistic \(X^2 = 27.6766\) is precisely equal to the usual \(X^2\) for testing independence in the \(2\times2\) table. And the deviance \(G^2 = 29.1207\) is precisely equal to the \(G^2\) for testing independence in the \(2\times2\) table. Thus by the assumption, the interceptonly model or the null logistic regression model states that student's smoking is unrelated to parents' smoking (e.g., assumes independence, or oddsratio=1). But clearly, based on the values of the calculated statistics, this model (i.e., independence) does NOT fit well. This example shows that analyzing a \(2\times2\) table for association is equivalent to logistic regression with a single dummy variable. Later on, we will compare these tests to the loglinear model of independence.
The goodnessoffit statistics, \(X^2\) and \(G^2\), are defined as before in the tests of independence and loglinear models (e.g. compare observed and fitted values). For the \(\chi^2\) approximation to work well, we need the \(n_i\)s sufficiently large so that the expected values, \(\hat{\mu}_i\ge5\), and \(n_i − \hat{\mu}_i \ge 5\) for most of the rows. We can afford to have about 20% of these values less than 5, but none of them should fall below 1.
With real data, we often find that the \(n_i\)s are not big enough for an accurate test, and there is no single best solution to handle this but a possible solution may rely strongly on the data and context of the problem
Analysis of Maximum Likelihood Estimates
Once an appropriate model is fitted, the success probabilities need to be estimated using the model parameters. Note that success probabilities are now NOT simply the ratio of observed number of successes and the number of trials. A model fit introduces a structure on the success probabilities. The estimates will now be functions of model parameters. What are the parameter estimates? What is the fitted model?
Analysis of Maximum Likelihood Estimates  

Parameter  DF  Estimate  Standard Error 
Wald ChiSquare 
Pr > ChiSq  
Intercept  1  1.8266  0.0786  540.2949  <.0001  
s  smoke  1  0.4592  0.0878  27.3361  <.0001 
The fitted model is
\(\mbox{logit}(\hat{\pi_i})=\log\dfrac{\hat{\pi_i}}{1\hat{\pi_i}}=\hat{\beta_0}+\hat{\beta_1} X_i=1.837+0.459\mbox{smoke}\)
where smoke is a dummy variable (e.g. design variable) that takes value 1 if at least one parent smokes and 0 if neither smokes as discussed above.
Estimated \(\beta_0=1.827\) with standard error 0.078 is significant and it says that logodds of a child smoking versus not smoking if neither parent is smoking (the baseline level) is \(1.827\) and it's statistically significant.
Estimated \(\beta_1=0.459\) with a standard error of 0.088 is significant and it says that logoddsratio of a child smoking versus not smoking if at least one parent smokes versus neither parents smokes (the baseline level) is 0.459 and it's statistically significant. \(\exp(0.459)=1.58\) are the estimated oddsratios. Thus there is a strong association between parent and child's smoking behavior, and the model fits well.
And the estimated probability of a student smoking given the explanatory variable (e.g. parent smoking behavior) is
\(\hat{\pi}_i=\dfrac{\exp(1.826+0.4592X_i)}{1+\exp(1.826+0.4592x_i)}\)
In particular, the predicted probability of a student smoking given that neither parent smokes is
\(P(Y_i=1x_i=0)=\dfrac{\exp(1.826+0.4592\times 0)}{1+\exp(1.826+0.4592\times 0)}=0.14\)
and the predicted probability of a student being a smoker if at least one parent smokes is
\(P(Y_i=1x_i=1)=\dfrac{\exp(1.826+0.4592(X_i=1)}{1+\exp(1.826+0.4592(x_i=1))}=0.20\)
By invoking the following option in the MODEL statement, output out=predict pred=prob
, PROC LOGISTIC
will print the predicted probabilities in the output:
Obs  s  y  n  prob 

1  smoke  816  4019  0.20304 
2  nosmoke  188  1356  0.13864 
So, we do not need to do this calculation by hand, but it may be useful to try it out to be sure of what's going on. In this model,\(\beta_0\) is the logodds of children smoking for nosmoking parents /(x_i = 0\). Thus \(\exp(−1.8266)\) are the odds that a student smokes when the parents do not smoke.
Looking at the \(2\times2\) table, the estimated logodds for nonsmokers is
\(\log\left(\dfrac{188}{1168}\right)=\log(0.161)=1.8266\)
which agrees with \(\hat{\beta}_0\) from the logistic model.
From the analysis of this example as a \(2\times2\) table, \(\exp(\beta_1)\) should be the estimated odds ratio. The estimated coefficient of the dummy variable,
\(\hat{\beta}_1=0.4592\)
agrees exactly with the logodds ratio from the \(2\times2\) table (i.e., \(\log(1.58) = (816\times1168) / (188\times 3203) = 0.459\)). That is exp(0.459) = 1.58, which is the estimated odds ratio of a student smoking.
To relate this to the interpretation of the coefficients in a linear regression, we could say that for every oneunit increase in the explanatory variable \(x_1\) (e.g., changing from no smoking parents to smoking parents), the odds of "success" \(\pi_i / (1 − \pi_i)\) will be multiplied by \(\exp(\beta_1)\), given that all the other variables are held constant.
This is not surprising, because in the logistic regression model \(\beta_1\) is the difference in the logodds of children smoking as we move from "nosmoke" (neither parent smokes) /(x_i = 0/) to "smoke" (at least one parent smokes) \(x_i = 1\), and the difference in logodds is a logodds ratio.
Testing Individual Parameters
Testing the hypothesis that the probability of the characteristic depends on the value of the \(j\)th variable.
Testing \(H_0\colon \beta_j = 0\) versus \(H_A\colon \beta_j \ne 0\).
The Wald chisquared statistics \(z^2=(\hat{\beta}_j/\mbox{SE}(\hat{\beta}_k))^2\) for these tests are displayed along with the estimated coefficients in the "Analysis of Maximum Likelihood Estimates" section.
The standard error for \(\hat{\beta}_1\), 0.0878, agrees exactly with the standard error that we can calculate from the corresponding \(2\times2\) table.
Analysis of Maximum Likelihood Estimates  

Parameter  DF  Estimate  Standard Error 
Wald ChiSquare 
Pr > ChiSq  
Intercept  1  1.8266  0.0786  540.2949  <.0001  
s  smoke  1  0.4592  0.0878  27.3361  <.0001 
The values indicate the significant relationship between the logit of the odds of student smoking in parents' smoking behavior. Or, we can use the information from the "Type III Analysis of Effects" section.
Type 3 Analysis of Effects  

Effect  DF  Wald ChiSquare 
Pr > ChiSq 
s  1  27.3361  <.0001 
Again, this information indicates that parents' smoking behavior is a significant factor in the model. We could compare \(z^2\) to a chisquare with one degree of freedom; the pvalue would then be the area to the right of \(z^2\) under the \(\chi^2_1\) density curve. A value of \(z^2\) (Wald statistic) bigger than 3.84 indicates that we can reject the null hypothesis \(\beta_j = 0\) at the .05level.
\(\beta_1:\left(\dfrac{0.4592}{0.0878}\right)^2=27.354\)
Confidence Intervals of Individual Parameters
An approximate \((1 − \alpha) 100\%\) confidence interval for \(\beta_j\) is given by
\(\hat{\beta}_j \pm z_{(1\alpha/2)}SE(\hat{\beta}_j)\)
For example, a 95% confidence interval for \(\beta_1\)
\(0.4592 \pm 1.96 (0.0878)= (0.287112, 0.63128)\)
where 0.0878 is the "Standard Error"' from the "Analysis of Maximum Likelihood Estimates" section. Then, the 95% confidence interval for the oddsratio of a child smoking if one parent is smoking in comparison to neither smoking is
\((\exp(0.287112), \exp(0.63128)) = (1.3325, 1.880)\)
Compare this with the output we get from PROC LOGISTIC:
Odds Ratio Estimates  

Effect  Point Estimate  95% Wald Confidence Limits 

s smoke vs nosmoke  1.583  1.332  1.880 
When fitting logistic regression, we need to evaluate the overall fit of the model, the significance of individual parameter estimates and consider their interpretation. For assessing the fit of the model, we also need to consider the analysis of residuals. Definition of Pearson, deviance, and adjusted residuals is as before, and you should be able to carry this analysis.
If we include the statement...
output out=predict pred=phat reschi=pearson resdev=deviance;
in the PROC LOGISTIC call, then SAS creates a new dataset called "results" that includes all of the variables in the original dataset, the predicted probabilities \(\hat{\pi}_i\), the Pearson residuals, and the deviance residuals. Then we can add some code to calculate and print out the estimated expected number of successes \(\hat{\mu}_i=n_i\hat{\pi}_i\) and failures \(n_i\hat{\mu}_i=n_i(1\hat{\pi}_i)\).