> > ################################ > ### Example: Vitamin C & Friench skiiers > ### Lesson 5: LOGLIN() & GLM() functions > ### Related: See VitaminC.R in Lesson 3 > ################################# > > ### Here is one way to read the data vector of values with labels for the table > ski<-matrix(c(31, 17, 109, 122), ncol=2, dimnames=list(Treatment=c("Placebo", "VitaminC"), Cold=c("Cold", "NoCold"))) > ski Cold Treatment Cold NoCold Placebo 31 109 VitaminC 17 122 > > ### Here is how we did this in Lesson 3, for more details see VitaminC.R > ### Pearson's Chi-squared test with Yates' continuity correction > result<-chisq.test(ski) > result Pearson's Chi-squared test with Yates' continuity correction data: ski X-squared = 4.1407, df = 1, p-value = 0.04186 > > > #################################################################### > ################ VIA LOGLIN() ########################################### > > ### Fitting 2-way log-linear model of independence > ### using the table structure and loglin() function; this is similar to CATMOD in SAS > > ##first check if the data are in the table format > is.table(ski) [1] FALSE > > ## if not make them to be the table with > ski<-as.table(ski) > > > ## fit the model > ## list(1,2) indicates that we are fitting the two margins independently > ## fit=TRUE and param=TRUE, say that we want have the fitted values of the cells and the estimated model parameters returned > ## for other options see R help, e.g., type the ?loglin in the prompt > ski.ind<-loglin(ski, list(1, 2), fit=TRUE, param=TRUE) 2 iterations: deviation 0 > ski.ind \$lrt [1] 4.871697 \$pearson [1] 4.811413 \$df [1] 1 \$margin \$margin[[1]] [1] "Treatment" \$margin[[2]] [1] "Cold" \$fit Cold Treatment Cold NoCold Placebo 24.08602 115.91398 VitaminC 23.91398 115.08602 \$param \$param\$"(Intercept)" [1] 3.963656 \$param\$Treatment Placebo VitaminC 0.003584245 -0.003584245 \$param\$Cold Cold NoCold -0.7856083 0.7856083 > > ## get the p-value; note that this output is the same as the chi-squared test WITHOUT Yates' continuity correction. > 1-pchisq(ski.ind\$lrt, ski.ind\$df) [1] 0.02730064 > > > #### Fitting 2-way saturated log-linear model > ### using the table structure and loglin() function; this is similar to CATMOD in SAS > ski.sat<-loglin(ski, list(c(1, 2)), fit=TRUE, param=TRUE) 2 iterations: deviation 0 > ### notice the zero value for LRT, and the PERFECT fit!!! > ski.sat \$lrt [1] 0 \$pearson [1] 0 \$df [1] 0 \$margin \$margin[[1]] [1] "Treatment" "Cold" \$fit Cold Treatment Cold NoCold Placebo 31 109 VitaminC 17 122 \$param \$param\$"(Intercept)" [1] 3.940642 \$param\$Treatment Placebo VitaminC 0.1220252 -0.1220252 \$param\$Cold Cold NoCold -0.8070421 0.8070421 \$param\$Treatment.Cold Cold Treatment Cold NoCold Placebo 0.1783618 -0.1783618 VitaminC -0.1783618 0.1783618 > 1-pchisq(ski.sat\$lrt, ski.sat\$df) [1] NaN Warning message: NaNs produced in: pchisq(q, df, lower.tail, log.p) > > > #################################################################### > ################ VIA GLM() ########################################### > ### Fitting 2-way log-linear model of independence > ### using the database structure and glm() function; this is similar to GENMOD in SAS > > ### first we must make the data to be in a data frame format > ski.data<-as.data.frame(ski) > ski.data Treatment Cold Freq 1 Placebo Cold 31 2 VitaminC Cold 17 3 Placebo NoCold 109 4 VitaminC NoCold 122 > > ### using the glm() to fit the loglinear model is similar to fitting a regression model but the trick is in specifying what the correct response, and the family of distributions (i.e., sampling scheme!) > ### we need to specify the response that we are modeling which are the cell counts; ski.data\$Freq > ## this line: ski.data\$Treatment+ski.data\$Cold, specifies that we have two main effects, and no interaction term > ### we need to specify that sampling distribution, that is for log-linear models when modeling the counts we assume the most general random mechanism of Poisson sampling: family=poinsson(). This assures that errors follow Poisson model AND that we are modeling "log" of the response that is of our counts. > > ski.ind<-glm(ski.data\$Freq~ski.data\$Treatment+ski.data\$Cold, family=poisson()) > ski.ind Call: glm(formula = ski.data\$Freq ~ ski.data\$Treatment + ski.data\$Cold, family = poisson()) Coefficients: (Intercept) ski.data\$TreatmentVitaminC ski.data\$ColdNoCold 3.181632 -0.007168 1.571217 Degrees of Freedom: 3 Total (i.e. Null); 1 Residual Null Deviance: 135.5 Residual Deviance: 4.872 AIC: 34 > summary(ski.ind) Call: glm(formula = ski.data\$Freq ~ ski.data\$Treatment + ski.data\$Cold, family = poisson()) Deviance Residuals: 1 2 3 4 1.3484 -1.4918 -0.6487 0.6382 Coefficients: Estimate Std. Error z value Pr(>|z|) (Intercept) 3.181632 0.156179 20.372 <2e-16 *** ski.data\$TreatmentVitaminC -0.007168 0.119738 -0.060 0.952 ski.data\$ColdNoCold 1.571217 0.158626 9.905 <2e-16 *** --- Signif. codes: 0 `***' 0.001 `**' 0.01 `*' 0.05 `.' 0.1 ` ' 1 (Dispersion parameter for poisson family taken to be 1) Null deviance: 135.4675 on 3 degrees of freedom Residual deviance: 4.8717 on 1 degrees of freedom AIC: 34.004 Number of Fisher Scoring iterations: 4 > anova(ski.ind) Analysis of Deviance Table Model: poisson, link: log Response: ski.data\$Freq Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev NULL 3 135.468 ski.data\$Treatment 1 0.004 2 135.464 ski.data\$Cold 1 130.592 1 4.872 > fits<-fitted(ski.ind) > resids <- residuals(ski.ind,type="pearson") > h <- lm.influence(ski.ind)\$hat > adjresids <- resids/sqrt(1-h) > round(cbind(ski.data\$Freq,fits,adjresids),2) fits adjresids 1 31 24.09 2.19 2 17 23.91 -2.19 3 109 115.91 -2.19 4 122 115.09 2.19 > > > #### Fitting 2-way saturated log-linear model > #### using the database structure and glm() function; this is similar to GENMOD in SAS > ### Notice now that we have an interaction term in the model: ski.data\$Treatment*ski.data\$Cold > ski.sat<-glm(ski.data\$Freq~ski.data\$Treatment*ski.data\$Cold, family=poisson()) > ski.sat Call: glm(formula = ski.data\$Freq ~ ski.data\$Treatment * ski.data\$Cold, family = poisson()) Coefficients: (Intercept) ski.data\$TreatmentVitaminC 3.4340 -0.6008 ski.data\$ColdNoCold ski.data\$TreatmentVitaminC:ski.data\$ColdNoCold 1.2574 0.7134 Degrees of Freedom: 3 Total (i.e. Null); 0 Residual Null Deviance: 135.5 Residual Deviance: 2.22e-16 AIC: 31.13 > anova(ski.sat) Analysis of Deviance Table Model: poisson, link: log Response: ski.data\$Freq Terms added sequentially (first to last) Df Deviance Resid. Df Resid. Dev NULL 3 135.468 ski.data\$Treatment 1 0.004 2 135.464 ski.data\$Cold 1 130.592 1 4.872 ski.data\$Treatment:ski.data\$Cold 1 4.872 0 2.220e-16