> ### Test of conditional independence, Lesson 4:
> ### Use of oddsratio(), confint() from {vcd}
> ### Cochran-Mantel-Haenszel test
> ### Also testing via log-linear models: related Lesson 5
> #########################################################
>
> ### Input the table
> deliquent=c("no","yes")
> scout=c("no","yes")
> SES=c("low","med","high")
> table=expand.grid(deliquent=deliquent,scout=scout,SES=SES)
> count=c(169,42,43,11,132,20,104,14,59,2,196,8)
> table=cbind(table,count=count)
> table
   deliquent scout  SES count
1         no    no  low   169
2        yes    no  low    42
3         no   yes  low    43
4        yes   yes  low    11
5         no    no  med   132
6        yes    no  med    20
7         no   yes  med   104
8        yes   yes  med    14
9         no    no high    59
10       yes    no high     2
11        no   yes high   196
12       yes   yes high     8
> temp=xtabs(count~deliquent+scout+SES,table)
> temp
, , SES = low

         scout
deliquent  no yes
      no  169  43
      yes  42  11

, , SES = med

         scout
deliquent  no yes
      no  132 104
      yes  20  14

, , SES = high

         scout
deliquent  no yes
      no   59 196
      yes   2   8

>
> ### Create "flat" contingency tables
> ftable(temp)
                SES low med high
deliquent scout
no        no        169 132   59
          yes        43 104  196
yes       no         42  20    2
          yes        11  14    8
>
> ## Let's see how we can create various subtables
> ### One-way Table SES
> Frequency=as.vector(margin.table(temp,3))
> CumFrequency=cumsum(Frequency)
> cbind(SES,Frequency=Frequency,Percentage=Frequency/sum(Frequency),CumFrequency=CumFrequency,CumPercentage=CumFrequency/sum(Frequency))
     SES    Frequency Percentage CumFrequency CumPercentage
[1,] "low"  "265"     "0.33125"  "265"        "0.33125"
[2,] "med"  "270"     "0.3375"   "535"        "0.66875"
[3,] "high" "265"     "0.33125"  "800"        "1"
>
> ### One-way Table scout
> Frequency=as.vector(margin.table(temp,2))
> CumFrequency=cumsum(Frequency)
> cbind(scout,Frequency=Frequency,Percentage=Frequency/sum(Frequency),CumFrequency=CumFrequency,CumPercentage=CumFrequency/sum(Frequency))
     scout Frequency Percentage CumFrequency CumPercentage
[1,] "no"  "424"     "0.53"     "424"        "0.53"
[2,] "yes" "376"     "0.47"     "800"        "1"
>
> ### One-way Table deliquent
> Frequency=as.vector(margin.table(temp,1))
> CumFrequency=cumsum(Frequency)
> cbind(deliquent,Frequency=Frequency,Percentage=Frequency/sum(Frequency),CumFrequency=CumFrequency,CumPercentage=CumFrequency/sum(Frequency))
     deliquent Frequency Percentage CumFrequency CumPercentage
[1,] "no"      "703"     "0.87875"  "703"        "0.87875"
[2,] "yes"     "97"      "0.12125"  "800"        "1"
>
> ### Test of Mutual Independence, step by step
> ### compute the expected frequencies
> E=array(NA,dim(temp))
> for (i in 1:dim(temp)[1]) {
+   for (j in 1:dim(temp)[2]) {
+     for (k in 1:dim(temp)[3]) {
+       E[i,j,k]=margin.table(temp,1)[i]*margin.table(temp,2)[j]*margin.table(temp,3)[k]/(sum(temp))^2
+     }}}
> E
, , 1

          [,1]      [,2]
[1,] 123.42044 109.44831
[2,]  17.02956  15.10169

, , 2

          [,1]      [,2]
[1,] 125.74913 111.51337
[2,]  17.35087  15.38663

, , 3

          [,1]      [,2]
[1,] 123.42044 109.44831
[2,]  17.02956  15.10169

>
> ### compute the X^2 and G^2
> df=(length(temp)-1)-(sum(dim(temp))-3)
> X2=sum((temp-E)^2/E)
> X2
[1] 214.9233
> 1-pchisq(X2,df)
[1] 0
> G2=2*sum(temp*log(temp/E))
> G2
[1] 218.6622
> 1-pchisq(G2,df)
[1] 0
>
> ### Test for Mutual independence (and other models) by considering analysis of all two-way tables
> ### Two-way Table SES*scout
> ### This is a test of Marginal Independence for SES and scout
> SES_Scout=margin.table(temp,c(3,2))
> result=chisq.test(SES_Scout)
> result

        Pearson's Chi-squared test

data:  SES_Scout
X-squared = 172.2025, df = 2, p-value < 2.2e-16

> result$expected
      scout
SES        no    yes
  low  140.45 124.55
  med  143.10 126.90
  high 140.45 124.55
> result=chisq.test(SES_Scout,correct=FALSE)
> result

        Pearson's Chi-squared test

data:  SES_Scout
X-squared = 172.2025, df = 2, p-value < 2.2e-16

> SES_Scout=list(Frequency=SES_Scout,RowPercentage=prop.table(SES_Scout,2))
>
> ### Two-way Table SES*deliquent
> ### temp=xtabs(count~scout+SES+deliquent,table)
> ### SES_deliquent=addmargins(temp)[1:2,1:3,3]
> ### This is a test of Marginal Independence for SES and deliquent status
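As a cross-check on the loop-based X^2 and G^2 computed above, the same mutual-independence model can be fit directly with base R's loglin(), whose $pearson and $lrt components are the X^2 and G^2 statistics. A minimal sketch, rebuilding the table from the counts given earlier (the data-frame name `tab` is just a local choice):

```r
# Rebuild the 2x2x3 deliquent x scout x SES table from the counts above
deliquent=c("no","yes"); scout=c("no","yes"); SES=c("low","med","high")
tab=expand.grid(deliquent=deliquent,scout=scout,SES=SES)
tab$count=c(169,42,43,11,132,20,104,14,59,2,196,8)
temp=xtabs(count~deliquent+scout+SES,tab)

# loglin() with only one-way margins fits the mutual-independence model;
# $pearson is X^2 and $lrt is G^2, matching the step-by-step values
fit=loglin(temp,list(1,2,3))
round(fit$pearson,4)   # X^2 = 214.9233, as in the loop computation
round(fit$lrt,4)       # G^2 = 218.6622
fit$df                 # 7 = (12-1)-(1+1+2), as computed by hand above
```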
> SES_deliquent=margin.table(temp,c(3,1))
> result=chisq.test(SES_deliquent)
> result

        Pearson's Chi-squared test

data:  SES_deliquent
X-squared = 32.8263, df = 2, p-value = 7.445e-08

> result$expected
      deliquent
SES          no      yes
  low  232.8688 32.13125
  med  237.2625 32.73750
  high 232.8688 32.13125
> result=chisq.test(SES_deliquent,correct=FALSE)
> result

        Pearson's Chi-squared test

data:  SES_deliquent
X-squared = 32.8263, df = 2, p-value = 7.445e-08

> SES_deliquent=list(Frequency=SES_deliquent,RowPercentage=prop.table(SES_deliquent,2))
>
> ### Two-way Table deliquent*scout
> ### This is a test of Marginal Independence for deliquent status and scout status
> deliquent_scout=margin.table(temp,c(1,2))
> result=chisq.test(deliquent_scout)
> result$expected
         scout
deliquent     no    yes
      no  372.59 330.41
      yes  51.41  45.59
> result=chisq.test(deliquent_scout,correct=FALSE)
> result

        Pearson's Chi-squared test

data:  deliquent_scout
X-squared = 7.4652, df = 1, p-value = 0.00629

> deliquent_scout1=list(Frequency=deliquent_scout,RowPercentage=prop.table(deliquent_scout,1))
>
> ## compute the log(oddsratio), the oddsratio and its 95% CI using the {vcd} package
> lor=oddsratio(deliquent_scout)
> lor
[1] -0.6140019
> OR=exp(lor)
> OR
[1] 0.5411808
> OR=oddsratio(deliquent_scout,log=FALSE)
> OR
[1] 0.5411808
> CI=exp(confint(lor))
> CI
           lwr       upr
[1,] 0.3475674 0.8426469
> CI=confint(OR)
> CI
           lwr       upr
[1,] 0.3475674 0.8426469
>
> ### Table of deliquent*scout at different levels of SES
> temp
, , SES = low

         scout
deliquent  no yes
      no  169  43
      yes  42  11

, , SES = med

         scout
deliquent  no yes
      no  132 104
      yes  20  14

, , SES = high

         scout
deliquent  no yes
      no   59 196
      yes   2   8

>
> ### Test for Joint Independence of (D,BS)
> ## creating 6x2 table, BS x D
> SESscout_deliquent=ftable(temp,row.vars=c(3,2))
> result=chisq.test(SESscout_deliquent)
> result

        Pearson's Chi-squared test

data:  SESscout_deliquent
X-squared = 32.9576, df = 5, p-value = 3.837e-06

>
> ### Test for Marginal Independence (see above analysis of two-way tables)
>
> #### Test for conditional independence
> ### To get partial tables of DB for each level of S
> temp[,,1]
         scout
deliquent  no yes
      no  169  43
      yes  42  11
> chisq.test(temp[,,1],correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 1]
X-squared = 0.0058, df = 1, p-value = 0.9392

> temp[,,2]
         scout
deliquent  no yes
      no  132 104
      yes  20  14
> chisq.test(temp[,,2],correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 2]
X-squared = 0.101, df = 1, p-value = 0.7507

> temp[,,3]
         scout
deliquent  no yes
      no   59 196
      yes   2   8
> chisq.test(temp[,,3],correct=FALSE)

        Pearson's Chi-squared test

data:  temp[, , 3]
X-squared = 0.0534, df = 1, p-value = 0.8172

Warning message:
In chisq.test(temp[, , 3], correct = FALSE) :
  Chi-squared approximation may be incorrect
> X2=sum(chisq.test(temp[,,1],correct=FALSE)$statistic+chisq.test(temp[,,2],correct=FALSE)$statistic+chisq.test(temp[,,3],correct=FALSE)$statistic)
Warning message:
In chisq.test(temp[, , 3], correct = FALSE) :
  Chi-squared approximation may be incorrect
> 1-pchisq(X2,df=3)
[1] 0.9837374
>
> ### Cochran-Mantel-Haenszel test
> mantelhaen.test(temp)

        Mantel-Haenszel chi-squared test without continuity correction

data:  temp
Mantel-Haenszel X-squared = 0.008, df = 1, p-value = 0.9287
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.5970214 1.6009845
sample estimates:
common odds ratio
        0.9776615

> mantelhaen.test(temp,correct=FALSE)

        Mantel-Haenszel chi-squared test without continuity correction

data:  temp
Mantel-Haenszel X-squared = 0.008, df = 1, p-value = 0.9287
alternative hypothesis: true common odds ratio is not equal to 1
95 percent confidence interval:
 0.5970214 1.6009845
sample estimates:
common odds ratio
        0.9776615

>
> ### Breslow-Day test
> ### make sure to first source/run breslowday.test.R
> breslowday.test(temp)
Breslow and Day test (with Tarone correction):
Breslow-Day X-squared        = 0.1517992
Breslow-Day-Tarone X-squared = 0.1517984
Test of a common OR: p-value = 0.9269096
>
> ################################################################
> #################### Log-linear Models: Lesson 5 ###############
> #### Let's see how we would do this via log-linear models
> #### We will look at the details later in the course
> ### test of conditional independence
> ### via loglinear model, but the object has to be a table!
>
> is.table(temp)  ### to check if table
[1] TRUE
> temp<-as.table(temp)  ### to save as table
> temp.condind<-loglin(temp, list(c(1,3), c(2,3)), fit=TRUE, param=TRUE)  ### fit the cond. indep. model
2 iterations: deviation 0
> temp.condind
$lrt
[1] 0.1623331

$pearson
[1] 0.1602389

$df
[1] 3

$margin
$margin[[1]]
[1] "deliquent" "SES"

$margin[[2]]
[1] "scout" "SES"

$fit
, , SES = low

         scout
deliquent         no        yes
      no   58.698113   2.301887
      yes 196.301887   7.698113

, , SES = med

         scout
deliquent         no        yes
      no  168.800000  42.200000
      yes  43.200000  10.800000

, , SES = high

         scout
deliquent         no        yes
      no  132.859259  19.140741
      yes 103.140741  14.859259

$param
$param$`(Intercept)`
[1] 3.534926

$param$deliquent
         no         yes
 0.06813731 -0.06813731

$param$scout
       no       yes
 1.093741 -1.093741

$param$SES
       low        med       high
-0.4782348  0.2192042  0.2590306

$param$deliquent.SES
         SES
deliquent        low        med        high
      no  -0.6717604  0.6132997  0.05846064
      yes  0.6717604 -0.6132997 -0.05846064

$param$scout.SES
     SES
scout        low        med        high
  no   0.5255985 -0.4005935 -0.1250050
  yes -0.5255985  0.4005935  0.1250050

> 1-pchisq(temp.condind$lrt, temp.condind$df)
[1] 0.983428
>
> #### There is no built-in Breslow-Day test in R; you need to write your own function
> #### or fit the homogeneous association model to test for identity of odds ratios
>
> ### test of homogeneous association model
> temp.hom<-loglin(temp, list(c(1,3), c(2,3), c(1,2)), fit=TRUE, param=TRUE)
3 iterations: deviation 0.02212296
> temp.hom
$lrt
[1] 0.1542917

$pearson
[1] 0.1518238

$df
[1] 2

$margin
$margin[[1]]
[1] "deliquent" "SES"

$margin[[2]]
[1] "scout" "SES"

$margin[[3]]
[1] "deliquent" "scout"

$fit
, , SES = low

         scout
deliquent         no        yes
      no   58.658906   2.339646
      yes 196.345950   7.658038

, , SES = med

         scout
deliquent         no        yes
      no  168.645190  42.357015
      yes  43.350753  10.647002

, , SES = high

         scout
deliquent         no        yes
      no  132.695904  19.303338
      yes 103.303297  14.694960

$param
$param$`(Intercept)`
[1] 3.534952

$param$deliquent
         no         yes
 0.07238318 -0.07238318

$param$scout
       no       yes
 1.094126 -1.094126

$param$SES
       low        med       high
-0.4756083  0.2171818  0.2584264

$param$deliquent.scout
         scout
deliquent           no          yes
      no  -0.005595678  0.005595678
      yes  0.005595678 -0.005595678

$param$deliquent.SES
         SES
deliquent        low       med        high
      no  -0.6708569  0.612449  0.05840789
      yes  0.6708569 -0.612449 -0.05840789

$param$scout.SES
     SES
scout       low        med        high
  no   0.522339 -0.3976993 -0.1246398
  yes -0.522339  0.3976993  0.1246398

> 1-pchisq(temp.hom$lrt, temp.hom$df)
[1] 0.9257548
>
> ### Here is how to do the same but with the glm() function in R
> ### test of conditional independence
> ### via loglinear model, but using the glm() function
> ### the object now needs to be a dataframe
> temp.data<-as.data.frame(temp)
> temp.data
   deliquent scout  SES Freq
1         no    no  low   59
2        yes    no  low  196
3         no   yes  low    2
4        yes   yes  low    8
5         no    no  med  169
6        yes    no  med   43
7         no   yes  med   42
8        yes   yes  med   11
9         no    no high  132
10       yes    no high  104
11        no   yes high   20
12       yes   yes high   14
> temp.condind<-glm(temp.data$Freq~temp.data$scout*temp.data$SES+temp.data$deliquent*temp.data$SES, family=poisson())
> summary(temp.condind)

Call:
glm(formula = temp.data$Freq ~ temp.data$scout * temp.data$SES +
    temp.data$deliquent * temp.data$SES, family = poisson())

Deviance Residuals:
       1        2        3        4        5        6        7        8        9       10       11       12
 0.03937 -0.02155 -0.20358  0.10811  0.01539 -0.03045 -0.03081  0.06067 -0.07463  0.08449  0.19496 -0.22511

Coefficients:
                                         Estimate Std. Error z value Pr(>|z|)
(Intercept)                                4.0724     0.1286  31.664  < 2e-16 ***
temp.data$scoutyes                        -3.2387     0.3224 -10.047  < 2e-16 ***
temp.data$SESmed                           1.0563     0.1491   7.086 1.38e-12 ***
temp.data$SEShigh                          0.8169     0.1538   5.311 1.09e-07 ***
temp.data$deliquentyes                     1.2072     0.1459   8.273  < 2e-16 ***
temp.data$scoutyes:temp.data$SESmed        1.8524     0.3571   5.188 2.13e-07 ***
temp.data$scoutyes:temp.data$SEShigh       1.3012     0.3709   3.508 0.000451 ***
temp.data$SESmed:temp.data$deliquentyes   -2.5701     0.2111 -12.176  < 2e-16 ***
temp.data$SEShigh:temp.data$deliquentyes  -1.4604     0.1907  -7.660 1.86e-14 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 739.58872  on 11  degrees of freedom
Residual deviance:   0.16233  on  3  degrees of freedom
AIC: 82.688

Number of Fisher Scoring iterations: 3

> fitted(temp.condind)
         1          2          3          4          5          6          7          8          9         10         11         12
 58.698113 196.301887   2.301887   7.698113 168.800000  43.200000  42.200000  10.800000 132.859259 103.140741  19.140741  14.859259
>
> ### test of homogeneous association model
> temp.hom<-glm(temp.data$Freq~temp.data$scout*temp.data$SES+temp.data$deliquent*temp.data$SES+temp.data$scout*temp.data$deliquent, family=poisson())
> summary(temp.hom)

Call:
glm(formula = temp.data$Freq ~ temp.data$scout * temp.data$SES +
    temp.data$deliquent * temp.data$SES + temp.data$scout * temp.data$deliquent,
    family = poisson())

Deviance Residuals:
       1        2        3        4        5        6        7        8        9       10       11       12
 0.04441 -0.02431 -0.22832  0.12214  0.02727 -0.05388 -0.05452  0.10799 -0.06037  0.06828  0.15720 -0.18270

Coefficients:
                                          Estimate Std. Error z value Pr(>|z|)
(Intercept)                                4.07175    0.12884  31.602  < 2e-16 ***
temp.data$scoutyes                        -3.22139    0.37545  -8.580  < 2e-16 ***
temp.data$SESmed                           1.05605    0.14914   7.081 1.43e-12 ***
temp.data$SEShigh                          0.81630    0.15398   5.301 1.15e-07 ***
temp.data$deliquentyes                     1.20810    0.14624   8.261  < 2e-16 ***
temp.data$scoutyes:temp.data$SESmed        1.83965    0.38421   4.788 1.68e-06 ***
temp.data$scoutyes:temp.data$SEShigh       1.29371    0.38024   3.402 0.000668 ***
temp.data$SESmed:temp.data$deliquentyes   -2.56650    0.21486 -11.945  < 2e-16 ***
temp.data$SEShigh:temp.data$deliquentyes  -1.45846    0.19192  -7.600 2.97e-14 ***
temp.data$scoutyes:temp.data$deliquentyes -0.02252    0.25123  -0.090 0.928579
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for poisson family taken to be 1)

    Null deviance: 739.58872  on 11  degrees of freedom
Residual deviance:   0.15429  on  2  degrees of freedom
AIC: 84.68

Number of Fisher Scoring iterations: 3

> fitted(temp.hom)
         1          2          3          4          5          6          7          8          9         10         11         12
 58.659501 196.340499   2.340499   7.659501 168.645710  43.354290  42.354290  10.645710 132.694789 103.305211  19.305211  14.694789
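The two glm() fits above differ only by the scout-by-deliquent interaction, so the drop in residual deviance (0.16233 - 0.15429 on 1 df) gives a likelihood-ratio test of conditional independence against homogeneous association, i.e. of whether the common odds ratio is 1. A minimal sketch of that comparison with anova(), rebuilding the data frame from the Freq values printed above (formula-with-data syntax instead of the temp.data$ prefixes, which fits the same models):

```r
# Rebuild the data frame printed above (same factor and count order)
temp.data=expand.grid(deliquent=c("no","yes"),scout=c("no","yes"),
                      SES=c("low","med","high"))
temp.data$Freq=c(59,196,2,8,169,43,42,11,132,104,20,14)

# Conditional independence (D,S | SES) and homogeneous association models
condind=glm(Freq~scout*SES+deliquent*SES,family=poisson(),data=temp.data)
hom=glm(Freq~scout*SES+deliquent*SES+scout*deliquent,family=poisson(),data=temp.data)

# Drop-in-deviance (likelihood-ratio) test for the scout:deliquent term;
# the large p-value agrees with the CMH and Breslow-Day results above
anova(condind,hom,test="Chisq")
```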