Case-Study: Third Article
Jin’s third article looks at regions of three different sizes (small town, suburb, and city) over the past 10 years. The article records the percentage of total land set aside for recreational use in each area each year, and it concludes that suburbs have been the most successful at setting aside recreational land. Jin is a little confused: he recognizes that the article uses an ANOVA, with the three region sizes as the independent variable, but he doesn’t understand how the researchers included the 10-year perspective.
Foundational Concepts: The key foundational concepts this article builds upon are:
- One-way and two-way (factorial) ANOVA
- Assumption of independence from ANOVA
- Paired and independent t tests
- Covariance
To understand this time element, we need to return to the statistical concept of independence that we saw in the assumptions for linear models and in paired t tests. Independence means that two measures are not related. This is primarily a function of the research methods used to collect the data. With random selection, we should be able to assume that measures are independent. Think of drawing numbers out of a hat: the second number should be independent of the first.
However, in the real world we often encounter research where independence cannot be assumed, and Jin’s article is one such example. As the data presented below show, each region has multiple entries (time 1, time 2, time 3). Whenever you take measurements from the same unit over time, the measures are no longer independent. In Jin’s example, the percentage of land set aside for Shamrock Town at time 1 is related to the percentage set aside at time 2, because local beliefs about land use, demand for land use, and the people living in the town are assumed to remain relatively constant between the two time points.
time | Region | percent land |
---|---|---|
1 | 0 | 11 |
1 | 1 | 26 |
1 | 2 | 20 |
2 | 0 | 56 |
2 | 1 | 83 |
2 | 2 | 71 |
3 | 0 | 15 |
3 | 1 | 34 |
3 | 2 | 41 |
4 | 0 | 6 |
While violating this assumption of independence would normally invalidate a technique such as a general linear model, modern computing and awareness of the research design used to collect the data allow researchers, like the authors of Jin’s article, to properly account for these “repeated,” non-independent measures by using a “repeated measures ANOVA”.
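To make the effect of accounting for the time structure concrete, here is a minimal hand-calculation sketch in plain Python. It uses only the nine complete rows of the table above (the time-4 row is left out because only one of its three regions is listed), and it treats time as a blocking factor. That choice is an illustrative assumption, not necessarily the exact model the article's authors or Minitab fitted, so the numbers will not match the published output.

```python
# Complete rows from the table above: (time, region, percent_land).
# The time-4 row is omitted because only one region is listed for it.
rows = [
    (1, 0, 11), (1, 1, 26), (1, 2, 20),
    (2, 0, 56), (2, 1, 83), (2, 2, 71),
    (3, 0, 15), (3, 1, 34), (3, 2, 41),
]


def mean(values):
    return sum(values) / len(values)


times = sorted({t for t, _, _ in rows})
regions = sorted({r for _, r, _ in rows})
grand = mean([y for _, _, y in rows])

region_means = {r: mean([y for _, rr, y in rows if rr == r]) for r in regions}
time_means = {t: mean([y for tt, _, y in rows if tt == t]) for t in times}

# Sums of squares for a balanced layout (one value per time-region cell).
ss_total = sum((y - grand) ** 2 for _, _, y in rows)
ss_region = len(times) * sum((m - grand) ** 2 for m in region_means.values())
ss_time = len(regions) * sum((m - grand) ** 2 for m in time_means.values())

df_region = len(regions) - 1
ms_region = ss_region / df_region

# One-way ANOVA: treats all nine values as independent, so the
# time-to-time variation stays inside the error term.
df_err_oneway = len(rows) - len(regions)
f_oneway = ms_region / ((ss_total - ss_region) / df_err_oneway)

# Blocked on time: the time-to-time variation is removed from the error
# term before region is tested.
df_err_blocked = (len(regions) - 1) * (len(times) - 1)
f_blocked = ms_region / ((ss_total - ss_region - ss_time) / df_err_blocked)

print(f"one-way F({df_region}, {df_err_oneway}) = {f_oneway:.2f}")
print(f"blocked F({df_region}, {df_err_blocked}) = {f_blocked:.2f}")
```

On these nine rows the region F statistic is well below 1 when time is ignored and about 10 once the variation between time points is separated out. The region differences are the same in both calculations; only the error term they are compared against changes.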
Like factor analysis, the repeated measures ANOVA focuses on the covariance of the repeated observations within a unit. In Jin’s example, it takes into account the covariance of the time 1 and time 2 measurements for Shamrock Town. By taking this relationship into account, the model can compensate for the violation of independence and calculate the best fit appropriately. The snippet of output below shows how Minitab takes the “time” variable into account; however, the output of interest is the “Test of Fixed Effects” for the “Region” variable (just like the one-way ANOVA output!).
Variance Components
Source | Var | % of Total | SE Var | Z-Value | P-Value |
---|---|---|---|---|---|
time | 590.222222 | 91.43% | 497.088329 | 1.187259 | 0.118 |
Error | 55.333333 | 8.57% | 31.946715 | 1.732051 | 0.042 |
Total | 645.555556 | | | | |
Test of Fixed Effects
Term | DF Number | DF Den | F-Value | P-Value |
---|---|---|---|---|
Region | 2.00 | 6.00 | 7.88 | 0.021 |
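As a quick check on how to read this output, the sketch below recomputes two of the reported quantities from the others: the "% of Total" column in the variance components table and the P-value for Region. The variable and function names are illustrative. The P-value uses the closed-form upper tail of an F distribution with 2 numerator degrees of freedom, which applies here only because DF Number is 2.

```python
# Values copied from the Minitab output above.
var_time, var_error = 590.222222, 55.333333
f_value, df_num, df_den = 7.88, 2, 6

# Share of the total variance attributed to each source.
total = var_time + var_error
share_time = var_time / total
share_error = var_error / total


def f_upper_tail_two_num_df(f, den_df):
    """P(F > f) for an F distribution with exactly 2 numerator df."""
    return (1 + 2 * f / den_df) ** (-den_df / 2)


p_region = f_upper_tail_two_num_df(f_value, df_den)

print(f"time share of variance:  {share_time:.2%}")
print(f"error share of variance: {share_error:.2%}")
print(f"P-value for Region:      {p_region:.3f}")
```

The recomputed shares (91.43% and 8.57%) and the P-value (0.021) agree with the table, which confirms that most of the variability in these data lies between time points and that the Region effect is significant at the 0.05 level.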
While we have not gone into depth about the repeated measures ANOVA, Jin now understands why it differs slightly from the one-way ANOVA and can proceed to interpret the significance of region size for the land set aside for recreational use over time.