1.1 - The Working Hypothesis

Under the scientific method, before any statistical analysis can be conducted, a researcher must generate a guess, or hypothesis, about what is going on. The process begins with a Working Hypothesis: a direct statement of the research idea. For example, a plant biologist may think that plant height is affected by the type of fertilizer applied. They might say: "Plants with different fertilizers will grow to different heights."

According to the Popperian Principle of Falsification, we can't conclusively affirm a hypothesis, but we can conclusively negate one. So we translate the working hypothesis into a framework in which we state a null hypothesis: the average height (or mean height) is the same for plants under all of the different fertilizers. The alternative hypothesis (which the biologist hopes to show) is that the means are not all equal; that is, at least one of the fertilizer treatments has produced plants with a different mean height. The strength of the data will determine whether the null hypothesis can be rejected with a specified level of confidence.
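Written symbolically, the two hypotheses might look like the following. The notation is an assumption introduced here for illustration: \mu_i stands for the true mean height of plants under the i-th fertilizer treatment, and k is the number of treatments being compared.

```latex
\begin{align*}
H_0 &: \mu_1 = \mu_2 = \cdots = \mu_k \\
H_A &: \text{not all } \mu_i \text{ are equal } (\mu_i \neq \mu_j \text{ for at least one pair } i \neq j)
\end{align*}
```

Note that the alternative does not claim that every mean differs from every other; a single treatment with a different mean is enough to make the null hypothesis false.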

We can imagine testing four groups of plants: three groups, each given a different kind of fertilizer, and a fourth left untreated (a control group). Assuming the plant biologist kept all the plants under controlled conditions in the greenhouse, the fertilizer would be the only thing that differs among the groups of plants. Suppose that at the end of the experiment the biologist measured the height of each plant. A simple boxplot can then be used to illustrate the differences in height among the four groups, as in the figure below. Plant height, the dependent or response variable, is on the vertical (y) axis; fertilizer, the independent or explanatory variable, is on the horizontal (x) axis.

[Figure: boxplot from the SGPlot Procedure, "Distribution of Plant Height by Fertilizer". Horizontal axis: Fertilizer (Control, F1, F2, F3). Vertical axis: Plant Height (axis range 20.0 to 32.5).]
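A boxplot is built from a five-number summary of each group. The sketch below computes those summaries in Python using only the standard library. The height values are invented for illustration and are not the measurements behind the figure; the names `heights` and `five_number_summary` are likewise hypothetical. Quartile conventions differ between software packages, so the numbers a plotting procedure draws may differ slightly from these.

```python
from statistics import median, quantiles

# Hypothetical plant heights, invented for illustration only.
# These are NOT the measurements behind the figure.
heights = {
    "Control": [21.0, 19.5, 22.5, 21.5, 20.5, 21.0],
    "F1": [32.0, 30.5, 25.0, 27.5, 28.0, 28.6],
    "F2": [22.5, 26.0, 28.0, 27.0, 26.5, 25.2],
    "F3": [28.0, 27.5, 31.0, 29.5, 30.0, 29.2],
}


def five_number_summary(values):
    """Return min, quartiles, and max for one treatment level."""
    # method="inclusive" treats the data as the full set of interest;
    # other quartile definitions give slightly different q1 and q3.
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values),
        "q1": q1,
        "median": q2,
        "q3": q3,
        "max": max(values),
    }


# One summary per level of the fertilizer treatment.
summaries = {level: five_number_summary(v) for level, v in heights.items()}

for level, s in summaries.items():
    print(level, s)
```

Each dictionary in `summaries` corresponds to one box: the box spans `q1` to `q3`, the line inside it marks the median, and the whiskers extend toward `min` and `max` (many packages shorten the whiskers and plot extreme points separately).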

 

This boxplot is a customary way to show treatment (or factor) level differences. In this case, there was only one treatment: fertilizer. The fertilizer treatment had four levels: the control, which received no fertilizer, and the three different fertilizers. Understanding this terminology is essential because later in the course we will use ANOVA to handle multi-factor studies (for example, if the biologist manipulated the amount of water AND the type of fertilizer), and we will need to be able to refer to different treatments, each with its own set of levels.

An alternative way to view the differences in height is a 'means plot' (a scatter or interval plot):

[Figure: "LS-Means for Fertilizer With 95% Confidence Limits", a plot of Height least-squares means for Fertilizer. Horizontal axis: Fertilizer (Control, F1, F2, F3). Vertical axis: Height LS-Mean (axis range 20.0 to 30.0).]
 

This second plotting method for the differences in the treatment means provides essentially the same information. However, this plot also conveys the variability in the data through "error bars" that mark the 95% confidence limits around each mean. Between the statement of a Working Hypothesis and the creation of these 95% confidence intervals is a 7-step process of statistical hypothesis testing, presented in the following section.
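To make the error bars concrete, here is a minimal Python sketch of how a mean and its 95% confidence limits could be computed for each fertilizer level. It rests on several assumptions not stated in the text: the data are invented for illustration, the design is balanced (six plants per group), and the limits use a pooled error variance across the four groups, which is one common way such intervals are formed in a one-way layout. The critical value `T_CRIT` is the tabulated 0.975 quantile of the t distribution with 20 degrees of freedom and is hardcoded to match this hypothetical sample size.

```python
from math import sqrt
from statistics import mean

# Hypothetical plant heights, invented for illustration only.
heights = {
    "Control": [21.0, 19.5, 22.5, 21.5, 20.5, 21.0],
    "F1": [32.0, 30.5, 25.0, 27.5, 28.0, 28.6],
    "F2": [22.5, 26.0, 28.0, 27.0, 26.5, 25.2],
    "F3": [28.0, 27.5, 31.0, 29.5, 30.0, 29.2],
}

# Tabulated t quantile (0.975) for 20 error degrees of freedom.
# Valid only for this layout: 24 plants - 4 groups = 20.
T_CRIT = 2.086

k = len(heights)                                  # number of treatment levels
n_total = sum(len(v) for v in heights.values())   # total number of plants
df_error = n_total - k                            # error degrees of freedom

group_means = {level: mean(v) for level, v in heights.items()}

# Pooled error variance: within-group squared deviations over df_error.
sse = sum(
    (y - group_means[level]) ** 2
    for level, v in heights.items()
    for y in v
)
mse = sse / df_error

# 95% confidence limits for each mean: mean +/- t * sqrt(MSE / n_i).
intervals = {}
for level, v in heights.items():
    half_width = T_CRIT * sqrt(mse / len(v))
    m = group_means[level]
    intervals[level] = (m - half_width, m, m + half_width)

for level, (lower, m, upper) in intervals.items():
    print(f"{level}: mean={m:.2f}, 95% CI=({lower:.2f}, {upper:.2f})")
```

Because the groups are the same size and share one pooled variance, every interval in this sketch has the same width; with unequal group sizes the widths would differ.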