6.4 - Practical Significance

In the last lesson, you learned how to identify statistically significant differences using hypothesis testing methods. If the p-value is less than or equal to the \(\alpha\) level (typically 0.05), then the results are statistically significant. Results are said to be statistically significant when the difference between the hypothesized population parameter and the observed sample statistic is large enough that it is unlikely to have occurred by chance alone.

Practical significance refers to the magnitude of the difference, which is known as the effect size. Results are practically significant when the difference is large enough to be meaningful in real life. What is meaningful may be subjective and may depend on the context.

Note that statistical significance is directly affected by sample size. Recall that there is an inverse relationship between sample size and the standard error (i.e., the standard deviation of the sampling distribution): as the sample size increases, the standard error decreases. With a very large sample, even a very small difference can be statistically significant. Thus, when results are statistically significant, it is important to also examine practical significance. Practical significance is not directly influenced by sample size.
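To make the role of sample size concrete, here is a minimal sketch that holds the difference fixed and varies only the sample size. The difference (0.5), the standard deviation (10), and the sample sizes are hypothetical values chosen for illustration, and the p-value uses a one-sided normal approximation rather than any particular method from this lesson.

```python
import math

def one_sided_p_value(diff, sd, n):
    """Upper-tail p-value for a one-sample test (normal approximation)."""
    standard_error = sd / math.sqrt(n)   # shrinks as n grows
    z = diff / standard_error
    # P(Z > z) for a standard normal variable
    return 0.5 * math.erfc(z / math.sqrt(2))

# Hypothetical values: the same difference of 0.05 standard deviations each time
diff, sd = 0.5, 10.0
p_values = {n: one_sided_p_value(diff, sd, n) for n in (25, 400, 10_000)}

for n, p in p_values.items():
    print(f"n = {n:>6}: effect size = {diff / sd:.2f}, p = {p:.4f}")
```

The effect size is the same in every row, but the p-value drops as the sample size grows, so only the largest sample yields a statistically significant result.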

Example: Weight-Loss Program

Researchers are studying a new weight-loss program. Using a large sample, they construct a 95% confidence interval for the mean amount of weight loss after six months on the program: [0.12, 0.20]. All measurements were taken in pounds. Note that this confidence interval does not contain 0, so we know that their results were statistically significant at a 0.05 alpha level. However, most people would say that the results are not practically significant, because a six-month weight-loss program should yield a mean weight loss much greater than the one observed in this study.

Effect Size

For some tests, there are commonly used measures of effect size. For example, when comparing two means we often compute Cohen's \(d\), which is the difference between the two observed sample means in standard deviation units:

\[d=\frac{\overline x_1 - \overline x_2}{s_p}\]

Where \(s_p\) is the pooled standard deviation

\[s_p= \sqrt{\frac{(n_1-1)s_1^2 + (n_2 -1)s_2^2}{n_1+n_2-2}}\]
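As an illustration, the two formulas above can be sketched in Python for raw data. The function names `pooled_sd` and `cohens_d` and the two small samples are hypothetical and exist only for this example.

```python
import math
import statistics

def pooled_sd(sample1, sample2):
    """Pooled standard deviation of two samples."""
    n1, n2 = len(sample1), len(sample2)
    var1 = statistics.variance(sample1)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(sample2)
    return math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))

def cohens_d(sample1, sample2):
    """Difference between the sample means in pooled standard deviation units."""
    mean_diff = statistics.mean(sample1) - statistics.mean(sample2)
    return mean_diff / pooled_sd(sample1, sample2)

# Made-up data for illustration only
group_a = [12.0, 15.0, 11.0, 14.0, 13.0]
group_b = [10.0, 12.0, 9.0, 11.0, 13.0]
print(f"d = {cohens_d(group_a, group_b):.3f}")
```

The sign of \(d\) depends on which group is listed first; only its magnitude is used when judging the size of the effect.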

Below are commonly used standards when interpreting Cohen's \(d\):

Cohen's \(d\)    Interpretation
0 - 0.2          Little or no effect
0.2 - 0.5        Small effect size
0.5 - 0.8        Medium effect size
0.8 or more      Large effect size
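These guidelines can be written as a small lookup, sketched below. The function name `interpret_cohens_d` is hypothetical. Because the ranges share their endpoints, this sketch assumes that a value exactly on a boundary (0.2, 0.5, or 0.8) belongs to the higher category, and that the sign of \(d\) is ignored.

```python
def interpret_cohens_d(d):
    """Return the guideline label for a Cohen's d value.

    Assumption: boundary values go to the higher category,
    and only the magnitude of d matters.
    """
    magnitude = abs(d)
    if magnitude >= 0.8:
        return "Large effect size"
    if magnitude >= 0.5:
        return "Medium effect size"
    if magnitude >= 0.2:
        return "Small effect size"
    return "Little or no effect"

for value in (0.06, 0.402, -0.9):
    print(value, "->", interpret_cohens_d(value))
```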

For a single mean, you can compute the difference between the observed mean and hypothesized mean in standard deviation units: \[d=\frac{\overline x - \mu_0}{s}\]

For correlation and regression we can compute \(r^2\) which is known as the coefficient of determination. This is the proportion of shared variation. We will learn more about \(r^2\) when we study simple linear regression and correlation at the end of this course.

Example: SAT-Math Scores

Research question:  Are SAT-Math scores at one college greater than the known population mean of 500?

\(H_0\colon \mu = 500\)

\(H_a\colon \mu >500\)

Data are collected from a random sample of 1,200 students at that college. In that sample, \(\overline{x}=506\) and the sample standard deviation was 100. A one-sample mean test was performed and the resulting p-value was 0.0188. Because \(p \leq \alpha\) (using the typical \(\alpha = 0.05\)), the null hypothesis should be rejected. These results are statistically significant. There is evidence that the population mean is greater than 500.

But let's also consider practical significance. The difference between an SAT-Math score of 500 and an SAT-Math score of 506 is very small. With a standard deviation of 100, this difference is only \(\frac{506-500}{100}=0.06\) standard deviations. In most cases, this would not be considered practically significant.
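The numbers in this example can be reproduced with a short sketch. It assumes a normal approximation for the one-sided p-value, which is reasonable with \(n = 1200\) but may not be the exact procedure used to obtain the p-value reported above.

```python
import math

# Summary statistics from the SAT-Math example
x_bar, mu_0, s, n = 506, 500, 100, 1200

standard_error = s / math.sqrt(n)
test_statistic = (x_bar - mu_0) / standard_error

# Upper-tail p-value, normal approximation (assumption; see lead-in)
p_value = 0.5 * math.erfc(test_statistic / math.sqrt(2))

# Effect size: difference in standard deviation units
effect_size = (x_bar - mu_0) / s

print(f"test statistic = {test_statistic:.3f}")
print(f"p-value        = {p_value:.4f}")
print(f"effect size    = {effect_size:.2f}")
```

The p-value falls below 0.05 while the effect size stays at 0.06, which is the contrast between statistical and practical significance described in this example.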

Example: Commute Times

Research question: Are the mean commute times different in Atlanta and St. Louis?

Descriptive Statistics: Commute Time
City         N      Mean      StDev
Atlanta      500    29.110    20.718
St. Louis    500    21.970    14.232

Using the dataset built into StatKey, a two-tailed randomization test was conducted, resulting in a p-value < 0.001. Because the null hypothesis was rejected, the results are said to be statistically significant.

Practical significance can be examined by computing Cohen's d. We'll use the equations from above:

\[d=\frac{\overline x_1 - \overline x_2}{s_p}\]

Where \(s_p\) is the pooled standard deviation

\[s_p= \sqrt{\frac{(n_1-1)s_1^2 + (n_2 -1)s_2^2}{n_1+n_2-2}}\]

First, we compute the pooled standard deviation:

\[s_p= \sqrt{\frac{(500-1)20.718^2 + (500-1)14.232^2}{500+500-2}}\]

\[s_p= \sqrt{\frac{(499)(429.236)+ (499)(202.550)}{998}}\]

\[s_p= \sqrt{\frac{214188.527+ 101072.362}{998}}\]

\[s_p= \sqrt{\frac{315260.889}{998}}\]

\[s_p= \sqrt{315.893}\]

\[s_p= 17.773\]

Note: The pooled standard deviation should always be between the two sample standard deviations.

Next, we can compute Cohen's d:

\[d=\frac{29.110-21.970}{17.773}\]

\[d=\frac{7.14}{17.773}\]

\[d= 0.402\]

 

The mean commute time in Atlanta was 0.402 standard deviations greater than the mean commute time in St. Louis. Using the guidelines for interpreting Cohen's d in the table above, this is a small effect size. 
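The hand calculation for the commute time example can be checked from the summary statistics alone. This is a minimal sketch; the variable names are chosen for illustration.

```python
import math

# Summary statistics from the descriptive statistics table
n1, mean1, sd1 = 500, 29.110, 20.718   # Atlanta
n2, mean2, sd2 = 500, 21.970, 14.232   # St. Louis

# Pooled standard deviation
pooled_variance = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
s_p = math.sqrt(pooled_variance)

# Cohen's d: difference in means in pooled standard deviation units
d = (mean1 - mean2) / s_p

print(f"s_p = {s_p:.3f}")
print(f"d   = {d:.3f}")
```

Because the code carries full precision through each step, intermediate values may differ slightly from the rounded ones shown by hand, but the final results agree to three decimal places.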

