4.3 - Statistical Biases

For a point estimator, statistical bias is defined as the difference between the mathematical expectation of the estimator and the parameter it estimates.
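
In symbols, if \(\hat{\theta}\) is an estimator of a parameter \(\theta\), then:

\(\text{Bias}(\hat{\theta}) = E(\hat{\theta}) - \theta\)

An estimator is unbiased when this difference is zero, that is, when \(E(\hat{\theta}) = \theta\).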

Statistical bias can result from the method of analysis or estimation. For example, if a statistical analysis does not account for important prognostic factors (variables known to affect the outcome variable), the estimated treatment effects may be biased. Fortunately, many statistical biases can be corrected, whereas biases caused by design flaws cannot.

The simplest example of statistical bias arises in estimating a variance. Consider the one-sample situation, with \(Y_1, \dots , Y_n\) denoting independent and identically distributed random variables and \(\bar{Y}\) denoting their sample mean. Define:

\(s^2=\frac{1}{n-1}\sum_{i=1}^{n}\left ( Y_i -\bar{Y} \right )^2\)

and

\(v^2=\frac{1}{n}\sum_{i=1}^{n}\left ( Y_i -\bar{Y} \right )^2 \)

The statistic \(s^2\) is unbiased because its mathematical expectation is the population variance, \(\sigma^2\). The statistic \(v^2\) is biased because its mathematical expectation is \(\dfrac{\sigma^2 (n-1)}{n}\), which is less than \(\sigma^2\); in other words, \(v^2\) tends to underestimate the population variance.
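
As a quick numerical illustration, both estimators can be computed with NumPy's var function via its ddof (delta degrees of freedom) argument; note that NumPy's default, ddof=0, corresponds to the biased \(v^2\). The sample values here are arbitrary.

```python
import numpy as np

y = np.array([4.1, 5.6, 3.8, 6.2, 5.0])  # arbitrary sample values

v2 = np.var(y)          # ddof=0 (NumPy's default): divides by n, the biased v^2
s2 = np.var(y, ddof=1)  # ddof=1: divides by n - 1, the unbiased s^2

print(f"v^2 = {v2:.4f}, s^2 = {s2:.4f}")  # s^2 > v^2, since s^2 = v^2 * n/(n-1)
```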

Thus, the bias of \(v^2\) is \(\dfrac{\sigma^2(n-1)}{n} -\sigma^2 = - \dfrac{\sigma^2}{n}\). As the sample size \(n\) gets larger, the bias becomes negligible.
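
A small Monte Carlo sketch can make both facts concrete: averaging \(v^2\) over many simulated samples should land near \(\dfrac{\sigma^2 (n-1)}{n}\) rather than \(\sigma^2\), with the gap shrinking as \(n\) grows. The normal population, the value \(\sigma^2 = 4\), and the sample sizes below are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma2 = 4.0    # true population variance (chosen for the demo)
reps = 100_000  # number of simulated samples per sample size

for n in (5, 20, 100):
    samples = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    v2 = samples.var(axis=1, ddof=0)  # biased estimator v^2 for each sample
    bias = v2.mean() - sigma2         # Monte Carlo estimate of E(v^2) - sigma^2
    print(f"n={n:3d}  mean v^2 = {v2.mean():.4f}  "
          f"bias = {bias:+.4f}  theory: {-sigma2/n:+.4f}")
```

Rescaling \(v^2\) by \(\dfrac{n}{n-1}\) removes the bias exactly, which is precisely the correction built into \(s^2\).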