1.2 - Measures of Dispersion

Dispersion: Variance, Standard Deviation

Variance: A variance measures the degree of spread (dispersion) in a variable’s values.

Theoretically, the population variance is the average squared difference between a variable’s values and the mean for that variable.

Population Variance: The population variance for variable \(X_j\) is \(\sigma_j^2 = E(X_j-\mu_j)^2\), where \(\mu_j = E(X_j)\) is the population mean of \(X_j\).

Note that the squared residual \((X_{j}-\mu_{j})^2\) is a function of the random variable \(X_{j}\). Therefore, the squared residual itself is random and has a population mean. The population variance is thus the population mean of the squared residual. We see that if the data tend to be far away from the mean, the squared residual will tend to be large, and hence the population variance will also be large. Conversely, if the data tend to be close to the mean, the squared residual will tend to be small, and hence the population variance will also be small.
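To make the definition concrete, the sketch below computes a population variance directly as the population mean of the squared residual. The random variable is a hypothetical one chosen for illustration (the outcome of a single roll of a fair six-sided die), and exact fractions are used so that no rounding is involved.

```python
from fractions import Fraction

# Hypothetical example: X is the outcome of one roll of a fair six-sided die,
# so each value 1..6 has probability 1/6.
values = range(1, 7)
p = Fraction(1, 6)

# Population mean: mu = E(X) = sum over x of x * P(X = x)
mu = sum(x * p for x in values)

# Population variance: sigma^2 = E(X - mu)^2, i.e. the population mean
# of the squared residual (X - mu)^2.
sigma2 = sum((x - mu) ** 2 * p for x in values)

print(mu, sigma2)  # 7/2 35/12
```

Values far from \(\mu = 7/2\) (such as 1 and 6) contribute the largest squared residuals, which is exactly the behavior described above.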

Sample Variance: The population variance \(\sigma _{j}^{2}\) can be estimated by the sample variance, where \(\bar{x}_j\) is the sample mean of variable \(j\): \begin{align} s_j^2 &= \frac{1}{n-1}\sum_{i=1}^{n}(X_{ij}-\bar{x}_j)^2\\&= \frac{\sum_{i=1}^{n}X_{ij}^2- n \bar{x}_j^2 }{n-1} \\&=\frac{\sum_{i=1}^{n}X_{ij}^2-\left(\left(\sum_{i=1}^{n}X_{ij}\right)^2/n\right)}{n-1} \end{align}
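As a quick check that the three expressions are algebraically equivalent, the sketch below evaluates each of them on a small, hypothetical set of observations (the data values are made up for illustration and do not come from the course).

```python
def sample_variance_forms(x):
    """Return the three equivalent forms of s^2 (all with divisor n - 1)."""
    n = len(x)
    xbar = sum(x) / n
    sum_sq = sum(xi ** 2 for xi in x)
    # Form 1: sum of squared residuals divided by n - 1
    form1 = sum((xi - xbar) ** 2 for xi in x) / (n - 1)
    # Form 2: (sum of squares - n * xbar^2) / (n - 1)
    form2 = (sum_sq - n * xbar ** 2) / (n - 1)
    # Form 3: (sum of squares - (sum of x)^2 / n) / (n - 1)
    form3 = (sum_sq - sum(x) ** 2 / n) / (n - 1)
    return form1, form2, form3

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical observations
f1, f2, f3 = sample_variance_forms(data)
print(f1, f2, f3)  # all approximately 3.3
```

One caveat for computer use: in floating-point arithmetic, forms 2 and 3 subtract two nearly equal quantities when the values are large relative to their spread, which can lose accuracy. The residual-based form 1 avoids this cancellation.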

The first expression in this formula is the most useful for interpreting the sample variance. It is a function of the squared residuals, obtained by taking the difference between each observation and the sample mean and then squaring the result. If the observations tend to be far from their sample mean, then the squared residuals, and hence the sample variance, will also tend to be large.

If, on the other hand, the observations tend to be close to their sample mean, then the squared residuals will be small, resulting in a small sample variance for that variable.

The last expression above is the most convenient for computation, either by hand or by computer, because it requires only the sum and the sum of squares of the observations. Since the sample variance is a function of the random data, the sample variance itself is a random quantity, and so has a population mean. In fact, the population mean of the sample variance is equal to the population variance:

\[E(s_j^2) = \sigma_j^2\]

That is, the sample variance \(s _{j}^{2}\) is unbiased for the population variance \(\sigma _{j}^{2}\).
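Unbiasedness can be illustrated by simulation: draw many samples from a population with known variance, compute \(s^2\) for each, and average the results. The sketch below assumes a normal population with \(\mu = 10\) and \(\sigma = 2\) (so \(\sigma^2 = 4\)), samples of size \(n = 5\), and an arbitrary fixed seed; none of these choices come from the text. The average will be close to, but not exactly, 4 because of simulation error.

```python
import random

def sample_var(x):
    """Sample variance s^2 with divisor n - 1."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / (n - 1)

# Assumed setup: normal population with mu = 10 and sigma = 2,
# so the population variance is sigma^2 = 4.
rng = random.Random(505)  # fixed seed so the run is reproducible
n, reps, mu, sigma = 5, 20_000, 10.0, 2.0

# Approximate E(s^2) by averaging s^2 over many independent samples.
estimates = [
    sample_var([rng.gauss(mu, sigma) for _ in range(n)])
    for _ in range(reps)
]
avg_s2 = sum(estimates) / reps
print(f"average of s^2 over {reps} samples: {avg_s2:.3f} (sigma^2 = {sigma ** 2})")
```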

Our textbook (Johnson and Wichern, 6th ed.) uses a sample variance formula derived from maximum likelihood estimation principles under a normal model. In this formula, the division is by \(n\) rather than \(n-1\).

\[s_j^2 = \frac{\sum_{i=1}^{n}(X_{ij}-\bar{x}_j)^2}{n}\]
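In Python's standard library, `statistics.variance` divides by \(n-1\) and `statistics.pvariance` divides by \(n\), so the two estimators can be compared directly. The data below are hypothetical and chosen only for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 5.0, 7.0]  # hypothetical observations
n = len(data)

s2_unbiased = statistics.variance(data)  # divisor n - 1
s2_mle = statistics.pvariance(data)      # divisor n

# The two estimates differ only by the factor (n - 1) / n.
print(s2_unbiased, s2_mle, s2_unbiased * (n - 1) / n)
```

Because the divisor-\(n\) estimate equals \(\frac{n-1}{n}s_j^2\), its expected value is \(\frac{n-1}{n}\sigma_j^2\). It is therefore biased slightly downward, although the bias vanishes as \(n\) grows.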

Example 1-1: Pulse Rates

Suppose that we have observed the following \(n = 5\) resting pulse rates: 64, 68, 74, 76, 78.

Find the sample mean, variance and standard deviation.

Answer

The sample mean is \(\bar{x} = \dfrac{64+68+74+76+78}{5}=72\).

The maximum likelihood estimate of the variance, the one consistent with our text, is

\begin{align} s^2 &= \frac{(64-72)^2+(68-72)^2+(74-72)^2+(76-72)^2+(78-72)^2}{5}\\&=\frac{136}{5} \\&= 27.2 \end{align}

The standard deviation is the square root of the variance. Based on this method, it is \(s=\sqrt{27.2}=5.215\).

The more commonly used variance estimate, the one typically reported by statistical software, divides by \(n-1\) instead: \(\frac{136}{5-1}=34\). The corresponding standard deviation is \(s = \sqrt{34}=5.83\).
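The calculations in Example 1-1 can be reproduced with Python's standard `statistics` module. As a sketch: `pvariance` and `pstdev` use the divisor \(n\) (the estimate consistent with our text), while `variance` and `stdev` use \(n-1\). Other software differs in its defaults; for instance, NumPy's `numpy.var` divides by \(n\) unless `ddof=1` is passed, whereas pandas' `Series.var` divides by \(n-1\) by default.

```python
import statistics

pulse = [64, 68, 74, 76, 78]  # resting pulse rates from Example 1-1

xbar = statistics.mean(pulse)          # 72
var_mle = statistics.pvariance(pulse)  # divisor n:     136 / 5 = 27.2
sd_mle = statistics.pstdev(pulse)      # sqrt(27.2), about 5.215
var_s = statistics.variance(pulse)     # divisor n - 1: 136 / 4 = 34
sd_s = statistics.stdev(pulse)         # sqrt(34), about 5.83

print(xbar, var_mle, round(sd_mle, 3), var_s, round(sd_s, 2))
```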