1.1 - Measures of Central Tendency

Central Tendency: The Mean Vector

Throughout this course, we’ll use the standard notation for the mean of a variable. That is, the symbol \(\mu\) is used to represent a (theoretical) population mean and the symbol \(\bar{x}\) is used to represent a sample mean computed from observed data. In the multivariate setting, we add subscripts to these symbols to indicate the specific variable for which the mean is being given. For instance, \(\mu_1\) represents the population mean for variable \(X_1\) and \(\bar{x}_{1}\) denotes a sample mean based on observed data for variable \(X_{1}\).

The population mean is the measure of central tendency for the population. Here, the population mean for variable \(j\) is

\[\mu_j = E(X_{j})\]

The notation \(E\) stands for statistical expectation; here \(E(X_{j})\) is the mean of \(X_{j}\) over all members of the population, or equivalently, over all random draws from a stochastic model. For example, \(\mu_j = E(X_{j})\) may be the mean of a normal variable.

The population mean \(\mu_j\) for variable \(j\) can be estimated by the sample mean

\[\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n}X_{ij}\]
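As a minimal sketch of this formula, assume the data are stored in a NumPy array with one row per observation and one column per variable. The array `X`, its values, and the helper `sample_mean` are made up for illustration and are not part of the course material.

```python
import numpy as np

# Hypothetical data: n = 4 observations (rows) on p = 2 variables (columns).
X = np.array([
    [2.0, 10.0],
    [4.0, 14.0],
    [6.0, 12.0],
    [8.0, 20.0],
])

def sample_mean(X, j):
    """Sample mean of variable j (0-based column index): (1/n) * sum_i X[i, j]."""
    n = X.shape[0]
    return X[:, j].sum() / n

xbar_1 = sample_mean(X, 0)  # sample mean of the first variable
xbar_2 = sample_mean(X, 1)  # sample mean of the second variable
print(xbar_1, xbar_2)
```

Note that the code uses 0-based column indices, so column `0` corresponds to variable \(X_1\) in the notation above.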

Note! Because the sample mean \(\bar{x}_{j}\) is a function of our random data, it is itself a random variable and has a mean of its own. In fact, the population mean of the sample mean is equal to the population mean \(\mu_j\); i.e.,

\[E(\bar{x}_j) = \mu_j \]
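The reasoning behind this result can be sketched as follows, under the assumption that each observation \(X_{ij}\), \(i = 1, \dots, n\), is drawn from the population so that \(E(X_{ij}) = \mu_j\). By linearity of expectation,

\[E(\bar{x}_j) = E\left(\frac{1}{n}\sum_{i=1}^{n}X_{ij}\right) = \frac{1}{n}\sum_{i=1}^{n}E(X_{ij}) = \frac{1}{n}\cdot n\mu_j = \mu_j\]

This step uses only linearity of expectation; it does not require the observations to be independent.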

Therefore, \(\bar{x}_{j}\) is an unbiased estimator of \(\mu_j\).

Another way of saying this is that the mean of the \(\bar{x}_{j}\)’s over all possible samples of size \(n\) is equal to \(\mu_j\).
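To make "all possible samples" concrete, here is a small sketch that enumerates every sample from a tiny finite population. It assumes, purely for illustration, a three-member population in which each member is equally likely and samples of size \(n = 2\) are drawn with replacement; the population values are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Hypothetical finite population for one variable; each member is equally likely.
population = [2, 4, 9]
mu = Fraction(sum(population), len(population))  # population mean

n = 2  # sample size

# Enumerate every ordered sample of size n drawn with replacement
# (all such samples are equally likely) and compute each sample mean.
sample_means = [Fraction(sum(s), n) for s in product(population, repeat=n)]

# Average the sample means over all possible samples.
mean_of_sample_means = sum(sample_means) / len(sample_means)

print(mu, mean_of_sample_means)
```

Exact fractions are used so that the two printed values can be compared without rounding error. Individual sample means differ from \(\mu_j\), but their average over all samples matches it.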

Recall that the population mean vector, \(\boldsymbol{\mu}\), is the collection of the population means for each of the different variables.

\(\boldsymbol{\mu} = \left(\begin{array}{c} \mu_1 \\ \mu_2\\ \vdots\\ \mu_p \end{array}\right)\)

We can estimate this population mean vector, \(\boldsymbol{\mu}\), by the sample mean vector \(\mathbf{\bar{x}}\), which is obtained by collecting the sample means of the individual variables into a single vector. This is shown below.

\(\mathbf{\bar{x}} = \left(\begin{array}{c}\bar{x}_1\\ \bar{x}_2\\ \vdots \\ \bar{x}_p\end{array}\right) = \left(\begin{array}{c}\frac{1}{n}\sum_{i=1}^{n}X_{i1}\\ \frac{1}{n}\sum_{i=1}^{n}X_{i2}\\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n}X_{ip}\end{array}\right) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i\)
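The two forms of \(\mathbf{\bar{x}}\) in this expression (stacking the \(p\) individual sample means, or averaging the \(n\) observation vectors) can be checked numerically. The sketch below assumes a hypothetical data matrix `X` with observations in rows and variables in columns; the values are invented for illustration.

```python
import numpy as np

# Hypothetical data matrix: n = 4 observations (rows), p = 3 variables (columns).
X = np.array([
    [1.0, 10.0, 0.5],
    [3.0, 14.0, 1.5],
    [5.0, 12.0, 2.5],
    [7.0, 20.0, 3.5],
])
n, p = X.shape

# Route 1: stack the p individual sample means into one vector.
xbar_by_variable = np.array([X[:, j].sum() / n for j in range(p)])

# Route 2: average the n observation vectors X_i (the rows of X).
xbar_by_observation = sum(X[i] for i in range(n)) / n

# NumPy's built-in column-wise mean gives the same vector.
xbar = X.mean(axis=0)

print(xbar)
```

In practice `X.mean(axis=0)` is the usual choice; the two explicit routes are written out only to mirror the formula.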

Just as the sample means, \(\bar{x}_j\), for the individual variables are unbiased for their respective population means, the sample mean vector is unbiased for the population mean vector.

\(E(\mathbf{\bar{x}}) = E\left(\begin{array}{c}\bar{x}_1\\\bar{x}_2\\ \vdots \\\bar{x}_p\end{array}\right) = \left(\begin{array}{c}E(\bar{x}_1)\\E(\bar{x}_2)\\ \vdots \\E(\bar{x}_p)\end{array}\right)=\left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right)=\boldsymbol{\mu}\)
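A simulation can illustrate, though not prove, this componentwise unbiasedness. The sketch below assumes a population of independent normal variables with a known mean vector `mu`; the mean values, unit variances, sample size, number of replications, and seed are all arbitrary choices made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

# Assumed population: p = 3 independent normal variables with unit variance.
mu = np.array([1.0, -2.0, 0.5])
n = 10          # observations per sample
reps = 20_000   # number of simulated samples

# samples[r, i, j] is observation i on variable j in simulated sample r.
samples = rng.normal(loc=mu, scale=1.0, size=(reps, n, mu.size))

# One sample mean vector per simulated sample, shape (reps, p).
xbars = samples.mean(axis=1)

# Averaging the sample mean vectors approximates E(x-bar).
average_xbar = xbars.mean(axis=0)

print("population mean vector:", mu)
print("average of sample mean vectors:", average_xbar)
```

Any single sample mean vector will differ from `mu`, but the average over many simulated samples should land close to it, with the gap shrinking as `reps` grows.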