1.1 - Measures of Central Tendency

Central Tendency: The Mean Vector

Throughout this course, we’ll use the standard notation for the mean of a variable. That is, the symbol $\mu$ represents a (theoretical) population mean and the symbol $\bar{x}$ represents a sample mean computed from observed data. In the multivariate setting, we add subscripts to these symbols to indicate the specific variable for which the mean is being given. For instance, $\mu_1$ represents the population mean for variable $X_1$ and $\bar{x}_{1}$ denotes a sample mean based on observed data for variable $X_{1}$.

The population mean is the measure of central tendency for the population. Here, the population mean for variable $j$ is

$\mu_j = E(X_{j})$

The notation $E$ stands for statistical expectation; here $E(X_{j})$ is the mean of $X_{j}$ over all members of the population, or equivalently, over all random draws from a stochastic model. For example, $\mu_j = E(X_{j})$ may be the mean of a normal variable.

The population mean $\mu_j$ for variable $j$ can be estimated by the sample mean

$\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n}X_{ij}$

Note! Because the sample mean $\bar{x}_{j}$ is a function of our random data, it is itself a random variable and has a mean of its own. In fact, the mean of the sample mean is equal to the population mean $\mu_j$; i.e., $E(\bar{x}_j) = \mu_j$.
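One way to see why this holds is through the linearity of expectation. The short derivation below is a sketch that assumes every observation $X_{ij}$ is drawn from the same population, so that $E(X_{ij}) = \mu_j$ for each $i$:

```latex
\begin{aligned}
E(\bar{x}_j)
  &= E\left(\frac{1}{n}\sum_{i=1}^{n}X_{ij}\right) \\
  &= \frac{1}{n}\sum_{i=1}^{n}E(X_{ij}) \\
  &= \frac{1}{n}\cdot n\,\mu_j \\
  &= \mu_j
\end{aligned}
```

Note that this argument does not require the observations to be independent, only that they share the same mean.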

Therefore, $\bar{x}_{j}$ is an unbiased estimator of $\mu_j$.

Another way of saying this is that the mean of the $\bar{x}_{j}$’s over all possible samples of size $n$ is equal to $\mu_j$.
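A small simulation can make this concrete. The sketch below is only an illustration: the normal population, the values of `mu_j`, `sigma`, and `n`, and the number of repeated samples are all assumptions chosen for the example, and a finite number of simulated samples only approximates the average over all possible samples.

```python
import numpy as np

# Reproducible random number generator (the seed is arbitrary).
rng = np.random.default_rng(seed=0)

# Assumed population for illustration: normal with mean mu_j and sd sigma.
mu_j = 5.0
sigma = 2.0

n = 10               # size of each sample
n_samples = 100_000  # number of repeated samples to draw

# Each row is one sample of size n drawn from the population.
samples = rng.normal(loc=mu_j, scale=sigma, size=(n_samples, n))

# One sample mean per row.
sample_means = samples.mean(axis=1)

# The average of the sample means should be close to mu_j.
mean_of_sample_means = sample_means.mean()
print(f"population mean:       {mu_j}")
print(f"mean of sample means:  {mean_of_sample_means:.4f}")
```

Individual sample means vary from sample to sample, but their average settles near $\mu_j$ as the number of repeated samples grows.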

Recall that the population mean vector $\boldsymbol{\mu}$ collects the population means of the $p$ different variables:

$\boldsymbol{\mu} = \left(\begin{array}{c} \mu_1 \\ \mu_2\\ \vdots\\ \mu_p \end{array}\right)$

We can estimate this population mean vector, $\boldsymbol{\mu}$, by the sample mean vector $\mathbf{\bar{x}}$, which is obtained by collecting the sample means of the individual variables into a single vector:

$\mathbf{\bar{x}} = \left(\begin{array}{c}\bar{x}_1\\ \bar{x}_2\\ \vdots \\ \bar{x}_p\end{array}\right) = \left(\begin{array}{c}\frac{1}{n}\sum_{i=1}^{n}X_{i1}\\ \frac{1}{n}\sum_{i=1}^{n}X_{i2}\\ \vdots \\ \frac{1}{n}\sum_{i=1}^{n}X_{ip}\end{array}\right) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{X}_i$
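As a rough illustration of this formula, the following sketch computes the sample mean vector for a small data matrix in three equivalent ways. The data values and variable names are hypothetical; the only convention assumed is that rows are observations $i = 1, \dots, n$ and columns are variables $j = 1, \dots, p$.

```python
import numpy as np

# Hypothetical data matrix: n = 4 observations (rows), p = 3 variables (columns).
X = np.array([
    [2.0, 10.0, 1.0],
    [4.0, 14.0, 3.0],
    [6.0, 12.0, 5.0],
    [8.0, 16.0, 7.0],
])

n, p = X.shape

# Componentwise form: average each variable (column) separately.
xbar_by_column = np.array([X[:, j].sum() / n for j in range(p)])

# Vector form: add up the n observation vectors X_i and divide by n.
xbar_vector = X.sum(axis=0) / n

# NumPy shortcut: the mean taken down the rows (axis=0).
xbar = X.mean(axis=0)

print(xbar)
```

All three computations give the same vector of length $p$, one sample mean per variable.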

Just as the sample means $\bar{x}_j$ of the individual variables are unbiased for their respective population means $\mu_j$, the sample mean vector is unbiased for the population mean vector:

$E(\mathbf{\bar{x}}) = E\left(\begin{array}{c}\bar{x}_1\\\bar{x}_2\\ \vdots \\\bar{x}_p\end{array}\right) = \left(\begin{array}{c}E(\bar{x}_1)\\E(\bar{x}_2)\\ \vdots \\E(\bar{x}_p)\end{array}\right)=\left(\begin{array}{c}\mu_1\\\mu_2\\\vdots\\\mu_p\end{array}\right)=\boldsymbol{\mu}$
