2.3.2 - Moments

Many of the elementary properties of the multinomial can be derived by decomposing \(X\) as the sum of iid random vectors,

\(X=Y_1+\cdots+Y_n\)

where each \(Y_i \sim Mult\left(1, \pi\right)\). In this decomposition, \(Y_i\) represents the outcome of the \(i\)th trial; it's a vector with a 1 in position \(j\) if \(E_j)\) occurred on that trial and 0s in all other positions. The elements of \(Y_i\) are correlated Bernoulli random variables. For example, with \(k=2\) possible outcomes on each trial, then \(Y_i=(\# E_1,\# E_2)\) on the \(i\)th trial, and the possible values of \(Y_i\) are

(1, 0) with probability \(\pi_1\),

(0, 1) with probability \(\pi_2 = 1− \pi_1\).

Because the individual elements of \(Y_i\) are Bernoulli, the mean of \(Y_i\) is \(\pi = \left(\pi_1, \pi_2\right)\), and its covariance matrix is

\begin{bmatrix} \pi_1(1-\pi_1) & -\pi_1\pi_2 \\ -\pi_1\pi_2 & \pi_2(1-\pi_2) \end{bmatrix}

Establishing the covariance term (off-diagonal element) requires a bit more work, but note that intuitively it should be negative because exactly one of either \(E_1\) or \(E_2\) must occur.

More generally, with \(k\) possible outcomes, the mean of \(Y_i\) is \(\pi = \left(\pi_1, \dots , \pi_k\right)\), and the covariance matrix is

\begin{bmatrix} \pi_1(1-\pi_1) & -\pi_1\pi_2 & \cdots & -\pi_1\pi_k \\ -\pi_1\pi_2 & \pi_2(1-\pi_2) & \cdots & -\pi_2\pi_k \\ \vdots & \vdots & \ddots & \vdots \\ -\pi_1\pi_k & -\pi_2\pi_k & \cdots & \pi_k(1-\pi_k) \end{bmatrix}

And finally returning to \(X=Y_1+\cdots+Y_n\) in full generality, we have that

\(E(X)=n\pi=(n\pi_1,\ldots,n\pi_k)\)

with covariance matrix

\begin{bmatrix} n\pi_1(1-\pi_1) & -n\pi_1\pi_2 & \cdots & -n\pi_1\pi_k \\ -n\pi_1\pi_2 & n\pi_2(1-\pi_2) & \cdots & -n\pi_2\pi_k \\ \vdots & \vdots & \ddots & \vdots \\ -n\pi_1\pi_k & -n\pi_2\pi_k & \cdots & n\pi_k(1-\pi_k) \end{bmatrix}

Because the elements of \(X\) are constrained to sum to \(n\), this covariance matrix is singular. If all the \(\pi_j\)s are positive, then the covariance matrix has rank \(k-1\). Intuitively, this makes sense since the last element \(X_k\) can be replaced by \(n − X_1− \dots − X_{k−1}\); there are really only \(k-1\) "free" elements in \(X\). If some elements of \(\pi\) are zero, the rank drops by one for every zero element.