Following up on our brief introduction to this extremely useful distribution, we go into more detail here in preparation for the goodness-of-fit test coming up. Recall that the multinomial distribution generalizes the binomial to accommodate more than two categories. For example, what if the respondents in a survey had three choices:
- I feel optimistic.
- I don't feel optimistic.
- I'm not sure.
If we separately count the number of respondents answering each of these and collect them in a vector, we can use the multinomial distribution to model the behavior of this vector.
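A quick way to see such a count vector in action is to simulate one survey. The sketch below uses NumPy; the cell probabilities and sample size are illustrative values, not figures from the text.

```python
import numpy as np

rng = np.random.default_rng(seed=42)

# Hypothetical cell probabilities for the three responses
# ("optimistic", "not optimistic", "not sure"); illustrative values only
pi = [0.5, 0.3, 0.2]
n = 100  # number of respondents surveyed

# One draw from Mult(n, pi): a vector of counts, one per response category
counts = rng.multinomial(n, pi)
print(counts)        # e.g., [55 27 18]
print(counts.sum())  # the counts always sum to n
```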
Properties of the Multinomial Distribution
The multinomial distribution arises from an experiment with the following properties:
- a fixed number \(n\) of trials
- each trial is independent of the others
- each trial has \(k\) mutually exclusive and exhaustive possible outcomes, denoted by \(E_1, \dots, E_k\)
- on each trial, \(E_j\) occurs with probability \(\pi_j\), \(j = 1, \dots, k\).
If we let \(X_j\) count the number of trials for which outcome \(E_j\) occurs, then the random vector \(X = \left(X_1, \dots, X_k\right)\) is said to have a multinomial distribution with index \(n\) and parameter vector \(\pi = \left(\pi_1, \dots, \pi_k\right)\), which we denote as
\(X \sim Mult\left(n, \pi\right)\)
In most problems, \(n\) is known (e.g., it will represent the sample size). Note that we must have \(\pi_1 + \cdots + \pi_k = 1\) and \(X_1+\cdots+X_k=n\).
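As a quick illustration of these constraints and of evaluating probabilities under \(Mult\left(n, \pi\right)\), here is a minimal sketch using scipy.stats.multinomial; the probability vector and count vector are made-up values for demonstration.

```python
import numpy as np
from scipy.stats import multinomial

n = 100
pi = np.array([0.5, 0.3, 0.2])      # illustrative probabilities
assert np.isclose(pi.sum(), 1.0)    # pi_1 + ... + pi_k = 1

x = np.array([52, 29, 19])          # one possible observed count vector
assert x.sum() == n                 # X_1 + ... + X_k = n

# Probability of observing exactly this count vector under Mult(n, pi)
print(multinomial.pmf(x, n=n, p=pi))
```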
Marginal Counts

The individual, or marginal, components of a multinomial random vector have binomial distributions. That is, if we treat the \(j\)th category as "success" and all other categories collectively as "failure", then \(X_j \sim Bin\left(n, \pi_j\right)\), for \(j=1,\ldots,k\).
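A small simulation makes this marginal property concrete: if we repeatedly draw multinomial vectors and look only at the first component, its mean, variance, and probabilities should match those of \(Bin\left(n, \pi_1\right)\). The sketch below assumes NumPy and SciPy, with the same illustrative \(n\) and \(\pi\) as above.

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(seed=0)

n = 100
pi = [0.5, 0.3, 0.2]  # same illustrative values as above

# Simulate many multinomial count vectors and keep only the first component X_1
draws = rng.multinomial(n, pi, size=50_000)
x1 = draws[:, 0]

# Mean and variance should match Bin(n, pi_1): n*pi_1 and n*pi_1*(1 - pi_1)
print(x1.mean(), n * pi[0])               # both close to 50
print(x1.var(), n * pi[0] * (1 - pi[0]))  # both close to 25

# Empirical P(X_1 = 50) versus the binomial pmf at 50
print((x1 == 50).mean(), binom.pmf(50, n, pi[0]))
```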