24.1 - Definition of Sufficiency

Sufficiency is the kind of topic for which it is probably best to jump right in and state the definition. Let's do that!

Sufficient

Let \(X_1, X_2, \ldots, X_n\) be a random sample from a probability distribution with unknown parameter \(\theta\). Then, the statistic:

\(Y = u(X_1, X_2, \ldots, X_n)\)

is said to be sufficient for \(\theta\) if the conditional distribution of \(X_1, X_2, \ldots, X_n\), given the statistic \(Y\), does not depend on the parameter \(\theta\).

Example 24-1

Let \(X_1, X_2, \ldots, X_n\) be a random sample of \(n\) Bernoulli trials in which:

  • \(X_i=1\) if the \(i^{th}\) subject likes Pepsi
  • \(X_i=0\) if the \(i^{th}\) subject does not like Pepsi

If \(p\) is the probability that subject \(i\) likes Pepsi, for \(i = 1, 2,\ldots,n\), then:

  • \(X_i=1\) with probability \(p\)
  • \(X_i=0\) with probability \(q = 1 − p\)

Suppose, in a random sample of \(n=40\) people, that \(Y = \sum_{i=1}^{n}X_i =22\) people like Pepsi. If we know the value of \(Y\), the number of successes in \(n\) trials, can we gain any further information about the parameter \(p\) by considering other functions of the data \(X_1, X_2, \ldots, X_n\)? That is, is \(Y\) sufficient for \(p\)?

Answer

The definition of sufficiency tells us that if the conditional distribution of \(X_1, X_2, \ldots, X_n\), given the statistic \(Y\), does not depend on \(p\), then \(Y\) is a sufficient statistic for \(p\). The conditional distribution of \(X_1, X_2, \ldots, X_n\), given \(Y\), is by definition:

\(P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \dfrac{P(X_1 = x_1, \ldots, X_n = x_n, Y = y)}{P(Y=y)}\) (**)

Now, for the sake of concreteness, suppose we were to observe a random sample of size \(n=3\) in which \(x_1=1, x_2=0, \text{ and }x_3=1\). In this case:

\( P(X_1 = 1, X_2 = 0, X_3 =1, Y=1)=0\)

because the sum of the data values, \( \sum_{i=1}^{n}X_i \), is 1 + 0 + 1 = 2, but \(Y\), which is defined to be the sum of the \(X_i\)'s, is 1. That is, because \(2 \ne 1\), the event in the numerator of the starred (**) equation is an impossible event, and therefore its probability is 0.

Now, let's consider an event that is possible, namely \((X_1 = 1, X_2 = 0, X_3 = 1, Y = 2)\). In that case, we have, by independence:

\(P(X_1 = 1, X_2 = 0, X_3 = 1, Y = 2) = p(1-p)p = p^2(1-p)\)
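
If it helps to see this numerically, here is a minimal Python sketch (not part of the original argument; the value \(p = 0.55\) is arbitrary) confirming that the probability of the particular sequence \((1, 0, 1)\) is \(p^2(1-p)\):

```python
# Probability of one particular Bernoulli sequence, by independence:
# p^(number of ones) * (1 - p)^(number of zeros).
def seq_prob(xs, p):
    return p ** sum(xs) * (1 - p) ** (len(xs) - sum(xs))

p = 0.55  # arbitrary illustrative value

print(seq_prob((1, 0, 1), p))  # 0.1361..., which is p^2 (1 - p)
print(p**2 * (1 - p))          # same value
```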

So, in general:

\(P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, Y = y) = 0 \text{ if } \sum_{i=1}^{n}x_i \ne y \)

and:

\(P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n, Y = y) = p^y(1-p)^{n-y} \text{ if } \sum_{i=1}^{n}x_i = y \)
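
This two-case formula translates directly into code. The following is a small sketch with a hypothetical helper, `joint_prob` (a name introduced here for illustration), that computes \(P(X_1 = x_1, \ldots, X_n = x_n, Y = y)\) under the Bernoulli assumptions above:

```python
def joint_prob(xs, y, p):
    """P(X_1 = x_1, ..., X_n = x_n, Y = y) for Bernoulli data.

    The event is impossible unless sum(xs) == y; otherwise, by
    independence, the probability is p^y * (1 - p)^(n - y).
    """
    n = len(xs)
    if sum(xs) != y:
        return 0.0
    return p ** y * (1 - p) ** (n - y)

print(joint_prob((1, 0, 1), 1, 0.55))  # 0.0, the impossible event above
print(joint_prob((1, 0, 1), 2, 0.55))  # 0.1361..., i.e. p^2 (1 - p)
```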

Now, the denominator in the starred (**) equation above is the binomial probability of getting exactly \(y\) successes in \(n\) trials with a probability of success \(p\). That is, the denominator is:

\( P(Y=y) = \binom{n}{y} p^y(1-p)^{n-y}\)

for \(y = 0, 1, 2, \ldots, n\). Putting the numerator and denominator together, the conditional probability is:

\(P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = \dfrac{p^y(1-p)^{n-y}}{\binom{n}{y} p^y(1-p)^{n-y}} = \dfrac{1}{\binom{n}{y}} \text{ if } \sum_{i=1}^{n}x_i = y\)

and:

\(P(X_1 = x_1, \ldots, X_n = x_n \mid Y = y) = 0 \text{ if } \sum_{i=1}^{n}x_i \ne y \)
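
To watch the cancellation happen numerically, here is a short sketch (the values \(n = 3\) and \(y = 2\) are illustrative) that evaluates the starred (**) ratio at several values of \(p\), and also brute-force checks the binomial denominator by summing joint probabilities over all sequences with sum \(y\). The ratio equals \(1/\binom{n}{y} = 1/3\) every time:

```python
from itertools import product
from math import comb

n, y = 3, 2

for p in (0.2, 0.5, 0.8):
    numerator = p ** y * (1 - p) ** (n - y)
    denominator = comb(n, y) * p ** y * (1 - p) ** (n - y)

    # Brute-force check of the denominator: sum the joint probabilities
    # over every 0/1 sequence whose components add up to y.
    brute = sum(
        p ** sum(xs) * (1 - p) ** (n - sum(xs))
        for xs in product([0, 1], repeat=n)
        if sum(xs) == y
    )
    assert abs(brute - denominator) < 1e-12

    print(p, numerator / denominator)  # always 0.333... = 1 / comb(n, y)
```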

Aha! We have just shown that the conditional distribution of \(X_1, X_2, \ldots, X_n\) given \(Y\) does not depend on \(p\). Therefore, \(Y\) is indeed sufficient for \(p\). That is, once the value of \(Y\) is known, no other function of \(X_1, X_2, \ldots, X_n\) will provide any additional information about the possible value of \(p\).
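
One way to make this concrete is by simulation. The sketch below is a Monte Carlo check, not part of the original argument (the sample count and seed are arbitrary): it draws many Bernoulli samples of size \(n = 3\), keeps only those with \(Y = 2\), and tabulates how often each sequence occurs. Whatever \(p\) is, each of the three sequences with sum 2 shows up with relative frequency near \(1/3\):

```python
import random
from collections import Counter

def conditional_freqs(p, n=3, y=2, reps=100_000, seed=0):
    """Relative frequency of each 0/1 sequence among samples with sum y."""
    rng = random.Random(seed)
    counts = Counter()
    for _ in range(reps):
        xs = tuple(1 if rng.random() < p else 0 for _ in range(n))
        if sum(xs) == y:
            counts[xs] += 1
    total = sum(counts.values())
    return {xs: round(count / total, 3) for xs, count in counts.items()}

# The conditional distribution looks the same (uniform over the three
# sequences with sum 2) whether p is 0.3 or 0.7; it does not depend on p.
print(conditional_freqs(p=0.3))
print(conditional_freqs(p=0.7))
```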