Factorization Theorem


While the definition of sufficiency provided on the previous page may make sense intuitively, it is not always all that easy to find the conditional distribution of X1, X2, ..., Xn given Y. Not to mention that we'd have to find the conditional distribution of X1, X2, ..., Xn given Y for every Y that we'd want to consider as a possible sufficient statistic! Therefore, using the formal definition of sufficiency as a way of identifying a sufficient statistic for a parameter θ can often be a daunting road to follow. Thankfully, a theorem often referred to as the Factorization Theorem provides an easier alternative! We state it here without proof.

Factorization Theorem. Let X1, X2, ..., Xn denote random variables with joint probability density function or joint probability mass function f(x1, x2, ..., xn; θ), which depends on the parameter θ. Then, the statistic \(Y = u(X_1, X_2, ... , X_n)  \) is sufficient for θ if and only if the p.d.f. (or p.m.f.) can be factored into two components, that is:

\[f(x_1, x_2, ... , x_n;\theta) = \phi [ u(x_1, x_2, ... , x_n);\theta ] h(x_1, x_2, ... , x_n)  \]

where:

  • φ is a function that depends on the data x1, x2, ..., xn only through the function u(x1, x2, ..., xn), and
  • the function h(x1, x2, ..., xn) does not depend on the parameter θ
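
Before we put the theorem to work, here is a minimal numerical sketch of what the factorization buys us (it uses a Bernoulli(p) sample, which is not one of the examples below, and the sample values are made up purely for illustration). If the joint p.m.f. factors as \(\phi[u(x);\theta]h(x)\), then the ratio \(f(x;\theta_1)/f(x;\theta_2) = \phi[u(x);\theta_1]/\phi[u(x);\theta_2]\) depends on the data only through \(u(x)\), so two samples with the same value of \(u(x)\) must produce the same ratio:

```python
# Illustrative sketch: for a Bernoulli(p) sample, the joint p.m.f. is
# p^(sum x) * (1 - p)^(n - sum x), so it factors with phi depending on the
# data only through u(x) = sum x and h(x) = 1.  As a consequence, the ratio
# f(x; p1) / f(x; p2) is identical for any two samples with the same sum.

def bernoulli_joint_pmf(xs, p):
    """Joint p.m.f. of an i.i.d. Bernoulli(p) sample xs of 0s and 1s."""
    s, n = sum(xs), len(xs)
    return p**s * (1 - p)**(n - s)

sample_a = [1, 0, 1, 1, 0]   # sum = 3
sample_b = [0, 1, 1, 0, 1]   # a different sample with the same sum = 3

p1, p2 = 0.3, 0.7
ratio_a = bernoulli_joint_pmf(sample_a, p1) / bernoulli_joint_pmf(sample_a, p2)
ratio_b = bernoulli_joint_pmf(sample_b, p1) / bernoulli_joint_pmf(sample_b, p2)

print(ratio_a, ratio_b)      # the two ratios agree because sum(x) is sufficient
```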

Let's put the theorem to work on a few examples!

Example

Let X1, X2, ..., Xn denote a random sample from a Poisson distribution with parameter λ > 0. Find a sufficient statistic for the parameter λ.

Solution. Because X1, X2, ..., Xn is a random sample, the joint probability mass function of X1, X2, ..., Xn is, by independence:

\[f(x_1, x_2, ... , x_n;\lambda) = f(x_1;\lambda) \times f(x_2;\lambda) \times ... \times f(x_n;\lambda)\]

Inserting what we know to be the probability mass function of a Poisson random variable with parameter λ, the joint p.m.f. is therefore:

\[f(x_1, x_2, ... , x_n;\lambda) = \frac{e^{-\lambda}\lambda^{x_1}}{x_1!} \times\frac{e^{-\lambda}\lambda^{x_2}}{x_2!} \times ... \times \frac{e^{-\lambda}\lambda^{x_n}}{x_n!}\]

Now, simplifying by adding up all n of the λs in the exponents, as well as all n of the xi's in the exponents, we get:

\[f(x_1, x_2, ... , x_n;\lambda) = \left(e^{-n\lambda}\lambda^{\Sigma x_i}  \right) \times \left( \frac{1}{x_1! x_2! ... x_n!}  \right)\]

Hey, look at that! We just factored the joint p.m.f. into two functions, one (φ) being only a function of the statistic  \(Y=\sum_{i=1}^{n}X_i\) and the other (h) not depending on the parameter λ:

\[f(x_1, x_2, ... , x_n;\lambda) = \underbrace{\left(e^{-n\lambda}\lambda^{\Sigma x_i}  \right)}_{\phi[u(x_1, x_2, ... , x_n);\lambda]} \times \underbrace{\left( \frac{1}{x_1! x_2! ... x_n!}  \right)}_{h(x_1, x_2, ... , x_n)}\]

Therefore, the Factorization Theorem tells us that \(Y=\sum_{i=1}^{n}X_i\) is a sufficient statistic for λ. But, wait a second! We can also write the joint p.m.f. as:

\[f(x_1, x_2, ... , x_n;\lambda) = \left(e^{-n\lambda}\lambda^{n\bar{x}}  \right) \times \left( \frac{1}{x_1! x_2! ... x_n!}  \right)\]

Therefore, the Factorization Theorem tells us that \(Y = \bar{X}\) is also a sufficient statistic for λ.

If you think about it, it makes sense that \(Y = \bar{X}\) and \(Y=\sum_{i=1}^{n}X_i\) are both sufficient statistics, because if we know \(Y = \bar{X}\), we can easily find \(Y=\sum_{i=1}^{n}X_i\). And, if we know \(Y=\sum_{i=1}^{n}X_i\), we can easily find \(Y = \bar{X}\).
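
If you'd like to see that factorization numerically, here is a minimal sketch in Python (the sample values, the λ grid, and the function names are all made up for illustration). It computes the joint p.m.f. twice, once directly as a product of Poisson probabilities and once as the product \(\phi[\Sigma x_i;\lambda] \times h(x_1, ... , x_n)\), and the two agree for every λ:

```python
import math

def poisson_joint_pmf(xs, lam):
    """Joint p.m.f. computed directly: the product of e^{-lam} lam^{x_i} / x_i!."""
    return math.prod(math.exp(-lam) * lam**x / math.factorial(x) for x in xs)

def phi(y, lam, n):
    """phi[u(x); lambda] = e^{-n lambda} lambda^y, a function of y = sum x_i only."""
    return math.exp(-n * lam) * lam**y

def h(xs):
    """h(x_1, ..., x_n) = 1 / (x_1! x_2! ... x_n!), which is free of lambda."""
    return 1.0 / math.prod(math.factorial(x) for x in xs)

xs = [2, 0, 3, 1, 4]                       # a hypothetical observed sample
for lam in (0.5, 1.0, 2.5):
    direct = poisson_joint_pmf(xs, lam)
    factored = phi(sum(xs), lam, len(xs)) * h(xs)
    assert math.isclose(direct, factored)  # the same value for every lambda
```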

The previous example suggests that there can be more than one sufficient statistic for a parameter θ. In general, if Y is a sufficient statistic for a parameter θ, then every one-to-one function of Y not involving θ is also a sufficient statistic for θ. Let's take a look at another example.

Example

Let X1, X2, ..., Xn be a random sample from a normal distribution with mean μ and variance 1. Find a sufficient statistic for the parameter μ.

Solution. Because X1, X2, ..., Xn is a random sample, the joint probability density function of X1, X2, ..., Xn is, by independence:

\[f(x_1, x_2, ... , x_n;\mu) = f(x_1;\mu) \times f(x_2;\mu) \times ... \times f(x_n;\mu)\]

Inserting what we know to be the probability density function of a normal random variable with mean μ and variance 1, the joint p.d.f. is:

\[f(x_1, x_2, ... , x_n;\mu) = \frac{1}{(2\pi)^{1/2}} exp \left[ -\frac{1}{2}(x_1 - \mu)^2 \right] \times \frac{1}{(2\pi)^{1/2}} exp \left[ -\frac{1}{2}(x_2 - \mu)^2 \right] \times ... \times \frac{1}{(2\pi)^{1/2}} exp \left[ -\frac{1}{2}(x_n - \mu)^2 \right] \]

Collecting like terms, we get:

\[f(x_1, x_2, ... , x_n;\mu) = \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n}(x_i - \mu)^2 \right]\]

A trick that makes factoring the joint p.d.f. an easier task is to add 0, in the form of \(-\bar{x}+\bar{x}\), to the quantity in parentheses in the summation. That is:

\[f(x_1, x_2, ... , x_n;\mu) = \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n}\left[ (x_i - \bar{x}) + (\bar{x}-\mu)\right]^2 \right]\]

Now, squaring the quantity in parentheses, we get:

\[f(x_1, x_2, ... , x_n;\mu) = \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n}\left[ (x_i - \bar{x})^2 +2(x_i - \bar{x}) (\bar{x}-\mu)+ (\bar{x}-\mu)^2\right] \right]\]

And then distributing the summation, we get:

\[f(x_1, x_2, ... , x_n;\mu) = \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2 - (\bar{x}-\mu) \sum_{i=1}^{n}(x_i - \bar{x}) -\frac{1}{2}\sum_{i=1}^{n}(\bar{x}-\mu)^2\right] \]

But the middle term in the exponent is 0, because \(\sum_{i=1}^{n}(x_i-\bar{x})=0\), and the last term, because it doesn't depend on the index i, simply gets added n times:

\[f(x_1, x_2, ... , x_n;\mu) = \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2 -\frac{n}{2}(\bar{x}-\mu)^2\right] \]

So, simplifying, we get:

\[f(x_1, x_2, ... , x_n;\mu) = \left\{ exp \left[ -\frac{n}{2} (\bar{x}-\mu)^2 \right] \right\} \times \left\{ \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2 \right] \right\} \]
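
Here is a minimal numerical sketch of that algebra (the sample values and the grid of μ values are made up for illustration). It checks the "add 0" identity \(\sum_{i=1}^{n}(x_i-\mu)^2 = \sum_{i=1}^{n}(x_i-\bar{x})^2 + n(\bar{x}-\mu)^2\) and, with it, that the joint p.d.f. really does equal the product of the two factors above:

```python
import math

def normal_joint_pdf(xs, mu):
    """Joint p.d.f. of an i.i.d. N(mu, 1) sample, computed directly."""
    n = len(xs)
    return (2 * math.pi) ** (-n / 2) * math.exp(-0.5 * sum((x - mu) ** 2 for x in xs))

xs = [4.2, 5.1, 3.8, 4.9, 5.4]             # a hypothetical observed sample
n = len(xs)
xbar = sum(xs) / n

for mu in (3.0, 4.5, 6.0):
    # the "add 0" identity: sum (x_i - mu)^2 = sum (x_i - xbar)^2 + n (xbar - mu)^2
    lhs = sum((x - mu) ** 2 for x in xs)
    rhs = sum((x - xbar) ** 2 for x in xs) + n * (xbar - mu) ** 2
    assert math.isclose(lhs, rhs)

    # the factorization: phi depends on the data only through xbar; h is free of mu
    phi = math.exp(-0.5 * n * (xbar - mu) ** 2)
    h = (2 * math.pi) ** (-n / 2) * math.exp(-0.5 * sum((x - xbar) ** 2 for x in xs))
    assert math.isclose(normal_joint_pdf(xs, mu), phi * h)
```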

In summary, we have factored the joint p.d.f. into two functions, one (φ) being only a function of the statistic \(Y = \bar{X}\) and the other (h) not depending on the parameter μ:

\[f(x_1, x_2, ... , x_n;\mu) = \underbrace{\left\{ exp \left[ -\frac{n}{2} (\bar{x}-\mu)^2 \right] \right\}}_{\phi[u(x_1, x_2, ... , x_n);\mu]} \times \underbrace{\left\{ \frac{1}{(2\pi)^{n/2}} exp \left[ -\frac{1}{2}\sum_{i=1}^{n} (x_i - \bar{x})^2 \right] \right\}}_{h(x_1, x_2, ... , x_n)}\]

Therefore, the Factorization Theorem tells us that \(Y = \bar{X}\) is a sufficient statistic for μ. Now, \(Y = \bar{X}^3\) is also sufficient for μ, because if we are given the value of \( \bar{X}^3\), we can easily get the value of \(\bar{X}\) through the one-to-one function \(w=y^{1/3}\). That is:

\[ W=(\bar{X}^3)^{1/3}=\bar{X} \]

On the other hand, \(Y = \bar{X}^2\) is not a sufficient statistic for μ, because it is not a one-to-one function of \(\bar{X}\). That is, if we are given the value of \(\bar{X}^2\), using the inverse function:

\[w=y^{1/2}\]

we get two possible values, namely:

\(-\bar{X}\)  and \(+\bar{X}\)
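
Here is a minimal sketch of why the one-to-one requirement matters (the two samples are made up for illustration). The samples below have sample means of opposite sign, so they lead to different likelihoods for μ, yet they share the same value of \(\bar{X}^2\); knowing \(\bar{X}^2\) alone therefore cannot reproduce the likelihood, whereas the cube root of \(\bar{X}^3\) recovers \(\bar{X}\), sign and all:

```python
import math

def normal_joint_pdf(xs, mu):
    """Joint p.d.f. of an i.i.d. N(mu, 1) sample."""
    n = len(xs)
    return (2 * math.pi) ** (-n / 2) * math.exp(-0.5 * sum((x - mu) ** 2 for x in xs))

sample_a = [1.0, 2.0, 3.0]                 # sample mean = +2
sample_b = [-1.0, -2.0, -3.0]              # sample mean = -2

mean_a = sum(sample_a) / len(sample_a)
mean_b = sum(sample_b) / len(sample_b)

print(mean_a ** 2 == mean_b ** 2)          # True: xbar^2 cannot tell the samples apart

cube = mean_a ** 3                          # y = xbar^3 is one-to-one in xbar
recovered = math.copysign(abs(cube) ** (1 / 3), cube)
print(recovered)                            # ~2.0: xbar is recovered, up to round-off

mu = 1.5
print(normal_joint_pdf(sample_a, mu))       # the two likelihoods at mu = 1.5 differ,
print(normal_joint_pdf(sample_b, mu))       # so xbar^2 cannot be sufficient
```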

We're getting so good at this; let's take a look at one more example!

Example

Let X1, X2, ..., Xn be a random sample from an exponential distribution with parameter θ. Find a sufficient statistic for the parameter θ.

Solution. Because X1, X2, ..., Xn is a random sample, the joint probability density function of X1, X2, ..., Xn is, by independence:

\[f(x_1, x_2, ... , x_n;\theta) = f(x_1;\theta) \times f(x_2;\theta) \times ... \times f(x_n;\theta)\]

Inserting what we know to be the probability density function of an exponential random variable with parameter θ, the joint p.d.f. is:

\[f(x_1, x_2, ... , x_n;\theta) =\frac{1}{\theta}exp\left( \frac{-x_1}{\theta}\right) \times \frac{1}{\theta}exp\left( \frac{-x_2}{\theta}\right) \times ... \times \frac{1}{\theta}exp\left( \frac{-x_n}{\theta} \right) \]

Now, simplifying by collecting the n factors of \(\frac{1}{\theta}\) and adding up the n xi's in the exponent, we get:

\[f(x_1, x_2, ... , x_n;\theta) =\frac{1}{\theta^n}exp\left( - \frac{1}{\theta} \sum_{i=1}^{n} x_i\right) \]

We have again factored the joint p.d.f. into two functions, one (φ) being only a function of the statistic \(Y=\sum_{i=1}^{n}X_i\) and the other (h) not depending on the parameter θ:

\[f(x_1, x_2, ... , x_n;\theta) = \underbrace{\left[\frac{1}{\theta^n}exp\left( - \frac{1}{\theta} \sum_{i=1}^{n} x_i\right)\right]}_{\phi[u(x_1, x_2, ... , x_n);\theta]} \times \underbrace{1}_{h(x_1, x_2, ... , x_n)}\]

Therefore, the Factorization Theorem tells us that \(Y=\sum_{i=1}^{n}X_i\) is a sufficient statistic for θ. And, since \(Y = \bar{X}\) is a one-to-one function of \(Y=\sum_{i=1}^{n}X_i\), it follows that \(Y = \bar{X}\) is also a sufficient statistic for θ.
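
To close the loop, here is a minimal numerical sketch for the exponential case (the sample values and the θ grid are made up for illustration). Because \(h(x_1, ... , x_n) = 1\) here, the joint p.d.f. is a function of the data only through \(\sum_{i=1}^{n} x_i\) (for a fixed sample size n), so two different samples with the same sum have identical likelihoods at every θ:

```python
import math

def exponential_joint_pdf(xs, theta):
    """Joint p.d.f. of an i.i.d. exponential(theta) sample: (1/theta^n) e^{-sum x / theta}."""
    n = len(xs)
    return theta ** (-n) * math.exp(-sum(xs) / theta)

sample_a = [0.5, 1.5, 2.0, 4.0]            # sum = 8.0
sample_b = [2.0, 2.0, 2.0, 2.0]            # a different sample with the same sum = 8.0

for theta in (1.0, 2.0, 5.0):
    fa = exponential_joint_pdf(sample_a, theta)
    fb = exponential_joint_pdf(sample_b, theta)
    assert math.isclose(fa, fb)            # identical for every theta: sum x_i is sufficient
```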