11.2 - Introduction to Bootstrapping

In this section, we start by reviewing the concept of sampling distributions. Recall that we can find the sampling distribution of any summary statistic. We then introduce bootstrapping, a method that resamples from the observed sample to approximate the sampling distribution of a statistic.

Review of Sampling Distributions

Before looking at the bootstrapping method, we will need to recall the idea of sampling distributions. More specifically, let's look at the sampling distribution of the sample mean, \(\bar{x}\).

Suppose we are interested in estimating the population mean, \(\mu\). To do this, we draw a random sample of size \(n\) and calculate the sample mean, \(\bar{x}\). But how do we know how good an estimate \(\bar{x}\) is? To answer this question, we need the standard deviation of the estimate, that is, how much \(\bar{x}\) varies from sample to sample.

Recall that \(\bar{x}\) is calculated from a random sample and is, therefore, a random variable. Let's call the sample mean from above \(\bar{x}_1\). Now suppose we gather another random sample of size \(n\), calculate \(\bar{x}\) from that sample, and denote it \(\bar{x}_2\). We take another sample, and so on. With many of these samples, we can construct a histogram of the sample means, which approximates the sampling distribution of \(\bar{x}\).
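The repeated-sampling idea can be sketched as a short simulation. This is only an illustration under stated assumptions: the population (an exponential distribution with mean 2), the sample size, the number of samples, and the helper names `simulate_sample_means` and `text_histogram` are all hypothetical choices, not part of the lesson.

```python
import random
import statistics


def simulate_sample_means(draw_one, n, num_samples, seed=0):
    """Draw num_samples random samples of size n and return the mean of each."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    means = []
    for _ in range(num_samples):
        sample = [draw_one(rng) for _ in range(n)]
        means.append(statistics.fmean(sample))
    return means


def text_histogram(values, bins=10, width=40):
    """Return a crude text histogram with one line per bin."""
    lo, hi = min(values), max(values)
    step = (hi - lo) / bins
    counts = [0] * bins
    for v in values:
        idx = min(int((v - lo) / step), bins - 1)  # put the maximum in the last bin
        counts[idx] += 1
    peak = max(counts)
    lines = []
    for i, c in enumerate(counts):
        left_edge = lo + i * step
        lines.append(f"{left_edge:6.2f} | " + "#" * round(width * c / peak))
    return "\n".join(lines)


# Assumed population: exponential with mean 2 (right-skewed on purpose).
def draw_exponential(rng):
    return rng.expovariate(1 / 2.0)


sample_means = simulate_sample_means(draw_exponential, n=40, num_samples=5000)
histogram = text_histogram(sample_means)
print(histogram)
```

Each line of the printed histogram is one bin of sample means. Even though the assumed population is skewed, the bars should look roughly bell-shaped and be centered near the population mean.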

With theory and the central limit theorem, we have the following summary:

If the sample satisfies at least one of the following:

The distribution of the random variable, \(X\), is Normal
The sample size is large; rule of thumb is \(n>30\)

...then the sampling distribution of \(\bar{X}\) is approximately Normal with

Mean: \(\mu\)
Standard deviation: \(\frac{\sigma}{\sqrt{n}}\)
Standard error: \(\frac{s}{\sqrt{n}}\), the estimate of the standard deviation used when \(\sigma\) is unknown
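The summary above can be checked numerically. The sketch below assumes a population we control (exponential with mean 2, hence \(\sigma = 2\)), which is a hypothetical choice made only so that \(\frac{\sigma}{\sqrt{n}}\) is known. It compares that value with the standard deviation of many simulated sample means, and with the standard error computed from a single sample.

```python
import math
import random
import statistics

rng = random.Random(1)   # fixed seed so the run is repeatable
mu, sigma = 2.0, 2.0     # assumed population: exponential with mean 2 has sd 2
n, num_samples = 40, 5000


def draw_sample():
    """Return one random sample of size n from the assumed population."""
    return [rng.expovariate(1 / mu) for _ in range(n)]


# Many samples: the spread of their means approximates the sampling distribution.
means = [statistics.fmean(draw_sample()) for _ in range(num_samples)]
theoretical_sd = sigma / math.sqrt(n)
simulated_sd = statistics.stdev(means)

# One sample: the standard error replaces sigma with the sample sd, s.
one_sample = draw_sample()
standard_error = statistics.stdev(one_sample) / math.sqrt(n)

print(f"mean of sample means: {statistics.fmean(means):.3f} (mu = {mu})")
print(f"sd of sample means:   {simulated_sd:.3f} (sigma/sqrt(n) = {theoretical_sd:.3f})")
print(f"s/sqrt(n), 1 sample:  {standard_error:.3f}")
```

The first two printed values should sit close to their theoretical counterparts. The single-sample standard error will vary more from run to run, because it depends on \(s\) from only one sample.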

Using the above, we can construct confidence intervals and conduct hypothesis tests for the population mean, \(\mu\).
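As a minimal sketch of the confidence-interval step, the function below builds an interval of the form \(\bar{x} \pm z^* \frac{s}{\sqrt{n}}\). It assumes a large sample, so a Normal multiplier is used in place of a \(t\) multiplier. The function name `mean_confidence_interval` and the data are hypothetical and serve only as illustration.

```python
import math
import statistics


def mean_confidence_interval(sample, confidence=0.95):
    """Approximate confidence interval for the population mean.

    Assumes n is large enough for the Normal approximation to hold;
    for small samples a t multiplier would be more appropriate.
    """
    n = len(sample)
    x_bar = statistics.fmean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(n)
    # z* is the Normal quantile that leaves (1 - confidence)/2 in each tail.
    z_star = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_star * standard_error
    return x_bar - margin, x_bar + margin


# Hypothetical data: 35 made-up measurements cycling through 10, 11, ..., 16.
sample = [10 + (i % 7) for i in range(35)]
lower, upper = mean_confidence_interval(sample)
print(f"95% CI for the mean: ({lower:.2f}, {upper:.2f})")
```

The interval is centered at the sample mean, and its width shrinks as \(n\) grows because the standard error has \(\sqrt{n}\) in the denominator.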

What happens when we do not know the underlying distribution and cannot draw repeated samples from the population? How could we then estimate the sampling distribution of a statistic? This is the question we try to answer in the next section.