11.2.1 - Bootstrapping Methods

Point estimates are useful for estimating unknown parameters, but to make inferences about an unknown parameter we need interval estimates. Confidence intervals are built from information about the sampling distribution of the estimator, including its standard error.

What if the underlying distribution is unknown? What if we are interested in a population parameter that is not the mean, such as the median? How can we construct a confidence interval for the population median?

If we have sample data, we can use bootstrapping to build an approximate sampling distribution of the statistic (the bootstrap distribution) and then use it to construct a confidence interval.

Bootstrapping has been studied extensively, for many different population parameters and in many different settings. There are parametric bootstraps, nonparametric bootstraps, weighted bootstraps, and more. Here we introduce only the basics of the bootstrap method; covering all of these variants would take an entire course.

Bootstrapping: Bootstrapping is a resampling procedure that uses the data from one sample to generate an approximate sampling distribution by repeatedly drawing random samples, of the same size, from the observed sample with replacement.
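To see what a single bootstrap sample looks like, here is a minimal R sketch. The vector `x` is hypothetical data simulated only for illustration; `sample()` with `replace = TRUE` draws a resample of the same size as `x`.

```r
set.seed(1)                         # for reproducibility
x <- round(rnorm(10, mean = 5), 2)  # hypothetical observed sample, n = 10

# One bootstrap resample: n draws from x, with replacement
x.star <- sample(x, size = length(x), replace = TRUE)
x.star
```

Typically some observations of `x` appear more than once in `x.star` and others do not appear at all.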

Let’s show how to create a bootstrap sample for the median. Let the sample median be denoted as \(M\).

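Steps to approximate the sampling distribution of the median by bootstrapping:

1. Replace the population with the sample.
2. Draw \(B\) samples from the data with replacement, each of the same size \(n\) as the original sample. \(B\) should be large, say 1000.
3. Compute the median of each bootstrap sample, \(M_i\), for \(i = 1, \ldots, B\).
4. Use \(M_1, \ldots, M_B\) as the approximate distribution of the sample median.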
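The four steps above can be sketched compactly in R with `replicate()`. This is a minimal illustration, assuming a hypothetical skewed sample `x` simulated with `rexp()`; the video code later in this section does the same thing with an explicit `for` loop.

```r
set.seed(2)
x <- rexp(30, rate = 1)  # hypothetical skewed sample, n = 30
B <- 1000                # number of bootstrap samples

# Steps 1-3: treat x as the population, resample it B times with
# replacement, and record the median of each bootstrap sample
M <- replicate(B, median(sample(x, size = length(x), replace = TRUE)))

# Step 4: the B values in M approximate the sampling distribution of the median
summary(M)
```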

If we have the approximate distribution, we can find an estimate of the standard error of the sample median by finding the standard deviation of \(M_1,...,M_B\).
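In R this standard error estimate is a single call to `sd()` on the bootstrap medians (note that `sd()` divides by \(B - 1\)). A minimal sketch, again on hypothetical simulated data:

```r
set.seed(3)
x <- rexp(30, rate = 1)  # hypothetical sample
B <- 1000
M <- replicate(B, median(sample(x, replace = TRUE)))  # size defaults to length(x)

# Bootstrap estimate of the standard error of the sample median:
# the standard deviation of M_1, ..., M_B
se.boot <- sd(M)
se.boot
```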

Sampling with replacement is important. If we drew \(n\) values without replacement, every resample would simply be a rearrangement of the original data, so we would always get the same median as the observed sample median. A sample drawn from the data with replacement is called a bootstrap sample.
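A quick R check makes this concrete. In this sketch (hypothetical data `x` of size \(n = 25\)), every without-replacement resample of size \(n\) returns exactly `median(x)`, while the with-replacement resamples vary:

```r
set.seed(4)
x <- rnorm(25)  # hypothetical sample

# Without replacement, each "resample" of size n is a permutation of x
no.repl <- replicate(5, median(sample(x, replace = FALSE)))
# With replacement, the resampled medians vary
with.repl <- replicate(5, median(sample(x, replace = TRUE)))

no.repl    # all equal to median(x)
with.repl  # generally differ from one another
```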

Once we have the bootstrap distribution \(M_1, \ldots, M_B\), we can create a confidence interval. For a 90% confidence interval, for example, we would find the 5th and 95th percentiles of the \(B\) bootstrap medians. This is known as the percentile method.
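A minimal R sketch of this percentile interval, assuming hypothetical simulated data. It uses `quantile()`, whose default (type 7) interpolates between order statistics, so for small \(B\) it can differ slightly from picking sorted values by position as the video code does:

```r
set.seed(5)
x <- rexp(30, rate = 1)  # hypothetical sample
B <- 1000
M <- replicate(B, median(sample(x, replace = TRUE)))

# 90% percentile interval: 5th and 95th percentiles of the bootstrap medians
ci90 <- quantile(M, probs = c(0.05, 0.95))
ci90
```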

You can use bootstrap samples to approximate the sampling distribution of any statistic, not just the median. The steps are the same, except that you compute the statistic of interest instead of the median.
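To illustrate that only the statistic changes, here is a sketch of a general-purpose resampler. `bootstrap.stat` is a hypothetical helper written for this example (not a built-in R function); it takes the data and any function returning a single number, and here it is applied to the interquartile range and a 10% trimmed mean of hypothetical data:

```r
# Hypothetical helper: B bootstrap replicates of an arbitrary statistic `stat`
bootstrap.stat <- function(data, stat, B = 1000) {
  n <- length(data)
  # Resample indices with replacement, then apply the statistic
  replicate(B, stat(data[sample.int(n, n, replace = TRUE)]))
}

set.seed(6)
x <- rexp(30, rate = 1)  # hypothetical sample

iqr.boot  <- bootstrap.stat(x, IQR)
trim.boot <- bootstrap.stat(x, function(v) mean(v, trim = 0.1))

sd(iqr.boot)                        # bootstrap SE of the IQR
quantile(trim.boot, c(0.05, 0.95))  # 90% percentile interval, trimmed mean
```

Resampling indices with `sample.int()` rather than calling `sample()` on the data avoids a known quirk of `sample()` when the data vector has length 1.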

Video: Bootstrapping

Sample R Code from the Bootstrapping Video

sampling.distribution <- function(n = 100, B = 1000, mean = 5, sd = 1, confidence = 0.95) {
  # Simulate the true sampling distribution of the median: draw B new
  # samples of size n from a Normal(mean, sd) population (no bootstrap here)
  medians <- numeric(B)
  for (i in 1:B) {
    medians[i] <- median(rnorm(n, mean = mean, sd = sd))
  }
  med.sim <- median(medians)
  # Positions of the lower and upper percentiles in the sorted medians
  c.l <- max(1, round((1 - confidence) / 2 * B))
  c.u <- round(B - (1 - confidence) / 2 * B)
  sorted <- sort(medians)
  l <- sorted[c.l]
  u <- sorted[c.u]
  cat(c.l / B * 100, "-percentile:      ", l, "\n")
  cat("Median: ", med.sim, "\n")
  cat(c.u / B * 100, "-percentile:      ", u, "\n")
  return(medians)
}

bootstrap.median <- function(data, B = 1000, confidence = 0.95) {
  n <- length(data)
  medians <- numeric(B)
  for (i in 1:B) {
    # Bootstrap sample: n draws from the data, with replacement
    medians[i] <- median(sample(data, size = n, replace = TRUE))
  }
  med.obs <- median(data)  # observed sample median
  # Percentile interval from the sorted bootstrap medians
  c.l <- max(1, round((1 - confidence) / 2 * B))
  c.u <- round(B - (1 - confidence) / 2 * B)
  sorted <- sort(medians)
  l <- sorted[c.l]
  u <- sorted[c.u]
  cat(c.l / B * 100, "-percentile:      ", l, "\n")
  cat("Median: ", med.obs, "\n")
  cat(c.u / B * 100, "-percentile:      ", u, "\n")
  return(medians)
}