Statistical Inference and Estimation
Review of Introductory Inference
Key Concepts: sampling distribution and Central Limit Theorem; basic concepts of estimation

Statistical Inference, Model & Estimation
Recall that statistical inference aims at learning characteristics of the population from a sample; the population characteristics are parameters and the sample characteristics are statistics.
A statistical model is a representation of the complex phenomenon that generated the data.
 It has mathematical formulations that describe relationships between random variables and parameters.
 It makes assumptions about the random variables, and sometimes parameters.
 A general form: data = model + residuals
 The model should explain most of the variation in the data.
 Residuals represent the lack of fit, that is, the portion of the data left unexplained by the model.
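As a minimal sketch of the decomposition data = model + residuals, the snippet below uses the simplest possible model, the sample mean. The height values are hypothetical and chosen only for illustration; they do not come from the course data.

```python
import statistics

# Hypothetical heights in inches (illustrative values, not from the source).
data = [64.0, 66.5, 68.0, 70.5, 71.0, 67.0]

# Simplest possible model: describe every observation by the sample mean.
model = statistics.mean(data)

# Residuals are the part of each observation the model leaves unexplained.
residuals = [x - model for x in data]

# Adding the residuals back to the model recovers the data.
reconstructed = [model + r for r in residuals]

print("model (mean):", round(model, 3))
print("residuals:", [round(r, 3) for r in residuals])
```

With a mean-only model the residuals sum to zero (up to rounding), which is one way to check that the decomposition was done correctly.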
Estimation is the process of learning about and determining the population parameter based on the model fitted to the data.
Point estimation, interval estimation, and hypothesis testing are the three main ways of learning about the population parameter from the sample statistic.
An estimator is a particular example of a statistic; it becomes an estimate when the formula is evaluated with the actual observed sample values.
Point estimation = a single value, calculated from the sample, that estimates the parameter.
Confidence intervals = a range of values within which the parameter is expected to fall, with a certain degree of confidence.
Hypothesis tests = tests of whether the parameter equals a specific value (or values).
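The three inferential tasks can be sketched on one sample. This is an illustration under stated assumptions: the sample is simulated (54 hypothetical heights from a normal population with mean 68 and standard deviation 4), the null value `mu0 = 67` is made up, and the interval and test use the large-sample normal approximation rather than a t distribution.

```python
import math
import random
import statistics

random.seed(1)

# Hypothetical sample: 54 heights (inches) simulated from N(68, 4^2).
sample = [random.gauss(68, 4) for _ in range(54)]
n = len(sample)

# Point estimate: a single value for the population mean.
x_bar = statistics.mean(sample)

# Estimated standard error of the sample mean.
se = statistics.stdev(sample) / math.sqrt(n)

# Interval estimate: approximate 95% confidence interval (normal approximation).
ci = (x_bar - 1.96 * se, x_bar + 1.96 * se)

# Hypothesis test of H0: mu = mu0 (mu0 is a hypothetical null value).
mu0 = 67
z = (x_bar - mu0) / se
p_value = 2 * (1 - statistics.NormalDist().cdf(abs(z)))

print("point estimate:", round(x_bar, 2))
print("95% CI:", tuple(round(v, 2) for v in ci))
print("z =", round(z, 2), "p-value =", round(p_value, 4))
```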
To perform these inferential tasks, i.e., to make inference about the unknown population parameter from the sample statistic, we need to know the likely values of the sample statistic. In other words, what would happen if we repeated the sampling many times?
We need the sampling distribution of the statistic.
 It depends on the model assumptions about the population distribution, and/or on the sample size.
 Standard error refers to the standard deviation of a sampling distribution.
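One way to see what a sampling distribution is: simulate repeated sampling and look at how the statistic varies. The sketch below assumes a hypothetical normal population (mean 68, standard deviation 4) and compares the standard deviation of the simulated sample means with the theoretical standard error \(\sigma/\sqrt{n}\).

```python
import math
import random
import statistics

random.seed(0)

MU, SIGMA = 68.0, 4.0  # hypothetical population mean and standard deviation
N = 54                 # sample size
REPS = 5000            # number of repeated samples

# Draw many samples of size N and record the mean of each one.
sample_means = [
    statistics.mean([random.gauss(MU, SIGMA) for _ in range(N)])
    for _ in range(REPS)
]

# The standard deviation of the sample means estimates the standard error.
empirical_se = statistics.stdev(sample_means)
theoretical_se = SIGMA / math.sqrt(N)

print("empirical SE:  ", round(empirical_se, 3))
print("theoretical SE:", round(theoretical_se, 3))
```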
Height Example
We are interested in estimating the true average height of the student population at Penn State. We collect a simple random sample of 54 students. Here is a graphical summary of that sample.

Central Limit Theorem
Sampling distribution of the sample mean:
If numerous samples of size n are taken, the frequency curve of the sample means (\(\bar{X}\)'s) from those samples is approximately bell shaped with mean μ and standard deviation (i.e., standard error) \(\sigma/\sqrt{n}\); that is, approximately \(\bar{X} \sim N(\mu , \sigma^2 / n)\).
Holds if:
 X is normally distributed
 X is NOT normal, but n is large (e.g., n > 30) and μ and σ² are finite.
 For continuous variables
For categorical data, the CLT holds for the sampling distribution of the sample proportion.
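The non-normal case can be checked by simulation. The sketch below assumes a strongly skewed population, Exponential with rate 1 (mean 1, standard deviation 1), chosen only for illustration. If the normal approximation is reasonable, about 95% of the sample means should fall within two standard errors of μ.

```python
import math
import random
import statistics

random.seed(2)

N = 50        # sample size
REPS = 5000   # number of repeated samples
MU = 1.0      # mean of Exponential(rate=1)
SIGMA = 1.0   # standard deviation of Exponential(rate=1)

# Sample means from a skewed (non-normal) population.
sample_means = [
    statistics.mean([random.expovariate(1.0) for _ in range(N)])
    for _ in range(REPS)
]

# Standard error predicted by the CLT.
se = SIGMA / math.sqrt(N)

# Fraction of sample means within 2 standard errors of the true mean.
within_2se = sum(abs(m - MU) < 2 * se for m in sample_means) / REPS

print("fraction within 2 SE:", round(within_2se, 3))
```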
Proportions in Newspapers
As reported by CNN in June 2006:
The parameter of interest in the population is the proportion of U.S. adults who disapprove of how well Bush is handling Iraq, p.
The sample statistic, or point estimator, is \(\hat{p}\), and the estimate based on this sample is \(\hat{p}=0.62\).
Next question ...
If we take another poll, we are likely to get a different sample proportion, e.g., 60%, 59%, 67%, etc.
So, what is the 95% confidence interval? Based on the CLT, the 95% CI is \(\hat{p}\pm 2 \ast \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
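We often assume p = 1/2, the value that maximizes p(1 − p) and therefore gives the widest (most conservative) interval, so \(\hat{p}\pm 2 \ast \sqrt{\frac{\frac{1}{2}\ast\frac{1}{2} }{n}}=\hat{p}\pm\frac{1}{\sqrt{n}}=\hat{p}\pm\text{MOE}\).
The margin of error (MOE) is 2 × the standard error, which is at most \(1/\sqrt{n}\).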
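A short sketch of the two margin-of-error calculations follows. The estimate \(\hat{p}=0.62\) comes from the poll above; the sample size `n = 1000` is a hypothetical value used only for illustration, since the poll's actual sample size is not stated here.

```python
import math

p_hat = 0.62  # sample proportion from the poll
n = 1000      # HYPOTHETICAL sample size (not given in the text)

# Standard error of the sample proportion, using p_hat in place of p.
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Margin of error as 2 standard errors (approximate 95% level).
moe = 2 * se

# Conservative margin of error, assuming p = 1/2.
conservative_moe = 1 / math.sqrt(n)

# Approximate 95% confidence interval for p.
ci = (p_hat - moe, p_hat + moe)

print("MOE (using p_hat):    ", round(moe, 4))
print("MOE (conservative):   ", round(conservative_moe, 4))
print("approximate 95% CI:   ", tuple(round(v, 4) for v in ci))
```

Because p(1 − p) is largest at p = 1/2, the conservative margin is never smaller than the one computed from \(\hat{p}\).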