Key Topics:
 Basic approach
 Null and alternative hypothesis
 Decision making and the p-value
 Z-test & nonparametric alternative
Basic approach to hypothesis testing
 State a model describing the relationship between the explanatory variables and the outcome variable(s) in the population and the nature of the variability. State all of your assumptions.
 Specify the null and alternative hypotheses in terms of the parameters of the model.
 Invent a test statistic that will tend to be different under the null and alternative hypotheses.
 Using the assumptions of step 1, find the theoretical sampling distribution of the statistic under the null hypothesis of step 2. Ideally, the form of the sampling distribution should be one of the “standard distributions” (e.g., normal, t, binomial, …).
 Calculate a p-value as the area under the sampling distribution more extreme than your statistic. Which region counts as “more extreme” depends on the form of the alternative hypothesis.
 Choose your acceptable type I error rate (alpha) and apply the decision rule: reject the null hypothesis if the p-value is less than alpha; otherwise, do not reject.
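As a sketch, steps 3 through 6 of this recipe can be wired together for a one-sample z-test with a two-sided alternative (pure Python, standard normal CDF via math.erf; the function name and inputs are illustrative):

```python
import math

def one_sample_z_test(xbar, mu0, sigma, n, alpha=0.05):
    """Steps 3-6 of the recipe for H0: mu = mu0 vs. HA: mu != mu0."""
    # Step 3: a statistic that tends to differ under H0 and HA.
    z = (xbar - mu0) / (sigma / math.sqrt(n))
    # Step 4: under H0 (and the normality assumption), z ~ N(0, 1).
    normal_cdf = lambda t: 0.5 * (1 + math.erf(t / math.sqrt(2)))
    # Step 5: two-sided p-value -- area more extreme than |z| in both tails.
    p_value = 2 * (1 - normal_cdf(abs(z)))
    # Step 6: decision rule at level alpha.
    return z, p_value, p_value < alpha
```

Steps 1 and 2 (the model, its assumptions, and the hypotheses) live outside the code: they determine whether this particular statistic and reference distribution are appropriate at all.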
One-sample z-test

Making the Decision
It is either likely or unlikely that we would collect the evidence we did given the initial assumption. (Note: “likely” or “unlikely” is measured by calculating a probability!)
If it is likely, then we “do not reject” our initial assumption. There is not enough evidence to do otherwise.
If it is unlikely, then:
 either our initial assumption is correct and we experienced an unusual event or,
 our initial assumption is incorrect
In statistics, if it is unlikely, we decide to “reject” our initial assumption.
Example: Criminal Trial Analogy
First, state 2 hypotheses, the null hypothesis (“H_{0}”) and the alternative hypothesis (“H_{A}”)
 H_{0}: Defendant is not guilty.
 H_{A}: Defendant is guilty.
Usually H_{0} is a statement of “no effect”, “no change”, or “chance only” about a population parameter.
H_{A}, depending on the situation, states that there is a difference, trend, effect, or relationship with respect to a population parameter.
 It can be one-sided or two-sided.
 In a two-sided test we care only that there is a difference, not its direction. In a one-sided test we care about a particular direction of the relationship: we want to know whether the value is strictly larger or strictly smaller.
Then, collect evidence, such as fingerprints, blood spots, hair samples, carpet fibers, shoe prints, ransom notes, handwriting samples, etc. (In statistics, the data are the evidence.)
Next, you make your initial assumption.
 Defendant is innocent until proven guilty.
In statistics, we always assume the null hypothesis is true.
Then, make a decision based on the available evidence.
 If there is sufficient evidence (“beyond a reasonable doubt”), reject the null hypothesis. (Behave as if defendant is guilty.)
 If there is not enough evidence, do not reject the null hypothesis. (Behave as if defendant is not guilty.)
If the observed outcome, e.g., a sample statistic, is surprising under the assumption that the null hypothesis is true, but more probable if the alternative is true, then this outcome is evidence against H_{0} and in favor of H_{A}.
An observed effect so large that it would rarely occur by chance is called statistically significant (i.e., not likely to happen by chance).
Using the p-value to make the decision
The p-value is a probability, computed assuming the null hypothesis is true, that the test statistic would take a value as extreme as or more extreme than the one actually observed. It represents how likely we would be to observe such an extreme sample if the null hypothesis were true. Since it is a probability, it is a number between 0 and 1, and the closer it is to 0, the more “unlikely” the observed result. So if the p-value is “small” (typically, less than 0.05), we reject the null hypothesis.
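To make “more extreme” concrete: the same observed z-statistic gives different p-values depending on the alternative hypothesis. A small illustration with a hypothetical z = 1.8 (pure stdlib, standard normal CDF via math.erf):

```python
import math

def normal_cdf(t):
    """CDF of the standard normal distribution."""
    return 0.5 * (1 + math.erf(t / math.sqrt(2)))

z = 1.8  # hypothetical observed z-statistic
p_two_sided = 2 * (1 - normal_cdf(abs(z)))  # HA: mu != mu0 (both tails)
p_upper = 1 - normal_cdf(z)                 # HA: mu > mu0 (upper tail only)
p_lower = normal_cdf(z)                     # HA: mu < mu0 (lower tail only)
```

Here the two-sided p-value is twice the upper-tail one, so the same data can be significant at α = 0.05 against a one-sided alternative but not against a two-sided one.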
Significance level and p-value
The significance level, α, is the cutoff value against which the p-value is compared. In this context, “significant” does not mean “important”; it means “not likely to have happened just by chance”.
α is the maximum probability of rejecting the null hypothesis when the null hypothesis is true. If α = 1 we always reject the null; if α = 0 we never reject the null hypothesis. In articles, journals, etc., you may read: “The results were significant (p < 0.05).” So if p = 0.03, it is significant at the level α = 0.05 but not at the level α = 0.01. If we reject H_{0} at the level α = 0.05 (which corresponds to a 95% CI), we are saying that if H_{0} is true, the observed phenomenon would happen no more than 5% of the time (that is, 1 in 20). If we choose to compare the p-value to α = 0.01, we are insisting on stronger evidence!
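The quoted example (p = 0.03, significant at α = 0.05 but not at α = 0.01) is just a threshold comparison:

```python
# Apply the decision rule "reject if p < alpha" at several levels.
p = 0.03
decisions = {alpha: p < alpha for alpha in (0.10, 0.05, 0.01)}
```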
Very important point! Neither the decision to reject nor the decision not to reject H_{0} amounts to proving the null or the alternative hypothesis. We merely state that there is enough evidence to behave one way or the other. This is always true in statistics!
So, what kind of error could we make? No matter what decision we make, there is always a chance we made an error.
Errors in Criminal Trial:

                        Truth
Jury Decision     Not Guilty     Guilty
Not Guilty        OK             ERROR
Guilty            ERROR          OK
Errors in Hypothesis Testing
Type I error (False positive): The null hypothesis is rejected when it is true.
 α is the maximum probability of making a Type I error.
Type II error (False negative): The null hypothesis is not rejected when it is false.
 β is the probability of making a Type II error
There is always a chance of making one of these errors. But, a good scientific study will minimize the chance of doing so!
                                Truth
Decision                  Null Hypothesis    Alternative Hypothesis
Null Hypothesis           OK                 TYPE II ERROR
Alternative Hypothesis    TYPE I ERROR       OK
Power
The power of a statistical test is its probability of rejecting the null hypothesis if the null hypothesis is false. That is, power is the ability to correctly reject H_{0} and detect a significant effect. In other words, power is one minus the type II error risk.
\(\text{Power} = 1 - \beta = P\left(\text{reject } H_0 \mid H_0 \text{ is false}\right)\)
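For the two-sided z-test, power can be computed in closed form from the standard normal distribution. A sketch using the height example's setup (σ = 3, n = 54) with a hypothetical true mean of 66 inches:

```python
import math
from statistics import NormalDist

def power_two_sided_z(mu_true, mu0, sigma, n, alpha=0.05):
    """P(reject H0: mu = mu0 | true mean is mu_true) for a two-sided z-test."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)  # e.g. 1.96 for alpha = 0.05
    # Under the true mean, the z-statistic is N(shift, 1) rather than N(0, 1).
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # Reject when |Z| > z_crit; add the probabilities of the two tails.
    return (1 - nd.cdf(z_crit - shift)) + nd.cdf(-z_crit - shift)
```

Two sanity checks fall out of the formula: when mu_true = mu0 the power equals α (rejecting a true null is exactly the Type I error), and power grows with the sample size n.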
Which error is worse?
Type I = you are innocent, yet accused of cheating on the test.
Type II = you cheated on the test, but you are found innocent.
This depends on the context of the problem, too. But in most cases scientists try to be “conservative”: it is worse to make a spurious discovery than to fail to make a good one. Our goal is to increase the power of the test, that is, to minimize the length of the CI.
We need to keep in mind:
 the effect of the sample size,
 the correctness of the underlying assumptions about the population,
 statistical vs. practical significance, etc…
(see the handout). To study the tradeoffs between the sample size, α, and Type II error we can use power and operating characteristic curves.
Height Example: One-sample z-test
 Assume data are independently sampled from a normal distribution with unknown mean μ and known variance σ^{2} = 9.
 Make an initial assumption that μ = 65.
 Specify the hypothesis: H_{0}: μ = 65 vs. H_{A}: μ ≠ 65.
 z-statistic: 3.58. Under H_{0}, the z-statistic follows the N(0,1) distribution.
 The p-value (about 0.0003) indicates that, if the average height in the population is 65 inches, it is very unlikely that a sample of 54 students would have an average height of 66.4630.
 Alpha = 0.05. Decision: p-value < alpha, thus reject the null hypothesis. Conclude that the average height is not equal to 65 inches.
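The numbers in this example can be reproduced directly from the stated inputs (n = 54, sample mean 66.4630, μ₀ = 65, σ = √9 = 3):

```python
import math
from statistics import NormalDist

# Values from the height example.
n, xbar, mu0, sigma = 54, 66.4630, 65, 3

z = (xbar - mu0) / (sigma / math.sqrt(n))     # comes out near 3.58
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided alternative
reject = p_value < 0.05
```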
What type of error might we have made?
Type I error is claiming that the average student height is not 65 inches when it really is.
Type II error is failing to claim that the average student height is not 65 inches when in fact it is not.
We rejected the null hypothesis, i.e., claimed that the height is not 65, thus potentially making a Type I error. But sometimes the p-value is very small simply because of a large sample size, and we may have statistical significance but not practical significance! That is why most statisticians are much more comfortable with using CIs than tests.
Height Example: Graphical summary of the z-test
Based on the CI only, how do you know that you should reject the null hypothesis? The 95% CI is (65.6628, 67.2631); it does not contain the hypothesized value 65, so we reject H_{0}. What about practical and statistical significance now? Is there another reason to suspect this test, and the p-value calculations?
There is a need for a further generalization. What if we can't assume that σ is known? In this case we would use s (the sample standard deviation) to estimate σ.
If the sample is very large, we can treat σ as known by assuming that σ = s. According to the law of large numbers, this is not too bad a thing to do. But if the sample is small, the fact that we have to estimate both the standard deviation and the mean adds extra uncertainty to our inference. In practice this means that we need a larger multiplier for the standard error.
We need the one-sample t-test.
One-sample t-test
 Assume data are independently sampled from a normal distribution with unknown mean μ and unknown variance σ^{2}. Make an initial assumption, μ_{0}.
 Specify one of the following pairs of hypotheses:
H_{0}: μ = μ_{0} vs. H_{A}: μ ≠ μ_{0}
H_{0}: μ ≤ μ_{0} vs. H_{A}: μ > μ_{0}
H_{0}: μ ≥ μ_{0} vs. H_{A}: μ < μ_{0}
 t-statistic: \(\frac{\bar{X}-\mu_0}{s / \sqrt{n}}\), where s is the sample standard deviation.
 The t-statistic follows a t-distribution with df = n − 1.
 p-value:
 Alpha = 0.05, we conclude ….
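A minimal one-sample t-test sketch in pure Python, using hypothetical height data (the sample values are illustrative, and the critical value 2.262 is t_{0.975} with df = 9, taken from a standard t-table rather than computed):

```python
import math

# Hypothetical sample of 10 student heights (inches).
data = [64.2, 66.1, 65.8, 67.0, 63.9, 66.5, 65.2, 64.8, 66.9, 65.6]
mu0 = 65  # hypothesized mean

n = len(data)
xbar = sum(data) / n
# Sample standard deviation (divides by n - 1).
s = math.sqrt(sum((x - xbar) ** 2 for x in data) / (n - 1))

t_stat = (xbar - mu0) / (s / math.sqrt(n))
t_crit = 2.262  # t_{0.975, df=9}: two-sided test at alpha = 0.05
reject = abs(t_stat) > t_crit
```

Note the extra uncertainty from estimating σ: with df = 9 the multiplier is 2.262 rather than the z-test's 1.96, so the same observed difference has to be larger (relative to its standard error) before we reject.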
Testing for the population proportion
Let's go back to our CNN poll. Assume we have a SRS of 1,017 adults.
We are interested in testing the following hypotheses: H_{0}: p = 0.50 vs. H_{A}: p > 0.50
What is the test statistic?
If alpha = 0.05, what do we conclude?
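The text does not give the poll's observed count, so as a sketch assume a hypothetical 540 "yes" answers out of the 1,017 adults. The usual large-sample z-statistic for a proportion uses the standard error computed under H₀:

```python
import math
from statistics import NormalDist

n, p0 = 1017, 0.50
p_hat = 540 / n                    # hypothetical sample proportion
se = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0: p = 0.50
z = (p_hat - p0) / se
p_value = 1 - NormalDist().cdf(z)  # one-sided alternative: HA: p > 0.50
reject = p_value < 0.05
```

With these hypothetical numbers z is just under 2 and the one-sided p-value falls below 0.05, so we would reject H₀; a different observed count could of course lead to the opposite conclusion.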
We will see more details in the next lesson on proportions, then distributions, and possible tests.