Reviews

# Review for Lessons 1 to 4 (Exam 1)


## Introduction

This page is essentially the page of formulas and notes that you can use to study for Exam 1. You will find a printable version of this in Canvas that you can print out and bring to your proctored exams. The printable version also includes the normal table which you may need for the exam as well.

This web page contains much more information than the printable version. The 'Tell Me More...' links lead to the basic idea behind each topic listed on this page, along with examples and further references.

## Outline of Material Covered on Exam 1

Margin of Error
The Margin of Error for a sample proportion from a random sample is around $$1 / \sqrt{n}$$ where n is the sample size. It does not depend on the population size.
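As a quick illustration of the rule of thumb above, the sketch below evaluates $$1 / \sqrt{n}$$ for a few sample sizes. The function name `approx_margin_of_error` and the sample sizes are chosen here for illustration only.

```python
import math


def approx_margin_of_error(n: int) -> float:
    """Rule-of-thumb margin of error for a sample proportion: 1 / sqrt(n)."""
    if n <= 0:
        raise ValueError("sample size must be positive")
    return 1 / math.sqrt(n)


# Quadrupling the sample size cuts the margin of error in half.
for n in (100, 400, 1600):
    print(n, round(approx_margin_of_error(n), 3))
```

Note that the population size never appears in the calculation.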
Sampling Types
• simple random sampling
• stratified sampling
• cluster sampling
• systematic sampling
• non-probability sampling schemes (e.g. voluntary / convenience / self-selected / haphazard)
Comparative Study types:
• observational versus experimental
• retrospective versus prospective
• matched pair & block designs
• subject-blinded / researcher-blinded / double-blinded
Variable types
• explanatory / response / confounding
• categorical / ordinal / discrete measurement / continuous measurement
Measurement Issues:
• bias
• reliability
• validity
Sampling Issues
• low response rate
• nonresponse bias
• question wording issues

• sampling frame ≠ population
• small sample size (low reliability)
• non-probability sampling schemes

Experiment Issues:
• confounding variables
• interacting variables
• placebo, Hawthorne, and experimenter effects
• lack of ecological validity
• generalizability
Observational Study Issues:
• confounding
• claiming causation when only association is shown
• extending the results inappropriately
• using the past as a source of data
The Five Number Summary
• minimum, lower quartile, median, upper quartile, maximum
Measures of Location
• mean
• median
Measures of Variability:
• standard deviation
• $$\text{IQR} = Q_U - Q_L$$
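The summaries listed above can be computed with Python's standard `statistics` module. This is a minimal sketch on made-up data; textbooks and software use slightly different quartile conventions, and the `"inclusive"` method used here is just one of them, so quartiles computed by hand may differ a little.

```python
import statistics

data = [2, 4, 4, 5, 7, 9, 10, 12, 15]  # hypothetical measurements

# Quartiles: the three cut points that split the sorted data into four parts.
q_lower, median, q_upper = statistics.quantiles(data, n=4, method="inclusive")

five_number_summary = (min(data), q_lower, median, q_upper, max(data))

mean = statistics.mean(data)   # measure of location
sd = statistics.stdev(data)    # sample standard deviation (divides by n - 1)
iqr = q_upper - q_lower        # interquartile range, Q_U - Q_L

print("five-number summary:", five_number_summary)
print("mean:", round(mean, 2), "SD:", round(sd, 2), "IQR:", iqr)
```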
Measures of Relative Standing
• percentiles
• standard scores (also known as z-scores)
Pictures of Distributions
• boxplots or histograms for measurement variables
• pie charts or bar graphs for categorical variables (bar graphs for ordinal variables)
• time series plots for tracking summaries over time (issues: trend / seasonality / random fluctuations)
Distribution Shapes:
• skewed left / skewed right / symmetric / bimodal
• normal (bell-shaped)
Standardized Score
The z-score is equal to the value minus the mean, all divided by the standard deviation: $$z = (\text{value} - \text{mean}) / \text{standard deviation}$$
Observed Value
The observed value is equal to the mean plus the product of the standardized score and the standard deviation
Empirical Rule
If a distribution is close to the normal curve, then about 68% of the values are within one standard deviation of the mean and about 95% are within two standard deviations of the mean
Percentiles of the normal distribution depend only on standard scores (z)
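The standardized score, observed value, and Empirical Rule can be checked numerically. The sketch below uses `statistics.NormalDist` from the Python standard library in place of the printed normal table; the mean of 100 and standard deviation of 15 are hypothetical values chosen for illustration.

```python
from statistics import NormalDist


def z_score(value: float, mean: float, sd: float) -> float:
    """Standardized score: (value - mean) / standard deviation."""
    return (value - mean) / sd


def observed_value(z: float, mean: float, sd: float) -> float:
    """Observed value: mean + z * standard deviation."""
    return mean + z * sd


z = z_score(130, mean=100, sd=15)         # 2 standard deviations above the mean
value = observed_value(z, mean=100, sd=15)

# Empirical Rule check on the standard normal curve.
std_normal = NormalDist()                  # mean 0, standard deviation 1
within_one = std_normal.cdf(1) - std_normal.cdf(-1)
within_two = std_normal.cdf(2) - std_normal.cdf(-2)

# The percentile depends only on z, not on the original units.
percentile = std_normal.cdf(z) * 100

print(z, value, round(within_one, 3), round(within_two, 3), round(percentile, 1))
```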

# Review for Lessons 5 to 8 (Exam 2)


## Introduction

This page is essentially the page of formulas and notes that you can use to study for Exam 2.  You will find a printable version of this in Canvas that you can print out and bring to your proctored exams. The printable version also includes the normal table which you may need for the exam as well.

## Outline of Material Covered on Exam 2

Correlation measures strength of linear relationship

Linear prediction: Estimated as $$y = a + bx$$, where '$$a$$' is the intercept which is the value of $$y$$ when $$x=0$$; '$$b$$' is the slope which is the change in $$y$$ per unit increase in $$x$$.
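As an illustration, the sketch below fits $$y = a + bx$$ to a small made-up data set and computes the correlation. It assumes the usual least-squares fit, which these notes do not spell out; the function name `least_squares_line` is hypothetical.

```python
import math


def least_squares_line(xs, ys):
    """Return (intercept a, slope b, correlation r) for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                  # change in y per unit increase in x
    a = mean_y - b * mean_x        # predicted y when x = 0
    r = sxy / math.sqrt(sxx * syy)
    return a, b, r


xs = [1, 2, 3, 4, 5]               # hypothetical explanatory values
ys = [2, 4, 5, 4, 5]               # hypothetical responses
a, b, r = least_squares_line(xs, ys)

# Predict only inside the observed range of x to avoid extrapolation.
prediction_at_3 = a + b * 3
print(round(a, 2), round(b, 2), round(r, 3), round(prediction_at_3, 2))
```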

Correlation/regression issues: non-linearity; outliers; avoiding extrapolation in prediction; correlation is not causation.

Risk = proportion (or percentage) of the time an adverse outcome occurs.

Increased risk = percentage change from baseline risk to risk of exposed group.

Relative risk = (Risk for exposed group) / (risk for baseline group).

Odds of an event = (Chance of event) / (1 – chance of event).

Odds ratio = (odds of event in one group) / (odds of event in another group).
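The risk and odds definitions above translate directly into arithmetic. The counts in this sketch (30 of 200 in the exposed group, 20 of 200 in the baseline group) are hypothetical.

```python
def risk(events: int, total: int) -> float:
    """Proportion of the time the adverse outcome occurs."""
    return events / total


def odds(chance: float) -> float:
    """Odds of an event: chance / (1 - chance)."""
    return chance / (1 - chance)


exposed_risk = risk(30, 200)       # hypothetical exposed group
baseline_risk = risk(20, 200)      # hypothetical baseline group

relative_risk = exposed_risk / baseline_risk
increased_risk_pct = (exposed_risk - baseline_risk) / baseline_risk * 100
odds_ratio = odds(exposed_risk) / odds(baseline_risk)

print(round(relative_risk, 2), round(increased_risk_pct, 1), round(odds_ratio, 3))
```

When the risks are small, as here, the odds ratio lands close to the relative risk; the two drift apart as the risks grow.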

Simpson’s Paradox: An observed association between two variables can change or even reverse direction when there is another variable that interacts strongly with both variables.
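A small numeric sketch can make the reversal concrete. The counts below are invented for illustration: treatment A has the higher success rate within each severity group, yet the lower rate overall, because A was used mostly on severe cases.

```python
# (successes, patients) for each treatment within each severity group.
# All counts are hypothetical.
counts = {
    "mild":   {"A": (9, 10),   "B": (80, 100)},
    "severe": {"A": (30, 100), "B": (2, 10)},
}


def rate(successes: int, total: int) -> float:
    return successes / total


# Within each group, A does better than B.
a_better_within_groups = all(
    rate(*group["A"]) > rate(*group["B"]) for group in counts.values()
)

# Combine the groups and compare again.
overall = {}
for treatment in ("A", "B"):
    successes = sum(group[treatment][0] for group in counts.values())
    patients = sum(group[treatment][1] for group in counts.values())
    overall[treatment] = rate(successes, patients)

print(a_better_within_groups, round(overall["A"], 3), round(overall["B"], 3))
```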

Probabilities (also called "chances") are numbers between 0 and 1.

• If two events are mutually exclusive, then they have no outcomes in common and the probability that one or the other occurs is found by adding the chances.
• If two events are independent, then the chance of one doesn’t change when you know how the other turned out, and the probability that both occur is found by multiplying the chances.
• If the ways one event happens are a subset of the ways for another event, then its probability can’t be higher.
• Probabilities of an event can be simulated by repeating the process many times on a computer and keeping track of the relative frequency of the times that the event happens.
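The rules above can be tried out on dice and coins. This sketch uses exact fractions for the addition and multiplication rules and a seeded simulation for the relative-frequency idea; the events chosen and the function name `simulate_sum_is_seven` are illustrative.

```python
import random
from fractions import Fraction

# Addition rule (mutually exclusive events): one die shows a 1 or a 2.
p_one_or_two = Fraction(1, 6) + Fraction(1, 6)

# Multiplication rule (independent events): two fair coins both land heads.
p_two_heads = Fraction(1, 2) * Fraction(1, 2)


def simulate_sum_is_seven(trials: int, seed: int = 0) -> float:
    """Estimate P(two dice sum to 7) by relative frequency."""
    rng = random.Random(seed)      # fixed seed so the run is repeatable
    hits = 0
    for _ in range(trials):
        if rng.randint(1, 6) + rng.randint(1, 6) == 7:
            hits += 1
    return hits / trials


estimate = simulate_sum_is_seven(100_000)
print(p_one_or_two, p_two_heads, round(estimate, 3))  # exact answer is 1/6
```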

Expectation (long run average) = sum of values × probabilities.
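For example, the expectation of a simple game can be computed by multiplying each payoff by its probability and adding the results. The payoffs and probabilities below are hypothetical.

```python
payoffs = [0, 5, 100]                 # hypothetical winnings in dollars
probabilities = [0.90, 0.09, 0.01]    # must add up to 1

assert abs(sum(probabilities) - 1) < 1e-9

# Expectation = sum of (value x probability) over all outcomes.
expected_payoff = sum(v * p for v, p in zip(payoffs, probabilities))
print(round(expected_payoff, 2))
```

The result is a long-run average per play, not an amount any single play actually pays.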

Law of Large Numbers: Averages or proportions are likely to be more stable when there are more trials while sums or counts are likely to be more variable. This does not happen by compensation for a bad run of luck since independent trials have no memory.

Sampling distribution of a summary statistic = the distribution of values you would get if you repeat the basic process over-and-over again.

Normal Approximation: the sampling distribution of averages or proportions from a large number of independent trials will approximately follow the normal curve.

The standard deviation of a sample proportion is $$\sqrt{p(1 - p)/n}$$.

The standard deviation of a sample mean is $$\frac{(\text{population standard deviation})}{\sqrt{n}}= \frac{\sigma}{\sqrt{n}}$$.
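A minimal sketch of the two standard deviations, using $$\sqrt{p(1 - p)/n}$$ for a sample proportion and $$\sigma / \sqrt{n}$$ for a sample mean. The function names and input values are hypothetical.

```python
import math


def sd_sample_proportion(p: float, n: int) -> float:
    """Standard deviation of a sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


def sd_sample_mean(sigma: float, n: int) -> float:
    """Standard deviation of a sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


print(round(sd_sample_proportion(0.5, 100), 3))
print(round(sd_sample_mean(15, 25), 3))

# Both shrink with the square root of n: four times the data halves the spread.
print(round(sd_sample_mean(15, 100), 3))
```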

# Review for Lessons 9 to 11


## Introduction

This page is essentially the page of formulas and notes that you can use to study for the material from Lesson 9 through 11. You will find a printable version of this in Canvas that you can print out and bring to your proctored exams. The printable version also includes the normal table which you may need for the exam as well.

## Outline of Material Covered from Lesson 9 to 11

Standard error of the mean (SEM) =

$\frac{(\text{sample standard deviation})}{\sqrt{n}}$

Standard error of a proportion (SEP) =

$\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$

Standard error of the difference between sample values from two independent samples =

$\sqrt{(\text{standard error from 1st sample})^2+(\text{standard error from 2nd sample})^2}$
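The three standard error formulas above can be sketched as small functions. The function names and the sample values are hypothetical.

```python
import math


def sem(sample_sd: float, n: int) -> float:
    """Standard error of the mean: sample SD / sqrt(n)."""
    return sample_sd / math.sqrt(n)


def sep(p_hat: float, n: int) -> float:
    """Standard error of a proportion: sqrt(p_hat(1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def se_difference(se_first: float, se_second: float) -> float:
    """Standard error of a difference between two independent samples."""
    return math.sqrt(se_first ** 2 + se_second ** 2)


se_1 = sep(0.6, 100)               # hypothetical first sample
se_2 = sep(0.5, 100)               # hypothetical second sample
se_diff = se_difference(se_1, se_2)

print(round(sem(12, 36), 3), round(se_1, 4), round(se_2, 4), round(se_diff, 4))
```

The standard errors combine by adding their squares, so the result is always smaller than the simple sum of the two.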

Confidence Interval Step-by-Step:

• Step 1: What are the parameter and the statistic?
• Step 2: Does the normal approximation apply?
  • check: random sample or independent trials?
  • check: large enough sample?
• Step 3: Estimate the standard deviation of the statistic (also called the standard error)
• Step 4: Compute $$\text{statistic} \pm z^* \times (\text{standard error})$$, where $$z^*$$ is the multiplier from the normal distribution for the chosen confidence level

Notice the common form in this last step (the formula for the standard error depends on the type of statistic).
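As a worked sketch of the confidence interval steps for a proportion, the code below computes statistic ± multiplier × standard error. The sample values (60% of 100 respondents) are hypothetical, and the 95% multiplier of 1.96 is taken from the table of normal multipliers.

```python
import math


def proportion_confidence_interval(p_hat: float, n: int, z_star: float):
    """Return (lower, upper) for p_hat +/- z_star * standard error."""
    standard_error = math.sqrt(p_hat * (1 - p_hat) / n)   # Step 3
    margin = z_star * standard_error                      # Step 4
    return p_hat - margin, p_hat + margin


# Hypothetical random sample: 60 of 100 respondents say "yes".
lower, upper = proportion_confidence_interval(p_hat=0.60, n=100, z_star=1.96)
print(round(lower, 3), round(upper, 3))
```

Swapping in a different standard error (for a mean, or for a difference) leaves the rest of the calculation unchanged.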

Hypothesis Testing Step-by-Step:

• Step 1: Ask what is the parameter of interest (e.g. is it a mean, a proportion, or the difference between means or proportions)? Write the null and alternative hypotheses as statements about this parameter.
• Step 2: Ask what is the sample statistic and its distribution if the null hypothesis is true? If it is normally distributed, calculate the standard score.
• Step 3: Ask how likely is what happened? Use the tables to find the P-value.
• Step 4: Ask what can I conclude?
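The steps above can be sketched for a test about a single proportion. This is an illustration under stated assumptions: the null value of 0.5, the sample of 100, and the one-sided alternative are hypothetical; the standard deviation is computed using the null value because the test asks what happens if the null hypothesis is true; and `statistics.NormalDist` stands in for the printed normal table.

```python
import math
from statistics import NormalDist


def one_proportion_z_test(p_hat: float, p_null: float, n: int):
    """Return (standard score, one-sided P-value) for H0: p = p_null vs Ha: p > p_null."""
    sd_if_null_true = math.sqrt(p_null * (1 - p_null) / n)   # Step 2
    z = (p_hat - p_null) / sd_if_null_true
    p_value = 1 - NormalDist().cdf(z)                        # Step 3
    return z, p_value


# Hypothetical sample: 60 successes in 100 independent trials.
z, p_value = one_proportion_z_test(p_hat=0.60, p_null=0.50, n=100)
print(round(z, 2), round(p_value, 4))
```

A small P-value means a result this far above the null value would be unusual if the null hypothesis were true (Step 4).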

Hypothesis Tests (also called significance tests) Caveats:

1. Large Sample Caution: significant results based on large samples may not be of practical significance.
2. Small Sample Caution: results that are not significant in small samples may still be of practical significance.
3. Multiple Testing Caution: When a large number of significance tests are conducted, some individual tests may be deemed significant just by chance (false positives).
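The multiple testing caution can be quantified with a short calculation. The sketch below assumes every null hypothesis is true and the tests are independent, which is a simplification; the choice of 100 tests at the 0.05 level is hypothetical.

```python
def expected_false_positives(num_tests: int, alpha: float) -> float:
    """Average number of tests flagged significant by chance alone."""
    return num_tests * alpha


def chance_of_at_least_one(num_tests: int, alpha: float) -> float:
    """Chance that at least one independent test is a false positive."""
    return 1 - (1 - alpha) ** num_tests


print(round(expected_false_positives(100, 0.05), 2))
print(round(chance_of_at_least_one(100, 0.05), 3))
```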

Human Subjects Issues: avoid physical or psychological harm; voluntary participation; protect vulnerable populations; ensure informed consent.

Multiplier numbers from the normal distribution

| Confidence Level | $$z^*$$ |
|---|---|
| 50% | 0.67 |
| 60% | 0.84 |
| 70% | 1.04 |
| 80% | 1.282 |
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.58 |
| 99.9% | 3.29 |
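The multipliers can be reproduced from the standard normal curve: for a given confidence level, $$z^*$$ is the point that leaves half of the remaining probability in each tail. This sketch uses `statistics.NormalDist.inv_cdf` from the Python standard library; the function name `z_star` is chosen here for illustration.

```python
from statistics import NormalDist


def z_star(confidence_level: float) -> float:
    """Normal multiplier for a two-sided interval at the given level (0 to 1)."""
    if not 0 < confidence_level < 1:
        raise ValueError("confidence level must be strictly between 0 and 1")
    # Middle area = confidence_level, so the upper cut point sits at (1 + level) / 2.
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


for level in (0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99, 0.999):
    print(f"{level:.1%}", round(z_star(level), 3))
```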
