Data Collection | STAT 504

How do we collect the data?

Key concepts:

Randomization
Sampling

Observational study
Randomized experiment

Randomization

When we want to learn the characteristics of a population, we can:

measure the entire population (as is done in a census), or
more typically, select a sample from the population.

To ensure the sample is representative, that is, that we can learn the desired characteristics of the population from the sample, we need to use a random mechanism to select the sample.

So how can this be done?

Types of Studies

Randomized Experiment: here we create differences in the explanatory variable and then examine the results:

The investigators apply one or more manipulations (i.e., treatments) to the experimental subjects

Subjects are randomly assigned to treatments

Observational Study: here we observe differences in the explanatory variables

e.g. survey data

The KEY for both is Randomization! (In the 1-bedroom data example we did a kind of survey.)
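
As a minimal sketch of how subjects might be randomly assigned to treatments in a randomized experiment, the snippet below shuffles a list of subjects and deals them out to treatment groups in turn. The subject IDs, the two group labels, and the function name `assign_treatments` are hypothetical and chosen only for illustration.

```python
import random

def assign_treatments(subjects, treatments, seed=None):
    """Randomly assign subjects to treatments in (nearly) equal-sized groups."""
    rng = random.Random(seed)
    shuffled = list(subjects)
    rng.shuffle(shuffled)
    # Deal the shuffled subjects out to the treatments in turn, like dealing cards.
    return {subject: treatments[i % len(treatments)]
            for i, subject in enumerate(shuffled)}

# Hypothetical subject IDs, not taken from any real study
subjects = [f"subject_{i}" for i in range(1, 13)]
assignment = assign_treatments(subjects, ["treatment", "control"], seed=504)
```

Because chance alone decides who lands in which group, the groups should differ systematically only in the treatment they receive.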

Types of Sampling

Simple random sampling

Sample of size n from a population of size N

Equal probability of selection

Stratified random sampling

Select a random sample from each stratum

e.g. proportional allocation

Reduces error

Cluster random sampling

Select a random sample of clusters, then observe the units within each selected cluster

Reduces cost, but increases error

Systematic random sampling

Select every k-th unit after a random start

Simple design and administration
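
The four sampling designs above can be compared on one small made-up population. This is a sketch under stated assumptions: the population of 120 units, the stratum labels, the cluster labels, and the sample size are all hypothetical. Cluster sampling is taken here to mean choosing whole clusters at random, and systematic sampling to mean a random start followed by every k-th unit.

```python
import random

rng = random.Random(504)  # fixed seed so the sketch is reproducible

# Hypothetical population: N = 120 units, each with a stratum and a cluster label.
population = [
    {"id": i, "stratum": "urban" if i < 80 else "rural", "cluster": i // 10}
    for i in range(120)
]
n = 12  # desired sample size

# Simple random sampling: every subset of size n is equally likely.
srs = rng.sample(population, n)

# Stratified random sampling with proportional allocation:
# each stratum contributes in proportion to its share of the population.
strata = {}
for unit in population:
    strata.setdefault(unit["stratum"], []).append(unit)
stratified = []
for members in strata.values():
    n_h = round(n * len(members) / len(population))
    stratified.extend(rng.sample(members, n_h))

# Cluster random sampling: randomly pick whole clusters, keep every unit in them.
cluster_ids = sorted({unit["cluster"] for unit in population})
chosen = set(rng.sample(cluster_ids, 2))
cluster_sample = [u for u in population if u["cluster"] in chosen]

# Systematic random sampling: random start, then every k-th unit.
k = len(population) // n
start = rng.randrange(k)
systematic = population[start::k]
```

Note that rounding the per-stratum sizes happens to give exactly n here; with other stratum shares the rounded allocations may need adjusting to hit the target sample size.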

Experimental Design Features

Controls and placebo
Blinding
Randomization and random sampling
- controls confounding
- allows causal inference
- supports model assumptions / an assumed probability distribution
Replication
- multiple experimental units assigned to each treatment
Blocking (like stratification)
- controls confounding
- reduces error
- improves power
Balance
- same number of units assigned to each treatment group
- improves power
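
To illustrate how blocking, replication, randomization, and balance can fit together, here is a minimal sketch of a balanced randomized block design. The block names, unit IDs, treatment labels, and the helper `blocked_assignment` are hypothetical; this is one simple way to do it, not a prescribed procedure.

```python
import random

def blocked_assignment(units_by_block, treatments, seed=None):
    """Randomize treatments separately within each block, keeping groups balanced."""
    rng = random.Random(seed)
    assignment = {}
    for block, units in units_by_block.items():
        if len(units) % len(treatments) != 0:
            raise ValueError(f"block {block!r} cannot be split evenly")
        reps = len(units) // len(treatments)
        labels = list(treatments) * reps  # replication: reps units per treatment
        rng.shuffle(labels)               # randomization within the block
        for unit, label in zip(units, labels):
            assignment[unit] = (block, label)
    return assignment

# Hypothetical blocks (for example, age groups) with four units each
blocks = {
    "younger": ["y1", "y2", "y3", "y4"],
    "older": ["o1", "o2", "o3", "o4"],
}
design = blocked_assignment(blocks, ["treatment", "placebo"], seed=504)
```

Each block ends up with the same number of units per treatment, so the blocking variable cannot be confounded with the treatment.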

Experiments vs. Observations

You can make statements of causal inference from randomized experiments. Nowadays new statistical methods are being developed for making causal inference statements from observational studies too!

Major problem: Confounding
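
A small simulation can make the confounding problem concrete. In this sketch, a hidden variable z raises the outcome y and, in the observational setting, also makes a unit more likely to receive x, while x itself has no effect on y. All numbers (the 0.8 and 0.2 probabilities, the effect size 2.0, the sample size) are made up for illustration.

```python
import random

def mean_difference(n, randomized, seed=504):
    """Difference in mean outcome between units with x and units without x."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden confounder
        if randomized:
            x = rng.random() < 0.5                  # coin flip, ignores z
        else:
            x = rng.random() < (0.8 if z else 0.2)  # z influences who gets x
        y = 2.0 * z + rng.gauss(0, 1)               # x has NO effect on y
        (treated if x else untreated).append(y)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)

observational_diff = mean_difference(20_000, randomized=False)
randomized_diff = mean_difference(20_000, randomized=True)
```

Under these assumptions the observational difference should come out well above zero even though x does nothing, while the randomized difference should be close to zero, because random assignment breaks the link between z and x.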

Right Now Exercise!

Telephone-Telepathy Yahoo news article

Read the one-page article above, then define the population, sample, observational unit, parameter, and statistic.

Is this an observational study or an experiment? Why? What is the major finding?

Understand the problem

Population: All people who use phones?
Sample: 63 subjects, but are they randomly chosen?
Experimental design
Randomization: The order in which the 4 callers made the calls was random.
Manipulation: Each subject was asked to identify the caller before the call was made. The callers were chosen randomly to make the calls.
What are some sources of variability?
Generalizability (scope of inference)
- Parameter: p = true hit rate; the true proportion of calls in which an individual correctly identified the caller just before receiving the call
- Statistic: p̂ = observed hit rate = 0.45
- Some issues: (1) Is the sample self-selected? (2) Is the number of callers small? These are not randomly selected.

Identify the question

Do we have telephone telepathic abilities?
- Expectation: p = 0.25
- But where does that come from?
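
One way to see where p = 0.25 comes from: with 4 possible callers, a subject who is purely guessing names the right one with probability 1/4. The sketch below checks this by simulation and adds a helper for the binomial upper-tail probability. The function names are hypothetical, and no tail probability is computed for the study itself because the total number of calls is not given in these notes.

```python
import random
from math import comb

N_CALLERS = 4  # from the study design: four possible callers

def chance_hit_rate(n_callers=N_CALLERS):
    """Probability of naming the right caller by pure guessing."""
    return 1 / n_callers

def simulate_hit_rate(n_calls, n_callers=N_CALLERS, seed=504):
    """Simulated hit rate when both the caller and the guess are uniformly random."""
    rng = random.Random(seed)
    hits = sum(rng.randrange(n_callers) == rng.randrange(n_callers)
               for _ in range(n_calls))
    return hits / n_calls

def upper_tail(hits, n_calls, p):
    """P(X >= hits) for X ~ Binomial(n_calls, p)."""
    return sum(comb(n_calls, k) * p**k * (1 - p)**(n_calls - k)
               for k in range(hits, n_calls + 1))
```

Given the actual number of calls and hits, `upper_tail(hits, n_calls, chance_hit_rate())` would give the chance of seeing a hit rate at least that high from guessing alone.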

For more information on this research topic, visit the researchers' website.