Data Collection

How do we collect the data?

Key concepts:

Observational study
Randomized experiment


When we want to learn the characteristics of a population, we can:

  • survey the entire population (as is done in a census), or
  • more typically, select a sample from the population.

To ensure the sample is representative, that is, that we can learn the desired characteristics of the population from the sample, we need to use a random mechanism to select the sample.

So how can this be done?

Types of Studies

Randomized Experiment: here we create differences in the explanatory variable and then examine the results:

  • The investigators apply one or more manipulations (i.e. treatments) to the experimental subjects
  • Subjects are randomly assigned to treatments

Observational Study: here we observe differences in the explanatory variables

  • e.g. survey data

The KEY for both is Randomization! (In the one-bedroom data example, we did a kind of survey.)

Types of Sampling

Simple random sampling

  • Sample of size n from a population of size N
  • Equal probability of selection

Stratified random sampling

  • Select a random sample from each stratum
  • e.g. proportional allocation
  • Reduces error

Cluster random sampling

  • Select a random sample of clusters, then observe all (or a random sample of) the units within each selected cluster
  • Reduces cost, but increases error

Systematic random sampling

  • Choose a random starting point, then select every k-th unit from the list
  • Simple design and administration
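
The four sampling schemes above can be sketched in a few lines of Python. This is only an illustrative sketch using the standard library: the function names, the dictionary layout for strata and clusters, and the toy population of 100 numbered units are all assumptions made for the example, not part of the course material.

```python
import random

def simple_random_sample(population, n, rng):
    """Every subset of size n is equally likely to be chosen."""
    return rng.sample(population, n)

def stratified_sample(strata, fraction, rng):
    """Proportional allocation: sample the same fraction from each stratum.

    strata is a dict mapping a stratum label to the list of its units.
    """
    sample = []
    for units in strata.values():
        k = max(1, round(fraction * len(units)))  # at least one unit per stratum
        sample.extend(rng.sample(units, k))
    return sample

def cluster_sample(clusters, n_clusters, rng):
    """Pick whole clusters at random, then keep every unit in the chosen clusters."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [unit for label in chosen for unit in clusters[label]]

def systematic_sample(population, k, rng):
    """Random start among the first k units, then every k-th unit after it."""
    start = rng.randrange(k)
    return population[start::k]

# Hypothetical toy population: units numbered 0..99.
rng = random.Random(0)
population = list(range(100))
strata = {"A": population[:60], "B": population[60:]}
clusters = {c: population[c * 10:(c + 1) * 10] for c in range(10)}

srs = simple_random_sample(population, 10, rng)
strat = stratified_sample(strata, 0.1, rng)   # 6 units from A, 4 from B
clus = cluster_sample(clusters, 2, rng)       # 2 whole clusters, 20 units
syst = systematic_sample(population, 10, rng) # 10 units, spaced 10 apart
```

Note how the cluster sample is cheap to collect (only two clusters are visited) but its units come in blocks of neighbors, which is one way to see why clustering tends to increase error.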

Experimental Design Features

  • Controls and placebo
  • Blinding
  • Randomization and random sampling
    • controls confounding
    • allows causal inference
    • supports the model assumptions/probability distribution
  • Replication
    • multiple experimental units assigned to each treatment
  • Blocking (like stratification)
    • controls confounding
    • reduces error
    • improves power
  • Balance
    • same number of units assigned to each treatment group
    • improves power
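
As a rough illustration of randomization, balance, and blocking, the sketch below assigns units to treatments at random while keeping group sizes equal, first for all units at once and then separately within each block. The function names, the treatment labels, and the example blocks are hypothetical, chosen only for this illustration.

```python
import random

def balanced_assignment(units, treatments, rng):
    """Shuffle the units, then deal them out to treatments in turn.

    Group sizes differ by at most one, and are equal when the number
    of units is a multiple of the number of treatments.
    """
    shuffled = list(units)
    rng.shuffle(shuffled)
    return {unit: treatments[i % len(treatments)]
            for i, unit in enumerate(shuffled)}

def blocked_assignment(blocks, treatments, rng):
    """Randomize separately within each block (a randomized block design).

    blocks is a dict mapping a block label to the list of units in it.
    """
    assignment = {}
    for block_units in blocks.values():
        assignment.update(balanced_assignment(block_units, treatments, rng))
    return assignment

# Hypothetical example: 12 subjects, 3 treatments, 2 blocks of 6.
rng = random.Random(1)
treatments = ["control", "placebo", "drug"]
subjects = [f"s{i}" for i in range(12)]
blocks = {"younger": subjects[:6], "older": subjects[6:]}

complete = balanced_assignment(subjects, treatments, rng)
blocked = blocked_assignment(blocks, treatments, rng)
```

In the blocked version every treatment appears exactly twice in each block, so a difference between blocks cannot be mistaken for a treatment effect.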

Experiments vs. Observations

You can make statements of causal inference from randomized experiments. Nowadays new statistical methods are being developed for making causal inference statements from observational studies too!

Major problem: Confounding

Right Now Exercise!

Telephone-Telepathy Yahoo news article

Read the one-page article above, then define the population, sample, observational unit, parameter, and statistic.

Is this an observational study or an experiment? Why? What is the major finding?

Understand the problem

  • Population: All people who use phones?
  • Sample: 63 subjects, but are they randomly chosen?
  • Experimental design
  • Randomization: The order of 4 callers making the calls was random.
  • Manipulation: Each subject was asked to identify the caller before the call was made. The callers were chosen randomly to make the calls.
  • What are some sources of variability?
  • Generalizability (scope of inference)
    • Parameter: p = true hit rate; true proportion of calls where an individual predicted the caller just before receiving the call
    • Statistic: p̂ = observed hit rate = 0.45
    • Some issues: (1) Is the sample self-selected? (2) Is the number of callers small? These are not randomly selected.

Identify the question

  • Do we have telephone telepathic abilities?
    • Expectation: p = 0.25
    • But where does that come from?
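
One way to see where p = 0.25 could come from: if each of the 4 callers is equally likely to make a given call and the subject has no information, a pure guess matches the caller 1 time in 4. The simulation below is a minimal sketch under exactly those assumptions; the function name, the number of simulated calls, and the seed are arbitrary choices for illustration and do not come from the study.

```python
import random

def simulate_chance_hit_rate(n_callers=4, n_calls=100_000, seed=0):
    """Estimate the hit rate of a subject who guesses the caller at random.

    Assumes the caller is chosen uniformly at random for each call and the
    guess is independent of the true caller (i.e. no telepathy).
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_calls):
        caller = rng.randrange(n_callers)  # who actually calls
        guess = rng.randrange(n_callers)   # the subject's blind guess
        if guess == caller:
            hits += 1
    return hits / n_calls

chance_rate = simulate_chance_hit_rate()
```

Under these assumptions the simulated rate should land close to 1/4, which is the benchmark the observed hit rate of 0.45 is compared against.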

For more information on this research topic, visit the researchers' website.