2.3 - Survey & Sampling Design

Although this is not a course on survey design, surveys are a major source of public health data. As we saw earlier in the course, it is often not feasible to take measurements on the entire target population, so we must select a sample from which to gather data. This section introduces some advantages and disadvantages of surveys, along with approaches to drawing a sample for an epidemiologic survey.

Survey Studies

An epidemiologic survey consists of a simultaneous assessment of the health outcome and exposures as well as potential confounders and effect modifiers. A survey given at a single time point can be part of a cross-sectional study, which some epidemiologists call a prevalence study. The survey results provide a 'snapshot' of a population. Surveys are a useful tool for gauging the health of a population or monitoring the effectiveness of a preventive intervention or the provision of emergency relief.

While a survey may provide a relatively quick and inexpensive method for assessing the health of a population, it has both advantages and disadvantages, as noted below.

Advantages:

  • Inexpensive
  • Relatively quick
  • Can help establish or clarify a hypothesis

Disadvantages:

  • Exposure may not have preceded disease or outcome. This limits the assessment of causality. For example, a survey may ask about current smoking behavior and a diagnosis of asthma. While the results may show an association between smoking and asthma, we may not be able to accurately determine which came first.
  • Diseases and health outcomes with a long duration can be over-represented. Less severe outcomes may be under-represented because they may not have been diagnosed at the time of the survey.
  • Surveys are subject to information bias (e.g., from inaccurate recall or misdiagnosis) and selection bias (e.g., those without a telephone cannot be selected for a random digit dial survey).

Survey Questions and Administration

Survey questions should be carefully structured to reduce bias, with attention given to both the wording and the order of questions. Using a standard questionnaire increases the reliability and validity of the results. A reliable survey has internal consistency and produces results that are replicable; that is, a subject would answer the question in the same way if asked again. Valid questions are those that accurately assess the specific concept being measured.

The process of administering a survey should be standardized to reduce the potential for bias. The respondent should be informed of the purpose of the research and freely consent to participate. A survey with a low response rate is susceptible to bias, because those who respond may differ systematically from those who do not.

STAT 507 is a course in epidemiologic research methods, so we will not delve into the strengths and weaknesses of the various methods for evaluating the reliability and validity of a survey instrument, as might be presented in a psychometrics course. You should, however, recognize the need to consider this type of analysis when selecting a survey instrument.

Sampling Designs

The sampling methods described below can be applied to survey studies, as well as to other observational and interventional studies.

First, if the population can be enumerated (listed), a simple random sampling approach can be used to draw a representative sample of potential participants. For example, you might generate a list of all children attending a public school and then randomly select students from this list for the survey. Simple random sampling can be carried out in many software packages, including Excel. The use of simple random sampling allows us to generalize the results of the survey back to the population from which the sample was drawn.
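
As a rough illustration of simple random sampling from an enumerated list, the following Python sketch draws students without replacement so that every student has the same chance of selection. The roster of 500 student IDs, the sample size of 50, and the function name are all hypothetical and chosen only for the example.

```python
import random

def simple_random_sample(roster, n, seed=None):
    """Draw n units without replacement; each unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible
    return rng.sample(roster, n)

# Hypothetical enumerated population: 500 student IDs (assumed for illustration).
roster = [f"student_{i:03d}" for i in range(1, 501)]

sample = simple_random_sample(roster, n=50, seed=507)
print(len(sample), "students selected, for example:", sample[:3])
```

Fixing the seed is a design choice that lets the selection be documented and repeated; omitting it gives a fresh draw each time.
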
Sometimes we want to ensure an adequate number of responses from a group that is relatively small. To do that, we might use stratified random sampling, which divides the population into homogeneous groups (strata) and then draws a simple random sample from each stratum. Stratified sampling assures that selected subgroups of the population will be represented in the sample. If the strata are homogeneous, statistical precision from stratified sampling is greater than that achieved with simple random sampling. Stratified samples can be proportionate or disproportionate to the size of the stratum. If sampling is disproportionate, overall population estimates are constructed by weighting each within-stratum estimate by that stratum's share of the population (equivalently, weighting each observation by the inverse of its sampling fraction). Cluster sampling also divides the population into groups, but unlike stratified sampling it randomly selects whole groups (clusters) rather than sampling from every group; the clusters are often geographic areas. A cluster might be a zip code area in the US or streets within a city.
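
The Python sketch below illustrates disproportionate stratified sampling and the weighting step under some assumptions. The two strata, their sizes, and their outcome prevalences are hypothetical. The overall estimate weights each stratum's sample mean by that stratum's share of the population, which is one common way to construct the weighted estimate.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size from every stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

def weighted_estimate(strata, sample):
    """Combine stratum sample means, weighting each by its population share."""
    total = sum(len(units) for units in strata.values())
    return sum((len(strata[name]) / total) * (sum(vals) / len(vals))
               for name, vals in sample.items())

# Hypothetical population (1 = has the outcome, 0 = does not).
strata = {
    "large_group": [1] * 90 + [0] * 810,  # 900 people, 10% prevalence
    "small_group": [1] * 40 + [0] * 60,   # 100 people, 40% prevalence
}

# Disproportionate design: the small group is deliberately oversampled.
n_per_stratum = {"large_group": 50, "small_group": 50}
sample = stratified_sample(strata, n_per_stratum, seed=507)

pooled = [v for vals in sample.values() for v in vals]
print("Unweighted sample prevalence:", sum(pooled) / len(pooled))
print("Weighted population estimate:", weighted_estimate(strata, sample))
```

Because the small group makes up half of the sample but only a tenth of this hypothetical population, the unweighted figure tends to overstate the population prevalence of 0.13, while the weighted estimate corrects for the oversampling.
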
Systematic sampling occurs when we select our sample in a systematic manner. For example, you might select every 10th house on a street to participate in a household survey. Systematic sampling can be easier to implement than simple random sampling and may represent the population as well as a simple random sample. However, if the sampling interval coincides with a recurring pattern in the population, so that every selected unit falls at the same point in the cycle, the sample will be biased. For example, if an observation is made every seventh day, beginning on a Monday, the entire sample will only represent Monday experiences.
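
A minimal Python sketch of systematic sampling follows, covering both the every-10th-house example and the every-seventh-day problem. The street of 100 houses, the starting positions, and the 70-day calendar window are assumptions made for illustration.

```python
import random
from datetime import date, timedelta

def systematic_sample(units, k, start=None, seed=None):
    """Select every k-th unit, beginning at index `start` (random in [0, k) if omitted)."""
    if start is None:
        start = random.Random(seed).randrange(k)
    return units[start::k]

# Every 10th house on a hypothetical street of 100 houses, starting at house 4.
houses = list(range(1, 101))
picked = systematic_sample(houses, k=10, start=3)
print("Selected houses:", picked)

# Periodicity problem: observing every 7th day, starting on a Monday.
first_day = date(2024, 1, 1)  # 1 January 2024 was a Monday
days = [first_day + timedelta(days=i) for i in range(70)]
observed = systematic_sample(days, k=7, start=0)

# date.weekday() returns 0 for Monday through 6 for Sunday.
weekdays_seen = {d.weekday() for d in observed}
print("Distinct weekdays observed:", weekdays_seen)
```

Choosing an interval that does not match the cycle length, or switching to a random draw of days, would avoid sampling the same weekday every time.
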
Finally, there are several sampling approaches that may be used but may produce biased population estimates. One is the convenience sample, such as asking people who happen to pass a street corner or visit a store to participate in a survey. A convenience sample may be useful in gathering preliminary or pilot data for a future survey that would be larger and have more rigorous sampling methods. Another is purposive sampling, which you may choose because you are particularly interested in the responses of a specific group.

Each of these approaches is useful, but to what population can the results be generalized?