5.1 - Survey Study Design

An epidemiologic survey consists of simultaneous assessment of the health outcome and exposures, as well as potential confounders and effect modifiers. A survey is considered a cross-sectional study; some epidemiologists call it a prevalence study. The survey results provide a 'snapshot' of a population. Surveys are a useful tool for gauging the health of a population or for monitoring the effectiveness of a preventative intervention or the provision of emergency relief.

While a survey may provide a relatively quick and inexpensive method for assessing the health of a population, there are drawbacks as noted in Table 1 below:

Table 1: Advantages and Disadvantages of Surveys

| Advantages | Disadvantages |
| --- | --- |
| Inexpensive | Exposure may not have preceded disease or outcome. This limits the assessment of causality. For example, a survey may ask about the current behavior of smoking and a diagnosis of asthma. While the results may show an association between smoking and asthma, we may not be able to accurately determine which came first. |
| Relatively quick | Disease and health outcomes with long duration can be over-represented. |
| Can help establish or clarify a hypothesis | Less severe outcomes may be over-represented because they may not have been diagnosed at the time of the survey. |
| | Surveys are subject to information bias (e.g., from inaccurate recall or misdiagnosis) and selection bias (e.g., those without a telephone cannot be selected for a random-digit-dial survey). |

Some considerations in the design of survey sampling

Even though this is not a course on surveys, you should be aware of some approaches to drawing a sample for an epidemiologic survey. First, if the population can be enumerated (listed), a simple random sampling approach can be used to draw a representative sample of potential participants. For example, you might generate a list of all children attending a public school and then randomly select students from this list for the survey. Simple random sampling can be carried out in many software packages, including Excel. The use of simple random sampling allows us to generalize the results of the survey back to the population from which the sample was drawn.
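As a minimal sketch of simple random sampling from an enumerated list, the following uses Python's standard `random` module. The roster of 500 student IDs, the sample size of 50, and the function name `simple_random_sample` are hypothetical and chosen only for illustration.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n units without replacement; each unit is equally likely to be chosen."""
    rng = random.Random(seed)  # a seed makes the draw reproducible
    return rng.sample(population, n)

# Hypothetical enumerated population: a roster of 500 student IDs.
roster = [f"student_{i:03d}" for i in range(1, 501)]

sample = simple_random_sample(roster, n=50, seed=1)
print(len(sample), len(set(sample)))  # 50 students, all distinct
```

Recording the seed alongside the roster lets someone else reproduce exactly which students were selected.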

Sometimes, we want to make sure that there is an adequate number of responses from a group that is relatively small. To do that, we might use stratified random sampling, which divides the population into homogeneous groups (strata). Then we can draw simple random samples from each of the groups. Stratified sampling assures that selected subgroups of the population will be represented in the sample. If the strata are homogeneous, statistical precision from stratified sampling is greater than that achieved with simple random sampling. Stratified samples can be proportionate (or disproportionate) to the size of the stratum. If sampling is disproportionate, overall population estimates are constructed by weighting each response by the inverse of its stratum's sampling fraction, so that a group sampled at a higher rate does not count for more than its share of the population. Cluster sampling also divides the population into groups, and often refers to sampling from geographic areas; unlike stratified sampling, only a random selection of the groups (clusters) is included in the sample. A cluster might be a zip code area in the US or streets within a city.
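The following sketch illustrates disproportionate stratified sampling and the weighting needed to recover an overall estimate. The two strata, their sizes, and their prevalences are made-up numbers, and the function names are hypothetical. Each stratum's estimate is weighted by its share of the population, which is equivalent to weighting each response by the inverse of the sampling fraction.

```python
import random

def stratified_sample(strata, n_per_stratum, seed=None):
    """Draw a simple random sample of the requested size from each stratum."""
    rng = random.Random(seed)
    return {name: rng.sample(units, n_per_stratum[name])
            for name, units in strata.items()}

def weighted_prevalence(strata, samples):
    """Combine within-stratum prevalences, weighted by population share."""
    total = sum(len(units) for units in strata.values())
    estimate = 0.0
    for name, units in strata.items():
        sampled = samples[name]
        stratum_prevalence = sum(sampled) / len(sampled)
        estimate += (len(units) / total) * stratum_prevalence
    return estimate

# Hypothetical population: 1 = has the outcome, 0 = does not.
strata = {
    "large_group": [1] * 100 + [0] * 900,  # 1000 people, 10% prevalence
    "small_group": [1] * 40 + [0] * 60,    # 100 people, 40% prevalence
}

# Disproportionate allocation: the small group is deliberately over-sampled.
samples = stratified_sample(strata, {"large_group": 50, "small_group": 50}, seed=7)
print(round(weighted_prevalence(strata, samples), 3))
```

Without the weights, the over-sampled small group would pull the pooled estimate toward its own higher prevalence.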

Systematic sampling occurs when we select our sample in a systematic manner. For example, you might select every 10th house on a street to participate in a household survey. Systematic sampling can be easier to implement than simple random sampling and may represent the population as well as a simple random sample. However, if every \(r^{th}\) unit coincides with a recurring pattern in the population, so that each member of the sample is selected from the same part of the pattern, the sample will be biased. For example, if an observation is made every seventh day, beginning on a Monday, the entire sample will represent only Monday experiences.
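A short sketch of systematic sampling, including the periodicity problem described above. The list of 100 houses and the eight-week calendar are hypothetical. Choosing the starting point at random within the first interval is a common practice assumed here, not something the text above specifies.

```python
import random

def systematic_sample(units, interval, seed=None):
    """Pick a random start within the first interval, then take every interval-th unit."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # 0 <= start < interval
    return units[start::interval]

# Hypothetical street of 100 houses, sampling every 10th house.
houses = list(range(1, 101))
print(len(systematic_sample(houses, interval=10, seed=3)))  # 10

# Periodicity problem: every 7th day starting on a Monday.
days = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"] * 8
every_seventh = days[0::7]
print(set(every_seventh))  # {'Mon'}
```

A random start does not remove the periodicity problem. It only changes which day of the week the whole sample falls on.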

Multi-stage sampling occurs when a combination of sampling methods is used.
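One common combination is a two-stage design: first select clusters at random, then draw a simple random sample within each selected cluster. The sketch below assumes eight hypothetical villages of 30 households each; the names and sizes are illustrative only.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Stage 1: randomly choose clusters. Stage 2: randomly sample units within each."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {
        name: rng.sample(clusters[name], min(n_per_cluster, len(clusters[name])))
        for name in chosen
    }

# Hypothetical sampling frame: 8 villages, each with 30 households.
villages = {
    f"village_{v}": [f"v{v}_house_{h}" for h in range(1, 31)]
    for v in range(1, 9)
}

selected = two_stage_sample(villages, n_clusters=3, n_per_cluster=5, seed=11)
for village, households in selected.items():
    print(village, households)
```

Only the selected villages need to be visited, which is why designs like this can reduce field costs when the population is geographically spread out.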

Finally, there are several sampling approaches that may be used but may produce biased population estimates. First, we may choose a convenience sample, such as asking people on a street corner or in a store to participate in a survey. A convenience sample may be useful in gathering preliminary or pilot data for a future survey that would be larger and have more rigorous sampling methods. Second, you may choose purposive sampling because you are particularly interested in the responses of a specific group. Each of these approaches is useful, but to what population can the results be generalized?

Think about it!

In the example above, to what population can the results be generalized?

It is not clear what population the respondents will represent. Perhaps the sample will represent those individuals in the study area who are healthy enough to travel and motivated to report on health conditions in their household or village. Unknown biases are a problem with convenience samples. Suppose a researcher invites community midwives to a training session where they will also assess maternal and infant health in the midwives' villages from their responses to a survey. This would be a purposive sample. A purposive sample can produce results representing the targeted group, but will over-represent those in the population who are readily available.

How did the researchers decide to sample the village of San Pablo?

We were concerned that we might not have enough time in Ecuador to adequately survey all neighborhoods of San Pablo. So, we used Google Earth and took a preliminary walking tour in order to divide the community into four sectors (strata) with approximately equal numbers of households. We then rotated our days of surveying among these sectors. This assured that we spent approximately equal time surveying each sector, which would be important if the sectors were substantially different (e.g., different types of water supply). As it turned out, the surveys went very well and we were able to complete the survey process in each of the four sectors in San Pablo. Households in each sector were systematically sampled: every \(15^{th}\) house on both vertical and horizontal streets was entered into the sample. This produced a population-based estimate of self-reported health and exposure in San Pablo.

We were also interested in a neighboring community, Rio Guayas. Rio Guayas had fewer households and was a planned community, substantially different from San Pablo. The houses were newer and built of cinder block. The water supply was centralized. The population was younger. We sampled every \(5^{th}\) household in Rio Guayas on both horizontal and vertical streets. This is an example of a survey where a choice was made to sample different proportions in different strata (Rio Guayas vs. the four sectors of San Pablo).
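As a rough illustration of what sampling different proportions implies for analysis, the two sampling fractions can be turned into weights. The symbols \(f\), \(w\), \(x\), and \(n\) are introduced here for illustration and do not come from the study. The sketch assumes the sampling intervals were applied exactly and ignores nonresponse; whether the researchers combined the two communities in this way is not stated.

\[ f_{SP} = \frac{1}{15}, \quad f_{RG} = \frac{1}{5}, \quad w_{SP} = \frac{1}{f_{SP}} = 15, \quad w_{RG} = \frac{1}{f_{RG}} = 5 \]

If \(x\) of the \(n\) sampled households in each community report an outcome, a combined household-level prevalence estimate would be

\[ \hat{p} = \frac{15\,x_{SP} + 5\,x_{RG}}{15\,n_{SP} + 5\,n_{RG}} \]

Each sampled San Pablo household stands in for about 15 households and each sampled Rio Guayas household for about 5, so simply pooling the two samples without weights would over-represent Rio Guayas.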

Survey questions and administration:

Survey questions are carefully structured in order to reduce bias. Care should be given to the wording and order of questions. Using a standard questionnaire increases the reliability and validity of the results. A reliable survey has internal consistency and produces results that are replicable: the subject would answer the question in the same way if asked again. Valid questions are those which accurately assess the specific concept that is being measured.

The process of administering a survey should be standardized to reduce the potential for bias. The respondent should be informed of the purpose of the research and freely consent to participate. A survey with a low response rate is likely to have some bias, because those who respond may differ systematically from those who do not.

Here are examples of research assessing the validity and reliability of a survey instrument, the Behavioral Risk Factor Surveillance System: BRFSS Data Quality, Validity, and Reliability

(Statistics 507 is a survey course in epidemiologic research methods, so we will not delve into the strengths and weaknesses of various methods for evaluating the reliability and validity of a survey instrument as might be presented in a psychometric course. You should, however, recognize the need to consider this type of analysis when selecting a survey instrument.)

In San Pablo, verbal informed consent was obtained from the potential respondent before administering the survey. The respondent was frequently the head of the household. The survey consisted of two components, a household component and an individual component. Questions were both closed- and open-ended. The household component was a census of all persons residing in the household as well as questions about the water supply and sanitation for the household and utilization of medical care by household members. A water sample was also collected from selected households. For the individual component, questions were directed toward the education, employment (adults) and health of each person in the household. Both components were adapted from UNICEF surveys to increase reliability and validity. The survey instrument used in San Pablo (English version) is here.