8.2 - Random Error

Random error is a false association between exposure and disease that arises from chance. It can come from two sources: measurement error and sampling variability.

Measurement error

Measurement error occurs when there is a mistake in assessing the exposure or the outcome.  
Consider the figure below, in which the true value is the center of the target. In the first target, random error is negligible (high precision) and the measurements center on the true value (high accuracy). In the second, the measurements are accurate on average, but random error scatters them (imprecision). In the third, random error is negligible, yet the measurements are systematically off-center (inaccurate). In the fourth, there is both random error and inaccuracy.

[Figure: four targets illustrating, in order, accuracy and precision; accuracy and imprecision; inaccuracy and precision; inaccuracy and imprecision.]

Methods to increase precision and reduce random error

  1. Increase the sample size of the study
  2. Repeat a measurement within a study and average the results (a brief simulation of both ideas follows below)
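
As a rough illustration of both methods (a minimal simulation; the true value of 100 and the noise level are invented assumptions), the sketch below shows how averaging repeated measurements shrinks random error:

```python
import numpy as np

rng = np.random.default_rng(0)
true_value = 100.0   # hypothetical true quantity being measured
noise_sd = 10.0      # hypothetical random measurement error (SD)

# 10,000 single measurements vs. 10,000 averages of 5 repeated measurements
single = true_value + rng.normal(0, noise_sd, size=10_000)
repeated = (true_value + rng.normal(0, noise_sd, size=(10_000, 5))).mean(axis=1)

print(f"SD of single measurements: {single.std():.2f}")   # ~10
print(f"SD of means of 5 repeats:  {repeated.std():.2f}")  # ~10 / sqrt(5) ≈ 4.5
```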

Sampling variability

Sampling variability refers to the fact that an enormous number of different samples can be drawn from any single population, and the estimates obtained will vary from sample to sample. We take a sample because it is not feasible to measure the entire population, and we hope that the sample we select is representative of the population. It is best to choose a random sample rather than a non-random one, but even a random sample can be unrepresentative of the population simply by chance. Selecting a large enough sample size helps minimize the chance of selecting an unrepresentative sample.
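
As a hedged sketch of this idea (the population here is simulated, with arbitrary parameters), the code below draws many different samples from one population and shows that sample means vary less when the sample size is larger:

```python
import numpy as np

rng = np.random.default_rng(1)
population = rng.normal(70, 15, size=1_000_000)  # hypothetical population values

for n in (50, 500):
    # Draw 2,000 different random samples of size n and record each sample mean
    idx = rng.integers(0, population.size, size=(2_000, n))
    means = population[idx].mean(axis=1)
    print(f"n={n:3d}: middle 95% of sample means spans "
          f"{np.percentile(means, 2.5):.2f} to {np.percentile(means, 97.5):.2f} "
          f"(true mean = {population.mean():.2f})")
```

With n = 500 the sample means cluster much more tightly around the true mean than with n = 50, which is why a larger sample is less likely to be unrepresentative by chance.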

P-values and confidence intervals

Epidemiologists use hypothesis testing to assess the role of random error and to make statistical inferences. P-values can be useful for understanding relationships, but they should not be the only tool used to make inferences. It is very important to report effect estimates along with confidence intervals in order to draw sound scientific conclusions.

P-value
P-value: the probability of obtaining a test statistic as extreme as, or more extreme than, the one observed, assuming that the null hypothesis is true (the simulation after this list makes the definition concrete).
  • A probability, so it ranges between 0 and 1
  • p < 0.05 is a typical cut-off for statistical significance
  • Small p-value – evidence to suggest a difference between groups
  • Large p-value – no evidence to suggest a difference between groups
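
As a minimal sketch of this definition (the two groups and their data are invented for illustration), the code below estimates a p-value by simulating the null hypothesis directly: it shuffles the group labels so that no true association exists, then asks how often a difference in means at least as extreme as the observed one arises by chance. This is a permutation test rather than a textbook formula, but it matches the definition above exactly.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical observed data: an outcome measured in exposed vs. unexposed groups
exposed = rng.normal(5.8, 2.0, size=40)
unexposed = rng.normal(5.0, 2.0, size=40)
observed_diff = abs(exposed.mean() - unexposed.mean())

# Simulate the null hypothesis (no association) by shuffling group labels
pooled = np.concatenate([exposed, unexposed])
null_diffs = np.empty(10_000)
for i in range(10_000):
    rng.shuffle(pooled)
    null_diffs[i] = abs(pooled[:40].mean() - pooled[40:].mean())

# Empirical p-value: proportion of null differences at least as extreme as observed
p_value = np.mean(null_diffs >= observed_diff)
print(f"observed difference = {observed_diff:.2f}, empirical p-value = {p_value:.4f}")
```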

Issues with p-values

  • P-values depend on both the magnitude of the association and the sample size (see the sketch after this list)
  • They invite black/white conclusions: if p < 0.05 we claim significance, while p > 0.05 is called non-significant. How different, really, is a p-value of 0.04 from one of 0.06?
  • Statistical significance does not imply clinical significance
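
To illustrate the first point (a hedged sketch; the means, standard deviations, and sample sizes are made up), the same modest group difference can be non-significant in a small study yet highly significant in a large one:

```python
from scipy.stats import ttest_ind_from_stats

# Identical effect (mean difference 0.5, SD 2.0) at two different sample sizes
for n in (20, 2_000):
    result = ttest_ind_from_stats(mean1=5.5, std1=2.0, nobs1=n,
                                  mean2=5.0, std2=2.0, nobs2=n)
    print(f"n = {n} per group: p = {result.pvalue:.3g}")

# n = 20 per group gives p well above 0.05; n = 2,000 gives p far below it,
# even though the estimated association is exactly the same size.
```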

Read The ASA Statement on p-Values: Context, Process, and Purpose (tandfonline.com).  The ASA (American Statistical Association) concludes that:

Good statistical practice, as an essential component of good scientific practice, emphasizes principles of good study design and conduct, a variety of numerical and graphical summaries of data, understanding of the phenomenon under study, interpretation of results in context, complete reporting, and proper logical and quantitative understanding of what data summaries mean. No single index should substitute for scientific reasoning.

Confidence Intervals

Confidence intervals provide a way to quantify the amount of random error in an estimate. Once the estimate of interest is calculated (e.g., cumulative incidence, incidence rate), the confidence interval is calculated with a formula that depends on the measure.

General Formula for a 95% CI
The general formula for a 95% CI is estimate ± 1.96 × (s / √n), where s is the standard deviation and n is the sample size; s/√n is the standard error of the estimate. The multiplier 1.96 comes from the normal distribution: the middle 95% of a normal distribution lies within 1.96 standard errors of its center.

You can see that the width of the CI will decrease as the sample size increases, and as the standard deviation decreases. 
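
As a small sketch of the formula (the data are simulated; in practice s and n come from your study), computing a 95% CI for a sample mean:

```python
import numpy as np

rng = np.random.default_rng(3)
sample = rng.normal(70, 15, size=200)   # hypothetical sample, n = 200

estimate = sample.mean()
s = sample.std(ddof=1)                  # sample standard deviation
n = sample.size
margin = 1.96 * s / np.sqrt(n)          # 1.96 * standard error

print(f"estimate = {estimate:.2f}, "
      f"95% CI = ({estimate - margin:.2f}, {estimate + margin:.2f})")
```

Quadrupling n halves s/√n, so the interval's half-width 1.96 × s/√n halves as well.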

The true parameter in the population is unknown (because we can't measure the entire population), so we calculate our estimate from the sample we selected. Once we put a CI around our estimate, the interval either does or does not contain the true value - we don't know which. The idea of the confidence interval is that if we repeated the exercise (select a sample, calculate the estimate, calculate the CI) many times, 95% of the CIs we constructed would contain the true value. It does NOT mean that we are 95% confident that a given CI contains the true value. As stated above, it either does or it does not; we can't know which.
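
A minimal simulation of this repeated-sampling interpretation (the population parameters are invented for illustration): draw many samples, build a 95% CI from each, and count how often the intervals cover the true mean.

```python
import numpy as np

rng = np.random.default_rng(4)
true_mean, sd, n = 70.0, 15.0, 100   # assumed population parameters and sample size

trials = 10_000
covered = 0
for _ in range(trials):
    sample = rng.normal(true_mean, sd, size=n)
    margin = 1.96 * sample.std(ddof=1) / np.sqrt(n)
    if sample.mean() - margin <= true_mean <= sample.mean() + margin:
        covered += 1

print(f"{covered / trials:.1%} of the {trials:,} CIs contained the true mean")  # ≈ 95%
```

Each individual interval either covers 70 or it does not; the 95% describes the long-run behavior of the procedure, not any single interval.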