
Modern technologies such as microarrays and sequencing allow us to measure most features of DNA and RNA. For example, using sequencing technologies we can sequence strands of DNA or RNA. Alternatively, we can use microarrays to bind to predetermined sequences. We are limited only by our ability to retrieve the nucleic acids of interest from the tissue. For example, we can sequence pieces of chromosomes or transcripts. We can enrich the sample for exons, and sequence or bind primarily exons. We can allow proteins to bind to the DNA and enrich for the sites at which they are attached. We can find the methylation sites. Because nucleic acids are fairly simple chemically, the main impediment to what we can measure is finding a way to enrich the samples for the molecules of interest (or fragments of these molecules).

However, no measurement system is perfect.  Although the instrumentation is continually improving, there are many types of error that can be introduced.  Good study designs attempt to minimize these, and to quantify those that remain.

Measurement error - This term is used for the non-reproducible noise introduced during measurement.  It may be due to sample preparation, instrumentation or other problems.

Bias - Statisticians use the term bias when a measurement is systematically (and reproducibly) wrong. For example, some genomic regions are more difficult to retrieve than others, so these regions may be under-represented in every sample. For many of our measurements, we need to fragment the DNA - if certain regions are weaker than others and tend to break preferentially, those regions may be over-represented. Bias can occur at many stages in the study - for example, methods for handling the organisms might induce stress reactions, minor differences in how different investigators harvest tissue might induce gene expression differences, and so on.
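The difference between measurement error and bias can be illustrated with a small simulation. The sketch below is not from the original notes: the function name, the true value of 100, the noise standard deviation of 10 and the bias of -20 are all hypothetical choices made for illustration. It assumes Gaussian noise, which is only one possible noise model. Averaging many replicates shrinks the effect of random noise, but leaves a systematic bias untouched.

```python
import random
import statistics


def simulate_measurements(true_value, n, noise_sd, bias=0.0, seed=0):
    """Return n simulated measurements: true_value + bias + Gaussian noise.

    Hypothetical helper for illustration only; real measurement noise
    need not be Gaussian or independent between replicates.
    """
    rng = random.Random(seed)
    return [true_value + bias + rng.gauss(0.0, noise_sd) for _ in range(n)]


true_value = 100.0  # assumed "true" abundance of some feature

# Measurement error only: noisy, but centered on the truth.
noisy = simulate_measurements(true_value, n=10000, noise_sd=10.0)

# Measurement error plus bias: every replicate is shifted the same way,
# e.g. a region that is systematically under-retrieved.
biased = simulate_measurements(true_value, n=10000, noise_sd=10.0, bias=-20.0)

# Error of the average over all replicates.
noisy_error = statistics.mean(noisy) - true_value
biased_error = statistics.mean(biased) - true_value

print(f"error of mean, noise only:   {noisy_error:+.2f}")
print(f"error of mean, noise + bias: {biased_error:+.2f}")
```

In this sketch the mean of the noise-only replicates lands close to the true value, while the mean of the biased replicates stays about 20 units too low no matter how many replicates are taken. This is why replication helps with measurement error but bias has to be addressed through the study design.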

Contamination - Sample contamination may introduce DNA or RNA from another organism into the data. 

Mapping problems - Modern sequencing data are often mapped to a reference to determine what is in the sample. A number of problems can arise. Since the reference for an organism is developed from a limited number of samples, the current sample may have genomic variants that interfere with the match. Sample preparation and sequencing can introduce errors that interfere with the match. Similarities among regions of the DNA can create ambiguities in where the match occurs.
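A toy example may make these mapping problems concrete. The sketch below uses exact string matching against a made-up 30-base reference; the sequences and the function name are hypothetical, and real read mappers use indexed, mismatch-tolerant alignment rather than this naive search. It shows a read that maps uniquely, a read from a repeated region that maps to two places, and a read carrying a single-base difference (a variant or a sequencing error) that fails to map at all under exact matching.

```python
def find_exact_matches(reference, read):
    """Return every 0-based start position where read matches reference exactly.

    Hypothetical helper for illustration; overlapping matches are included.
    """
    positions = []
    start = reference.find(read)
    while start != -1:
        positions.append(start)
        start = reference.find(read, start + 1)
    return positions


# Made-up reference in which the segment ACGTTGCA occurs twice.
reference = "ACGTTGCA" + "GGATCCATT" + "ACGTTGCA" + "TTCGA"

unique_read = "GGATCCATT"   # occurs once in the reference
repeat_read = "ACGTTGCA"    # comes from the repeated segment
variant_read = "GGATGCATT"  # unique_read with one base changed (C -> G)

unique_hits = find_exact_matches(reference, unique_read)
repeat_hits = find_exact_matches(reference, repeat_read)
variant_hits = find_exact_matches(reference, variant_read)

print("unique read maps to: ", unique_hits)   # one position
print("repeat read maps to: ", repeat_hits)   # two positions: ambiguous
print("variant read maps to:", variant_hits)  # no exact match
```

Allowing mismatches would rescue the third read, but it would also widen the set of candidate locations for every read, so tolerance for variants and errors trades off against ambiguity.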

Platform-specific biases - These are biases that are due to the way the measurement is made. For example, using microarrays, we can only detect features that have a match on the array. Using sequencing, we may not be able to measure repetitive regions of the genome or transcriptome.