
We say that samples are pooled when units that might be measured separately are processed together in such a way that the separate measurements can no longer be determined.  For example, a tissue sample might be considered a pool of single cells.  However, more typically when we discuss pooling we mean that biological replicates are processed together at some stage, such as RNA extraction or microarray hybridization.

Typically, the decision to use sample pooling is due to the inability to obtain enough experimental material from a single individual.  In this era of single cell processing, this is seldom called for in well-established protocols, but may be required for protocols which require substantial amounts of starting material.  On the other hand, pooling is similar to averaging, and properly done pooling can be used to improve precision when the processing costs are high relative to the costs of obtaining biological replicates. [1]
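The precision argument can be sketched numerically. The short Python example below is only an illustration under simplifying assumptions that are not stated in the text: individuals are independent, each contributes equally to its pool, pooling acts like averaging on the measurement scale, and the technical variance of a measurement is the same whether or not the sample is pooled. The function name and the variance values are hypothetical.

```python
def variance_of_group_mean(bio_var, tech_var, plants_per_pool, n_pools):
    """Variance of a treatment-group mean built from n_pools measurements.

    Assumed model (illustrative only): one measurement on an
    equal-contribution pool of k individuals has variance
    bio_var / k + tech_var, and the group mean averages n_pools
    independent measurements of that kind.
    """
    if plants_per_pool < 1 or n_pools < 1:
        raise ValueError("plants_per_pool and n_pools must be at least 1")
    per_measurement = bio_var / plants_per_pool + tech_var
    return per_measurement / n_pools


# Hypothetical variance components, chosen only for illustration.
BIO_VAR = 1.0
TECH_VAR = 0.2

# Three measurements per group in both designs, so processing cost is equal.
unpooled = variance_of_group_mean(BIO_VAR, TECH_VAR, plants_per_pool=1, n_pools=3)
pooled = variance_of_group_mean(BIO_VAR, TECH_VAR, plants_per_pool=10, n_pools=3)

print(f"3 single plants:     variance of mean is about {unpooled:.3f}")
print(f"3 pools of 10 each:  variance of mean is about {pooled:.3f}")
```

Under these assumptions the number of measurements, and so the processing cost, is the same in both designs, but the pooled design shrinks the biological part of the variance by the pool size. The technical part is unchanged, so the gain is largest when biological variation dominates.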

Suppose that you are comparing gene expression in the root tips of wild-type versus mutant plants and that you have the resources to measure 3 biological replicates of each genotype.  If the plants are readily grown and the root tips readily sectioned, you might start by sectioning the root tips of 30 plants of each genotype.  You could then make 3 pools per genotype, each containing the tips from 10 plants, with no plant used in more than one pool.  The combined tissue in each pool is then processed together, and each pool serves as one biological replicate.  Alternatively, if RNA extractions are inexpensive, you could do individual extractions for each of the 30 plants and then combine the RNA in equal aliquots to form pools of 10 before labeling.
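The allocation of plants to pools in this example can be sketched in a few lines of Python. The text only requires that each pool uses different plants; assigning plants to pools at random, as done here, is an added assumption, and the function name and plant labels are hypothetical.

```python
import random


def assign_pools(plant_ids, n_pools, seed=None):
    """Split plant_ids into n_pools disjoint pools of equal size.

    Plants are shuffled first so that pool membership is random
    (an assumption of this sketch, not a requirement in the text).
    """
    if n_pools < 1 or len(plant_ids) % n_pools != 0:
        raise ValueError("number of plants must be a multiple of n_pools")
    rng = random.Random(seed)
    shuffled = list(plant_ids)
    rng.shuffle(shuffled)
    size = len(shuffled) // n_pools
    return [shuffled[i * size:(i + 1) * size] for i in range(n_pools)]


# 30 hypothetical wild-type plants, labeled WT-01 ... WT-30.
wild_type = [f"WT-{i:02d}" for i in range(1, 31)]
pools = assign_pools(wild_type, n_pools=3, seed=42)

for number, pool in enumerate(pools, start=1):
    print(f"pool {number}: {', '.join(sorted(pool))}")
```

The same call would be repeated for the mutant plants, giving 3 pools of 10 per genotype, each pool being one biological replicate.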

To keep the variability and the variance estimates correct for subsequent statistical analysis, you need to do your pooling in exactly the same way for each pool, with every individual contributing the same amount of material.  Even if one of the plants could provide a larger sample, you cannot use extra material from that plant in place of independent individuals in the pool.  This is because the pools are now your biological replicates and are assumed to be equally variable.  Consider the RNA for a single feature in a pool of 10 plants: if every plant contributes equally, the amount in the pool will be close to the average over the 10 plants; if the bulk of the pool comes from a single plant, the RNA in the pool will be more similar to that of a single plant than to an average.
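The effect of unequal contributions can be made concrete with a small calculation. The sketch below assumes, beyond what the text states, that the plants are independent with a common biological variance and that the pooled value is a weighted average of the individual values, with weights proportional to the amount of material each plant contributes. Under that assumption the biological variance of the pool is the single-plant variance multiplied by the sum of the squared weights. The function name and the example amounts are hypothetical.

```python
def biological_variance_factor(contributions):
    """Sum of squared pool weights.

    contributions: amount of material from each plant (any unit).
    Under the assumed weighted-average model, the pool's biological
    variance equals the single-plant variance times this factor.
    """
    total = sum(contributions)
    if total <= 0:
        raise ValueError("total contribution must be positive")
    weights = [c / total for c in contributions]
    return sum(w * w for w in weights)


# Ten plants contributing equal amounts.
equal = biological_variance_factor([1] * 10)

# One plant supplies 55% of the pool; the other nine supply 5% each.
dominated = biological_variance_factor([11] + [1] * 9)

print(f"equal pool:     factor {equal:.3f}, effective plants {1 / equal:.1f}")
print(f"dominated pool: factor {dominated:.3f}, effective plants {1 / dominated:.1f}")
```

With equal contributions the factor is 1/10, so the pool behaves like an average of 10 plants. In the dominated pool the factor is about 0.325, which corresponds to an average of only about 3 plants, so that pool is more variable than the others and the assumption of equally variable replicates fails.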

A common mistake is to create a single pooled sample and then split it into subsamples.  Subsamples of a pool are technical replicates.  Splitting a pool is the optimal design for studies of the reproducibility of the measurement platform, because the subsamples differ only by technical variation.  However, for studies meant to determine differential expression between treatments, we need an estimate of biological variation.  Sample splitting does not provide a measure of biological variability; multiple independently pooled samples are required.
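A small simulation can illustrate the difference between the two designs. The Python sketch below is hypothetical throughout: it assumes normally distributed biological and technical variation with made-up standard deviations, equal-contribution pools of 10, and 3 replicates per design. It compares the average sample variance among 3 subsamples of one pool with that among 3 independently formed pools.

```python
import random
import statistics


def simulate(n_experiments=2000, plants_per_pool=10, n_reps=3,
             bio_sd=1.0, tech_sd=0.1, seed=1):
    """Return the mean sample variance for (split pool, independent pools)."""
    rng = random.Random(seed)

    def pool_value():
        # True value of an equal-contribution pool: mean of its plants.
        plants = [rng.gauss(0.0, bio_sd) for _ in range(plants_per_pool)]
        return statistics.fmean(plants)

    split_vars = []
    independent_vars = []
    for _ in range(n_experiments):
        # Design A: one pool, split into n_reps subsamples.
        # Replicates share the same pool and differ only by technical noise.
        shared = pool_value()
        split = [shared + rng.gauss(0.0, tech_sd) for _ in range(n_reps)]

        # Design B: n_reps independently formed pools, one measurement each.
        independent = [pool_value() + rng.gauss(0.0, tech_sd)
                       for _ in range(n_reps)]

        split_vars.append(statistics.variance(split))
        independent_vars.append(statistics.variance(independent))

    return statistics.fmean(split_vars), statistics.fmean(independent_vars)


split_var, independent_var = simulate()
print(f"split pool:        mean sample variance {split_var:.3f}")
print(f"independent pools: mean sample variance {independent_var:.3f}")
```

With these settings the split-pool variance should come out near the technical variance alone (0.1 squared, or 0.01), while the independent pools should come out near the biological variance divided by the pool size plus the technical variance (roughly 0.11). A test for differential expression based on the split-pool variance would therefore be far too optimistic.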

References

[1] Biswas, S., et al. Biological Averaging in RNA-Seq. arXiv:1309.0670v2, 2013.