
Suppose two SNPs, A and B, are in the same gene. We observe only the individual genotype at each SNP; we do not get to see how the alleles sit together along each chromosome. If a person is homozygous for A (AA) but heterozygous for B (Bb), then we know that one chromosome carries the major allele at both sites (AB) and the other carries the major allele for A and the minor allele for B (Ab). But what if the individual is heterozygous for both, Aa and Bb? Then we cannot tell which of the four possible haplotypes (AB, Ab, aB, ab) they carry: the pair could be AB with ab, or Ab with aB. In many cases, if there are 2 SNPs in the same gene, certain combinations are more likely to be associated with phenotypes than others. Even if this is not the case, linkage disequilibrium induces a correlation among the SNPs, which will affect our multiple testing adjustments. In the most severe case, the alleles are always inherited together, so that there is 100% correlation between the tests of association at the 2 sites.
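The ambiguity can be illustrated with a short Python sketch that lists every haplotype pair consistent with a set of unphased genotypes. The function name `compatible_haplotype_pairs` and the two-character genotype strings are illustrative assumptions, not part of any genetics package.

```python
from itertools import product


def compatible_haplotype_pairs(genotypes):
    """Return the unordered haplotype pairs consistent with unphased genotypes.

    genotypes: list of 2-character strings, one per SNP, e.g. ["AA", "Bb"]
    (a hypothetical encoding chosen for this illustration).
    """
    pairs = set()
    # At each SNP, pick which allele sits on the first chromosome copy;
    # the remaining allele sits on the second copy.
    for choice in product((0, 1), repeat=len(genotypes)):
        hap1 = "".join(g[c] for g, c in zip(genotypes, choice))
        hap2 = "".join(g[1 - c] for g, c in zip(genotypes, choice))
        # Sort so that (AB, Ab) and (Ab, AB) count as the same pair.
        pairs.add(tuple(sorted((hap1, hap2))))
    return pairs


# Homozygous at A, heterozygous at B: the haplotype pair is determined.
one_het = compatible_haplotype_pairs(["AA", "Bb"])

# Heterozygous at both sites: two haplotype pairs remain possible.
double_het = compatible_haplotype_pairs(["Aa", "Bb"])

print(one_het)
print(double_het)
```

With a single heterozygous site only one pair is possible; with two heterozygous sites the genotypes alone cannot distinguish AB/ab from Ab/aB.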

The haplotype is the set of variants in a single gene that are inherited as a unit from one of the parents (and so are on the same copy of the chromosome).  When the gene is transcribed and then translated, all of these variants will occur together in the resulting protein.

Let's examine the animation below from the Learn.Genetics website to learn about haplotypes. Click on the 'Watch' button below and a new window will open up with the animation. Click on the second button, "What is a haplotype?"

Watch!

One of the questions of interest is how to infer the haplotypes from the individual SNP genotypes. This is sometimes called phasing. With family data, phasing is easier because the offspring receive one copy of each chromosome from each parent. So, unless there is a recombination event (which exchanges genetic material between the two copies of the chromosome in the parent while the germ cells are being produced), the offspring inherit intact haplotypes from the parents.
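As a rough sketch of why family data helps, the Python code below keeps only those phasings of a child's genotypes in which one haplotype could have come from the mother and the other from the father. It checks Mendelian consistency one site at a time and is a toy illustration rather than a real phasing algorithm. The function `phase_child` and the genotype encoding are assumptions made for this example.

```python
from itertools import product


def phase_child(child, mother, father):
    """Return the (maternal, paternal) haplotype options for the child.

    Each argument is a list of 2-character genotype strings, one per SNP
    (a hypothetical encoding chosen for this illustration).
    """
    options = set()
    # Try every assignment of the child's alleles to the maternal copy;
    # the remaining alleles go on the paternal copy.
    for choice in product((0, 1), repeat=len(child)):
        from_mother = "".join(g[c] for g, c in zip(child, choice))
        from_father = "".join(g[1 - c] for g, c in zip(child, choice))
        # A parent can only transmit an allele that they carry at that site.
        mother_ok = all(a in m for a, m in zip(from_mother, mother))
        father_ok = all(a in f for a, f in zip(from_father, father))
        if mother_ok and father_ok:
            options.add((from_mother, from_father))
    return options


# Informative trio: both parents are homozygous, so the phase is resolved.
resolved = phase_child(child=["Aa", "Bb"],
                       mother=["AA", "BB"],
                       father=["aa", "bb"])

# Uninformative trio: everyone is doubly heterozygous, so nothing is ruled out.
ambiguous = phase_child(child=["Aa", "Bb"],
                        mother=["Aa", "Bb"],
                        father=["Aa", "Bb"])

print(resolved)
print(len(ambiguous))
```

In the first trio the doubly heterozygous child must carry AB from the mother and ab from the father. In the second trio the parental genotypes rule nothing out, so the phase stays ambiguous.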

The animation from Learn.Genetics states that you can think of a person's haplotype pair as their 'SNP profile'. From a statistician's point of view, haplotypes are a way of reducing a very dense data set with 1 million features to a hopefully smaller, more meaningful set of features. We also expect less linkage disequilibrium between haplotypes than between the SNPs within each haplotype, which reduces the problem of correlated features.
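To make the linkage disequilibrium discussed in this section concrete, the sketch below computes two standard summaries for a pair of biallelic SNPs from haplotype frequencies: the coefficient D (the frequency of the AB haplotype minus the product of the two allele frequencies) and the squared correlation r². The function name and the example frequencies are invented for illustration.

```python
def linkage_disequilibrium(hap_freqs):
    """Return (D, r_squared) for two biallelic SNPs.

    hap_freqs: dict with the population frequencies of the haplotypes
    "AB", "Ab", "aB" and "ab" (made-up values in the examples below).
    """
    p_ab_hap = hap_freqs["AB"]
    p_a = hap_freqs["AB"] + hap_freqs["Ab"]  # frequency of allele A
    p_b = hap_freqs["AB"] + hap_freqs["aB"]  # frequency of allele B
    # D measures how far the AB haplotype frequency departs from independence.
    d = p_ab_hap - p_a * p_b
    # r^2 is the squared correlation between the alleles at the two sites.
    r_squared = d ** 2 / (p_a * (1 - p_a) * p_b * (1 - p_b))
    return d, r_squared


# Alleles always inherited together: only AB and ab haplotypes exist.
d_full, r2_full = linkage_disequilibrium(
    {"AB": 0.6, "Ab": 0.0, "aB": 0.0, "ab": 0.4})

# Alleles combined independently: haplotype frequency = product of allele frequencies.
d_none, r2_none = linkage_disequilibrium(
    {"AB": 0.36, "Ab": 0.24, "aB": 0.24, "ab": 0.16})

print(d_full, r2_full)
print(d_none, r2_none)
```

When only AB and ab occur, r² is 1 up to floating-point rounding, which corresponds to the 100% correlation case described earlier. When the alleles combine independently, both D and r² are 0 up to rounding.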