10.2 - Discriminant Analysis Procedure

Discriminant analysis is a 7-step procedure.

Step 1: Collect training data

Training data are data with known group memberships; that is, we know which population each subject belongs to. For example, in the Swiss Bank Notes data, we know which notes are genuine and which are counterfeit.

Step 2: Prior Probabilities

The prior probability \(p_i\) represents the expected proportion of the community that belongs to population \(\pi_{i}\). There are three common choices:

Equal priors: \(\hat{p}_i = \frac{1}{g}\). This is useful if we believe that all of the population sizes are equal.
Arbitrary priors: these are selected according to the investigator's beliefs regarding the relative population sizes.
Note! We require:
\(\hat{p}_1 + \hat{p}_2 + \dots + \hat{p}_g = 1\)
Estimated priors:
\(\hat{p}_i = \dfrac{n_i}{N}\)

where \(n_{i}\) is the number of observations from population \(\pi_{i}\) in the training data, and \(N = n_1 + n_2 + \dots + n_g\).
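
The equal and estimated priors above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `equal_priors` and `estimate_priors` and the labels `"A"` and `"B"` are hypothetical and do not come from the lesson's data.

```python
from collections import Counter


def equal_priors(groups):
    """Equal priors: p_i = 1 / g for each of the g groups."""
    groups = list(groups)
    return {group: 1 / len(groups) for group in groups}


def estimate_priors(labels):
    """Estimated priors: p_i = n_i / N, using training-data counts."""
    counts = Counter(labels)
    total = sum(counts.values())  # N = n_1 + ... + n_g
    return {group: n / total for group, n in counts.items()}


# Hypothetical training labels: three subjects from A, one from B.
labels = ["A", "A", "A", "B"]
estimated = estimate_priors(labels)   # {"A": 0.75, "B": 0.25}
equal = equal_priors(set(labels))     # 0.5 for each group
```

Either way, the priors sum to one, as required.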

Step 3: Bartlett's test

Use Bartlett’s test to determine if the variance-covariance matrices are homogeneous for all populations involved. The result of this test will determine whether to use linear or quadratic discriminant analysis:

Case 1: Linear

Linear discriminant analysis is for homogeneous variance-covariance matrices:

\(\Sigma_1 = \Sigma_2 = \dots = \Sigma_g = \Sigma\)

In this case, the variance-covariance matrix does not depend on the population.

Case 2: Quadratic

Quadratic discriminant analysis is used for heterogeneous variance-covariance matrices:

\(\Sigma_i \ne \Sigma_j\) for some \(i \ne j\)

This allows the variance-covariance matrices to depend on the population.

Note! We do not discuss testing whether the means of the populations are different. If they are not, there is no case for discriminant analysis.
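
As a rough sketch of the homogeneity test in Step 3, the code below computes one common form of the Bartlett-type likelihood ratio statistic for equal variance-covariance matrices, with the usual small-sample correction (this form is often called Box's M test). The function name `homogeneity_statistic` and the simulated data are assumptions made for illustration. Under the null hypothesis the statistic is compared with a chi-square distribution with \(p(p+1)(g-1)/2\) degrees of freedom, where \(p\) is the number of variables.

```python
import numpy as np


def homogeneity_statistic(groups):
    """groups: list of (n_i x p) arrays, one per population.

    Returns (corrected statistic, degrees of freedom).
    """
    g = len(groups)
    p = groups[0].shape[1]
    n = np.array([x.shape[0] for x in groups])
    N = n.sum()
    # Sample covariance matrix S_i for each group (divisor n_i - 1).
    covs = [np.cov(x, rowvar=False) for x in groups]
    # Pooled covariance matrix S_p.
    pooled = sum((ni - 1) * s for ni, s in zip(n, covs)) / (N - g)
    # Log-determinants via slogdet for numerical stability.
    logdet_pooled = np.linalg.slogdet(pooled)[1]
    logdets = [np.linalg.slogdet(s)[1] for s in covs]
    m = (N - g) * logdet_pooled - sum(
        (ni - 1) * ld for ni, ld in zip(n, logdets)
    )
    # Small-sample correction factor.
    c = 1 - (2 * p**2 + 3 * p - 1) / (6 * (p + 1) * (g - 1)) * (
        np.sum(1 / (n - 1)) - 1 / (N - g)
    )
    df = p * (p + 1) * (g - 1) // 2
    return c * m, df


# Simulated (hypothetical) data: the second group has a larger spread.
rng = np.random.default_rng(0)
a = rng.normal(size=(30, 2))
b = 3 * rng.normal(size=(40, 2))
stat, df = homogeneity_statistic([a, b])
```

A large statistic relative to the chi-square reference distribution points toward heterogeneous matrices and hence the quadratic case.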

Step 4: Estimate the parameters of the conditional probability density functions \(f ( \mathbf{X} |\pi_{i})\).

Here, we shall make the following standard assumptions:

The data from group \(i\) have a common mean vector \(\boldsymbol{\mu}_i\).
The data from group \(i\) have a common variance-covariance matrix \(\Sigma\) (in the quadratic case, this is replaced by a group-specific matrix \(\Sigma_i\)).
Independence: The subjects are independently sampled.
Normality: The data are multivariate normally distributed.
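
Under these assumptions, the parameters to estimate are the group mean vectors and, in the linear case, a pooled variance-covariance matrix. The sketch below shows one way to compute them with NumPy. The function name `estimate_parameters` and the tiny data set are hypothetical. For the quadratic case one would keep a separate sample covariance matrix for each group instead of pooling.

```python
import numpy as np


def estimate_parameters(X, y):
    """Return ({group: mean vector}, pooled covariance matrix)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    groups = np.unique(y)
    N, p = X.shape
    means = {}
    pooled = np.zeros((p, p))
    for k in groups:
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        centered = Xk - means[k]
        # Accumulate within-group sums of squares and cross-products.
        pooled += centered.T @ centered
    pooled /= N - len(groups)  # divisor N - g
    return means, pooled


# Hypothetical training data: two groups, two variables.
X = [[1, 2], [3, 4], [5, 7], [7, 9]]
y = [0, 0, 1, 1]
means, pooled = estimate_parameters(X, y)
```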

Step 5: Compute discriminant functions.

The discriminant functions provide the rule for classifying a new object into one of the known populations.
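
To make the classification rule concrete, here is a minimal sketch for the linear case. It assumes the standard linear score function under multivariate normality with a common covariance matrix, \(d_i(\mathbf{x}) = -\frac{1}{2}\boldsymbol{\mu}_i'\Sigma^{-1}\boldsymbol{\mu}_i + \boldsymbol{\mu}_i'\Sigma^{-1}\mathbf{x} + \log p_i\), and assigns the object to the population with the largest score. The function names and numeric values are hypothetical.

```python
import numpy as np


def linear_scores(x, means, pooled_cov, priors):
    """Linear score d_i(x) for each population i."""
    x = np.asarray(x, dtype=float)
    scores = {}
    for k, mu in means.items():
        w = np.linalg.solve(pooled_cov, mu)  # Sigma^{-1} mu_i
        scores[k] = -0.5 * mu @ w + w @ x + np.log(priors[k])
    return scores


def classify(x, means, pooled_cov, priors):
    """Assign x to the population with the largest linear score."""
    scores = linear_scores(x, means, pooled_cov, priors)
    return max(scores, key=scores.get)


# Hypothetical parameter estimates for two populations.
means = {"A": np.array([0.0, 0.0]), "B": np.array([2.0, 2.0])}
pooled_cov = np.eye(2)
priors = {"A": 0.5, "B": 0.5}

label_near_a = classify([0.2, 0.1], means, pooled_cov, priors)
label_near_b = classify([1.9, 2.2], means, pooled_cov, priors)
```

In the quadratic case the scores would also involve each group's own covariance matrix, so this sketch does not carry over unchanged.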

Step 6: Use cross-validation to estimate misclassification probabilities.

As in all statistical procedures, it is helpful to use diagnostic procedures to assess the efficacy of the discriminant analysis. We use cross-validation to estimate the misclassification probabilities. Typically you will have some prior rule as to what constitutes an acceptable misclassification rate. Such rules might involve questions like, "What is the cost of misclassification?" This could come up in a medical study where the goal is to diagnose cancer. There are really two alternative costs. One is the cost of misclassifying someone as having cancer when they do not, which could cause a certain amount of emotional grief. The other is the cost of misclassifying someone as not having cancer when in fact they do. The latter cost is obviously greater if early diagnosis improves cure rates.
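
One common form of cross-validation is leave-one-out: each training observation is held out in turn, the rule is rebuilt from the remaining data, and the held-out observation is classified. The sketch below does this for the linear case with estimated priors. It is only an illustration under those assumptions, with hypothetical function names and simulated data, and it reports a single overall error rate rather than separate rates per population.

```python
import numpy as np


def fit_lda(X, y):
    """Estimate group means, pooled covariance, and priors n_i / N."""
    groups = np.unique(y)
    N, p = X.shape
    means, priors = {}, {}
    pooled = np.zeros((p, p))
    for k in groups:
        Xk = X[y == k]
        means[k] = Xk.mean(axis=0)
        priors[k] = len(Xk) / N
        centered = Xk - means[k]
        pooled += centered.T @ centered
    pooled /= N - len(groups)
    return means, pooled, priors


def predict_lda(x, means, pooled, priors):
    """Return the group with the largest linear score for x."""
    best, best_score = None, -np.inf
    for k, mu in means.items():
        w = np.linalg.solve(pooled, mu)
        score = -0.5 * mu @ w + w @ x + np.log(priors[k])
        if score > best_score:
            best, best_score = k, score
    return best


def loo_error_rate(X, y):
    """Leave-one-out estimate of the overall misclassification rate."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    errors = 0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i  # drop observation i
        model = fit_lda(X[keep], y[keep])
        errors += int(predict_lda(X[i], *model) != y[i])
    return errors / len(y)


# Simulated (hypothetical) data: two well-separated groups.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, size=(25, 2)),
               rng.normal(6, 1, size=(25, 2))])
y = np.array([0] * 25 + [1] * 25)
rate = loo_error_rate(X, y)
```

Tabulating the errors separately by true group would give the two kinds of misclassification discussed above, which matters when their costs differ.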

Step 7: Classify observations with unknown group memberships.

The procedure described above assumes that the unit or subject being classified actually belongs to one of the considered populations. If you have a study where you look at two species of insects, A and B, and the insect to classify actually belongs to species C, then it will necessarily be misclassified as belonging to either A or B.