9.2 - Discriminant Analysis

Introduction

Let the feature vector be X and the class labels be Y.

The Bayes rule says that if we know the joint distribution of X and Y, then given X, the optimal decision on Y under 0-1 loss is to choose the class with maximum posterior probability given X.

Discriminant analysis belongs to the branch of classification methods called generative modeling, where we try to estimate the within-class density of X given the class label. Combined with the prior probability (unconditional probability) of classes, the posterior probability of Y can be obtained by the Bayes formula.


Assume the prior probability, or the marginal pmf, for class k is denoted \(\pi_k\), where \(\sum^{K}_{k=1} \pi_k = 1\).

\(\pi_k\) is usually estimated simply by the empirical frequencies in the training set:

\(\hat{\pi}_k=\frac{\text{# of Samples in class } k}{\text{Total # of samples}}\)

That is, given the training data set, you count what percentage of the samples come from each class.
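As a minimal sketch of this estimate, the snippet below computes the empirical class frequencies from a made-up label vector (the labels here are purely illustrative):

```python
import numpy as np

# Hypothetical training labels for a 3-class problem (illustrative data).
y_train = np.array([0, 0, 1, 2, 1, 0, 2, 2, 1, 0])

# Empirical priors: pi_hat_k = (# of samples in class k) / (total # of samples)
classes, counts = np.unique(y_train, return_counts=True)
priors = counts / counts.sum()

# By construction the estimated priors sum to 1.
print(dict(zip(classes, priors)))
```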

Then we need the class-conditional density of X. Remember this is the density of X conditioned on the class G = k, denoted by \(f_k(x)\).
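The class-conditional density must be estimated from the training data, which requires choosing a model for it. As a sketch, the snippet below fits a univariate Gaussian to each class by maximum likelihood (the Gaussian choice and the data are assumptions for illustration; the text has not yet committed to a particular density model):

```python
import numpy as np

# Illustrative one-dimensional training data for two classes.
x_train = np.array([1.0, 1.2, 0.8, 3.1, 2.9, 3.0])
y_train = np.array([0, 0, 0, 1, 1, 1])

def fit_gaussian(x):
    """MLE of mean and variance for one class's samples."""
    return x.mean(), x.var()

# One (mean, variance) pair per class.
params = {k: fit_gaussian(x_train[y_train == k]) for k in np.unique(y_train)}

def f_k(x, k):
    """Gaussian class-conditional density f_k(x) under the fitted parameters."""
    mu, var = params[k]
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)
```

A point near a class's sample mean gets a higher density under that class than under the other.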

According to the Bayes rule, what we need is to compute the posterior probability:

\(Pr(G=k|X=x)=\frac{f_k(x)\pi_k}{\sum^{K}_{l=1}f_l(x)\pi_l}\)

This is the conditional probability of class G = k given X = x.

By MAP (maximum a posteriori, i.e., the Bayes rule for 0-1 loss):

\( \begin{align} \hat{G}(x) &= \text{arg }\underset{k}{\text{max }} Pr(G=k|X=x)\\
&= \text{arg }\underset{k}{\text{max }} \frac{f_k(x)\pi_k}{\sum^{K}_{l=1}f_l(x)\pi_l}\\
&= \text{arg }\underset{k}{\text{max }} f_k(x)\pi_k
\end{align} \)

Notice that the denominator is identical no matter which class k you are using. Therefore, for maximization, it makes no difference to the choice of k. The MAP rule is essentially trying to maximize \(\pi_k\) times \(f_k(x)\).
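The MAP rule above can be sketched end to end. The snippet below classifies a point by maximizing \(\pi_k f_k(x)\), dropping the shared denominator; the priors and Gaussian class-conditional densities are assumed here for illustration, not taken from the text:

```python
import numpy as np

# Assumed model for a two-class, one-dimensional problem.
priors = {0: 0.5, 1: 0.5}   # pi_k
means = {0: 0.0, 1: 4.0}    # class-conditional Gaussian means
var = 1.0                   # shared variance, for simplicity

def density(x, mu):
    """Gaussian density f_k(x) with mean mu and the shared variance."""
    return np.exp(-(x - mu) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)

def classify(x):
    # The denominator sum_l pi_l * f_l(x) is the same for every class,
    # so it is dropped: it cannot change which class attains the maximum.
    return max(priors, key=lambda k: priors[k] * density(x, means[k]))
```

With equal priors, a point is assigned to whichever class mean it is closer to.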