9.2 - Discriminant Analysis


Introduction

Let the feature vector be X and the class labels be Y.

The Bayes rule says that if you have the joint distribution of X and Y, then given X, under 0-1 loss, the optimal decision on Y is to choose the class with maximum posterior probability given X.

Discriminant analysis belongs to the branch of classification methods called generative modeling, where we try to estimate the within-class density of X given the class label. Combined with the prior probability (unconditioned probability) of classes, the posterior probability of Y can be obtained by the Bayes formula.

Notation

Assume the prior probability, or the marginal pmf, for class k is denoted by \(\pi_k\), with \(\sum^{K}_{k=1} \pi_k =1\).

\(\pi_k\) is usually estimated simply by empirical frequencies of the training set:

\[\hat{\pi}_k=\frac{\text{# of Samples in class } k}{\text{Total # of samples}}\]

In other words, you take the training data set and count what percentage of the samples come from each class.
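For instance, a minimal sketch in Python (using NumPy, with a hypothetical label array `y_train` that is not part of the text above) of computing these empirical frequencies:

```python
import numpy as np

# hypothetical training labels for a 3-class problem
y_train = np.array([0, 0, 1, 2, 1, 0, 2, 2, 2, 1])

# count samples per class and divide by the total sample size
classes, counts = np.unique(y_train, return_counts=True)
prior = counts / counts.sum()   # pi_hat_k = (# samples in class k) / (total # samples)

for k, p in zip(classes, prior):
    print(f"class {k}: pi_hat = {p:.2f}")
```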

Then we need the class-conditional density of X. Remember this is the density of X conditioned on the class k, or class G = k, denoted by \(f_k(x)\).

According to the Bayes rule, what we need to compute is the posterior probability:

\[Pr(G=k|X=x)=\frac{f_k(x)\pi_k}{\sum^{K}_{l=1}f_l(x)\pi_l}\]

This is a conditional probability of class G given X.

By MAP (maximum a posteriori, i.e., the Bayes rule for 0-1 loss):

\[\begin{align} \hat{G}(x) &= \text{arg}\underset{k}{\text{max}}\; Pr(G=k|X=x)\\
&= \text{arg}\underset{k}{\text{max}}\; f_k(x)\pi_k
\end{align}\]

Notice that the denominator is identical no matter which class k you are considering; therefore it makes no difference to the maximization over k. The MAP rule is essentially trying to maximize \(\pi_k\) times \(f_k(x)\).
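As an illustration, here is a hedged Python sketch of the posterior computation and the MAP rule, assuming hypothetical Gaussian class-conditional densities with made-up means and standard deviations (these numbers are not from the text, only the structure of the calculation follows the formulas above):

```python
import numpy as np
from scipy.stats import norm

# hypothetical priors pi_k and Gaussian class-conditional densities f_k(x)
prior = np.array([0.5, 0.3, 0.2])   # pi_k, must sum to 1
means = np.array([0.0, 2.0, 4.0])   # assumed class means
sds   = np.array([1.0, 1.0, 1.5])   # assumed class standard deviations

x = 1.2                              # a new observation

f_k = norm.pdf(x, loc=means, scale=sds)        # class-conditional densities f_k(x)
unnormalized = f_k * prior                     # pi_k * f_k(x)
posterior = unnormalized / unnormalized.sum()  # Pr(G = k | X = x), Bayes formula

# MAP rule: argmax of the posterior, which is the same as argmax of pi_k * f_k(x)
G_hat = np.argmax(posterior)
print("posterior:", posterior, "predicted class:", G_hat)
```

Because the denominator is shared across classes, taking the argmax of `unnormalized` gives the same predicted class as taking the argmax of `posterior`.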