9.2.5 - Estimating the Gaussian Distributions

We need to estimate the prior probabilities \(\pi_k\) and the parameters of the Gaussian distributions. The formula below for \(\pi_k\) is the maximum likelihood estimator:

\[\hat{\pi}_k=N_k/N\]

where \(N_k\) is the number of class-k samples and \(N\) is the total number of points in the training data. As we mentioned, to get the prior probability for class k, you simply compute the proportion of training points that fall in class k.
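As a minimal sketch of this counting step, assuming the class labels are stored in a one-dimensional array, the priors can be computed with NumPy. The function name `estimate_priors` and the toy labels are illustrative only and are not part of the original notes.

```python
import numpy as np

def estimate_priors(labels):
    """Return the distinct class labels and the estimates pi_k = N_k / N."""
    labels = np.asarray(labels)
    # np.unique returns the sorted class labels and how often each occurs (N_k).
    classes, counts = np.unique(labels, return_counts=True)
    return classes, counts / labels.size

# Toy labels (made up for illustration): N = 6 points in K = 3 classes.
classes, priors = estimate_priors([0, 0, 1, 1, 1, 2])
print(classes, priors)
```

The estimated priors always sum to one, because every training point belongs to exactly one class.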

The mean vector for each class is also simple to estimate. You take all of the data points in a given class and compute their average, the sample mean:

\[\hat{\mu}_k=\sum_{g_i=k}x^{(i)}/N_k\]
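The class means can be sketched in the same way, assuming the training data is an N-by-p array `X` with one sample per row and a matching label array. The name `estimate_class_means` and the toy data are hypothetical, chosen only for this illustration.

```python
import numpy as np

def estimate_class_means(X, labels):
    """Return the class labels and a (K, p) array whose k-th row is mu_k."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    # For each class, average the rows of X whose label equals that class.
    means = np.stack([X[labels == k].mean(axis=0) for k in classes])
    return classes, means

# Toy data (made up for illustration): two classes in two dimensions.
X = [[0.0, 0.0], [2.0, 0.0], [4.0, 6.0], [6.0, 8.0]]
labels = [0, 0, 1, 1]
classes, means = estimate_class_means(X, labels)
print(means)
```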

Next, the covariance matrix formula looks slightly more complicated because we have to estimate a single covariance matrix common to all of the classes. First, divide the data points into the given classes according to their labels. For class k, subtract from every point the corresponding class mean computed earlier, then multiply the resulting vector by its transpose. Remember that x is a column vector, so multiplying a column vector by a row vector gives a square matrix, which is what we need.

\[\hat{\Sigma}=\sum_{k=1}^{K}\sum_{g_i=k}\left(x^{(i)}-\hat{\mu}_k \right)\left(x^{(i)}-\hat{\mu}_k \right)^T/(N-K)\]

First, we sum within every class k, then we sum over all of the classes. Next, we normalize by the scalar quantity N - K. The maximum likelihood estimator divides by N, but dividing by N - K instead gives an unbiased estimator. Remember, K is the number of classes, so when N is large relative to K, the difference between N and N - K is pretty small.
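The pooled covariance estimate can be sketched as follows, again assuming an N-by-p data array with one sample per row. The function name `pooled_covariance` and its `unbiased` flag are assumptions introduced here so that the N - K and N normalizations can be compared side by side.

```python
import numpy as np

def pooled_covariance(X, labels, unbiased=True):
    """Common covariance estimate: within-class scatter divided by N - K (or N)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    N, p = X.shape
    scatter = np.zeros((p, p))
    for k in classes:
        Xk = X[labels == k]
        centered = Xk - Xk.mean(axis=0)   # subtract the class mean from each point
        # centered.T @ centered equals the sum of outer products
        # (x - mu_k)(x - mu_k)^T over the points in class k.
        scatter += centered.T @ centered
    denom = N - len(classes) if unbiased else N
    return scatter / denom

# Toy data (made up for illustration): N = 6 points, K = 2 classes.
X = [[0, 0], [2, 0], [0, 2], [5, 5], [7, 5], [5, 7]]
labels = [0, 0, 0, 1, 1, 1]
sigma_unbiased = pooled_covariance(X, labels)               # divides by N - K = 4
sigma_mle = pooled_covariance(X, labels, unbiased=False)    # divides by N = 6
print(sigma_unbiased)
print(sigma_mle)
```

The two results differ only by the scalar factor (N - K) / N, which approaches one as N grows.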

Note that \(x^{(i)}\) denotes the ith sample vector and \(g_i\) denotes its class label.

In summary, if you want to use LDA to obtain a classification rule, the first step is to estimate the parameters using the formulas above. Once you have these estimates, go back and compute the linear discriminant function for each class, then assign each point to the class whose discriminant function is largest.
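The whole procedure can be sketched end to end. This is only an illustration under stated assumptions: it uses the standard form of the linear discriminant function, \(\delta_k(x)=x^T\hat{\Sigma}^{-1}\hat{\mu}_k-\frac{1}{2}\hat{\mu}_k^T\hat{\Sigma}^{-1}\hat{\mu}_k+\log\hat{\pi}_k\), and the names `fit_lda` and `predict_lda` as well as the toy data are hypothetical.

```python
import numpy as np

def fit_lda(X, labels):
    """Estimate priors, class means, and the pooled covariance (divisor N - K)."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    classes, counts = np.unique(labels, return_counts=True)
    N, p = X.shape
    priors = counts / N
    means = np.stack([X[labels == k].mean(axis=0) for k in classes])
    scatter = np.zeros((p, p))
    for k, mu in zip(classes, means):
        centered = X[labels == k] - mu
        scatter += centered.T @ centered
    sigma = scatter / (N - len(classes))
    return classes, priors, means, sigma

def predict_lda(X, classes, priors, means, sigma):
    """Assign each row of X to the class with the largest discriminant score."""
    X = np.asarray(X, dtype=float)
    # Column k of A is Sigma^{-1} mu_k; solving avoids forming the inverse explicitly.
    A = np.linalg.solve(sigma, means.T)
    # scores[i, k] = x_i^T Sigma^{-1} mu_k - 0.5 * mu_k^T Sigma^{-1} mu_k + log(pi_k)
    scores = X @ A - 0.5 * np.sum(means.T * A, axis=0) + np.log(priors)
    return classes[np.argmax(scores, axis=1)]

# Toy data (made up for illustration): two well-separated classes.
X_train = [[0, 0], [2, 0], [0, 2], [5, 5], [7, 5], [5, 7]]
y_train = [0, 0, 0, 1, 1, 1]
params = fit_lda(X_train, y_train)
predictions = predict_lda([[1, 1], [6, 6]], *params)
print(predictions)
```

Because the covariance matrix is shared by all classes, the scores are linear in x, so the boundaries between classes are hyperplanes.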