# 9.2.5 - Estimating the Gaussian Distributions

We need to estimate the prior probabilities \(\pi_k\) and the parameters of the Gaussian distributions. The estimate of \(\pi_k\) below is the maximum likelihood estimator:

\(\hat{\pi}_k=N_k/N\)

where \(N_k\) is the number of class-*k* samples and *N* is the total number of points in the training data. As we mentioned, to get the prior probability for class *k*, you simply compute the proportion of training points that fall in class *k*.

Then, the mean vector for every class is also simple. You take all of the data points in a given class and compute the average, the sample mean:

\(\hat{\mu}_k=\sum_{g_i=k}x^{(i)}/N_k\)
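The two estimates above can be sketched in a few lines of NumPy. This is only an illustration: the function name `estimate_priors_and_means`, the array names `X` and `g`, and the toy data are assumptions made for this example, not part of the lesson.

```python
import numpy as np

def estimate_priors_and_means(X, g):
    """Estimate class priors and class mean vectors.

    X : (N, p) array, one sample x^(i) per row (hypothetical name).
    g : (N,) array of class labels g_i (hypothetical name).
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(g)
    # N_k for every class k, in sorted label order
    classes, counts = np.unique(g, return_counts=True)
    # pi_hat_k = N_k / N
    priors = counts / g.size
    # mu_hat_k = average of the samples whose label is k
    means = np.stack([X[g == k].mean(axis=0) for k in classes])
    return classes, priors, means

# Toy data: 2 points in class 0 and 3 points in class 1
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])
g = np.array([0, 0, 1, 1, 1])

classes, priors, means = estimate_priors_and_means(X, g)
# priors -> [0.4, 0.6]
# means  -> [[2., 3.], [7., 8.]]
```

Each row of `means` is the sample mean of one class, in the same order as `classes`.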

Next, the covariance matrix formula looks slightly complicated. The reason is that we have to get a common covariance matrix for all of the classes. First, you divide the data points into the *K* classes according to their given labels. If we were looking at class *k*, for every point we subtract the corresponding class mean, which we computed earlier. Then we multiply this difference by its transpose. Remember *x* is a column vector, so a column vector multiplied by a row vector gives a square matrix, which is what we need.

\(\hat{\Sigma}=\sum_{k=1}^{K}\sum_{g_i=k}\left(x^{(i)}-\hat{\mu}_k \right)\left(x^{(i)}-\hat{\mu}_k \right)^T/(N-K)\)

First, we do the summation within every class *k*, then we sum over all of the classes. Next, we normalize by the scalar quantity *N* - *K*. The maximum likelihood estimator would divide by *N*, but dividing by *N* - *K* gives an unbiased estimator. Remember, *K* is the number of classes, so when *N* is large, the difference between *N* and *N* - *K* is pretty small.

Note that \(x^{(i)}\) denotes the *i*th sample vector and \(g_i\) denotes its class label.
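The pooled covariance estimate can be sketched as follows. The function name `pooled_covariance`, its `unbiased` flag, and the toy data are assumptions made for this illustration. The flag only switches the divisor between *N* - *K* and *N*.

```python
import numpy as np

def pooled_covariance(X, g, unbiased=True):
    """Common covariance matrix shared by all classes.

    X : (N, p) array of samples, g : (N,) array of class labels
    (both names are hypothetical).
    """
    X = np.asarray(X, dtype=float)
    g = np.asarray(g)
    classes = np.unique(g)
    N, p = X.shape
    K = classes.size
    scatter = np.zeros((p, p))
    for k in classes:
        X_k = X[g == k]
        centered = X_k - X_k.mean(axis=0)   # x^(i) - mu_hat_k for class k
        # centered.T @ centered equals the sum of the outer products
        # (x^(i) - mu_hat_k)(x^(i) - mu_hat_k)^T over the class
        scatter += centered.T @ centered
    divisor = N - K if unbiased else N      # N - K: unbiased, N: maximum likelihood
    return scatter / divisor

# Toy data: 2 points in class 0 and 3 points in class 1
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0], [9.0, 10.0]])
g = np.array([0, 0, 1, 1, 1])

sigma_unbiased = pooled_covariance(X, g)               # every entry is 10/3
sigma_mle = pooled_covariance(X, g, unbiased=False)    # every entry is 2
```

In this toy data the two coordinates move together exactly, so the resulting matrix is singular. It is convenient for checking the arithmetic by hand, but real data would normally give an invertible matrix.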

In summary, if you want to use LDA to obtain a classification rule, the first step is to estimate the parameters using the formulas above. Once you have these estimates, plug them into the linear discriminant functions and assign each point to the class whose discriminant function is largest.
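The classification step can be sketched as below, assuming the linear discriminant function takes the standard LDA form \(\delta_k(x)=x^T\hat{\Sigma}^{-1}\hat{\mu}_k-\frac{1}{2}\hat{\mu}_k^T\hat{\Sigma}^{-1}\hat{\mu}_k+\log\hat{\pi}_k\). The function name `lda_predict` and the parameter values are hypothetical. The values stand in for estimates that would come from training data.

```python
import numpy as np

def lda_predict(X_new, classes, priors, means, sigma):
    """Assign each row of X_new to the class with the largest discriminant.

    Assumes delta_k(x) = x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(pi_k),
    where S is the common covariance matrix.
    """
    X_new = np.atleast_2d(np.asarray(X_new, dtype=float))
    # Column k of A is S^-1 mu_k; solve() avoids forming the inverse explicitly
    A = np.linalg.solve(sigma, means.T)                  # shape (p, K)
    linear = X_new @ A                                   # x^T S^-1 mu_k
    quad = np.sum(means.T * A, axis=0)                   # mu_k^T S^-1 mu_k
    scores = linear - 0.5 * quad + np.log(priors)        # shape (M, K)
    return classes[np.argmax(scores, axis=1)]

# Illustrative parameter values (not taken from any real data set)
classes = np.array([0, 1])
priors = np.array([0.5, 0.5])
means = np.array([[0.5, 0.5], [4.5, 4.5]])
sigma = np.eye(2) / 3.0

labels = lda_predict([[0.2, 0.3], [4.8, 4.1]], classes, priors, means, sigma)
# labels -> [0, 1]
```

With equal priors and a covariance matrix proportional to the identity, as in these illustrative values, the rule reduces to choosing the class with the nearest mean.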