Depending on which algorithm you use, you end up with a different way of estimating the density within each class.
- Linear and Quadratic Discriminant Analysis: Gaussian densities.
In Linear Discriminant Analysis (LDA) we assume that the density within each class is Gaussian, and that the Gaussian distributions for the different classes share the same covariance matrix. In Quadratic Discriminant Analysis (QDA) we drop that constraint, so every class has its own covariance matrix. You will see the difference later.
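The difference between the two covariance assumptions can be sketched in a few lines of NumPy. This is only an illustration: the function names, the pooled-covariance estimator (dividing by n minus the number of classes), and the toy data are assumptions made for this sketch, not something fixed by the notes.

```python
import numpy as np

def fit_gaussian_classes(X, y, shared_covariance):
    """Estimate one Gaussian density per class.

    shared_covariance=True  -> one pooled covariance for all classes (LDA-style)
    shared_covariance=False -> a separate covariance per class (QDA-style)
    """
    classes = np.unique(y)
    n, p = X.shape
    means = {k: X[y == k].mean(axis=0) for k in classes}
    if shared_covariance:
        # Pool the within-class scatter of every class into one matrix.
        pooled = np.zeros((p, p))
        for k in classes:
            centered = X[y == k] - means[k]
            pooled += centered.T @ centered
        pooled /= n - len(classes)
        covs = {k: pooled for k in classes}
    else:
        # Each class gets its own sample covariance.
        covs = {k: np.cov(X[y == k], rowvar=False) for k in classes}
    return means, covs

def log_gaussian_density(x, mean, cov):
    """Log of the multivariate Gaussian density at a single point x."""
    p = mean.shape[0]
    diff = x - mean
    _, logdet = np.linalg.slogdet(cov)
    maha = diff @ np.linalg.solve(cov, diff)
    return -0.5 * (p * np.log(2 * np.pi) + logdet + maha)

# Toy data (assumed for illustration): two classes with different spreads.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 2, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

lda_means, lda_covs = fit_gaussian_classes(X, y, shared_covariance=True)
qda_means, qda_covs = fit_gaussian_classes(X, y, shared_covariance=False)
```

With the shared covariance, both classes use the same matrix. With separate covariances, the second class in this toy data gets a visibly larger one.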
- General Nonparametric Density Estimates: kernel estimates, histograms.
You can also estimate each class density without assuming a parametric form, for instance with kernel estimates or histograms.
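As a rough sketch of what a kernel estimate of a class density looks like, here is a one-dimensional Gaussian kernel estimator. The function name, the fixed bandwidth of 0.5, and the simulated samples are all assumptions chosen for illustration. In practice the bandwidth has to be tuned.

```python
import numpy as np

def gaussian_kde_1d(samples, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    samples = np.asarray(samples, dtype=float)

    def density(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        # One Gaussian bump per training sample, averaged at each query point.
        z = (x[:, None] - samples[None, :]) / bandwidth
        kernels = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
        return kernels.mean(axis=1) / bandwidth

    return density

# One density estimate per class (simulated data, assumed for illustration).
rng = np.random.default_rng(1)
class_samples = {0: rng.normal(-2, 1, 200), 1: rng.normal(2, 1, 200)}
class_densities = {k: gaussian_kde_1d(v, bandwidth=0.5)
                   for k, v in class_samples.items()}
```

Each class gets its own estimated density, which can then be evaluated at a new point just like the Gaussian densities in the parametric case.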
- Naive Bayes: assume each of the class densities is a product of marginal densities, that is, all the variables are independent given the class.
The Naive Bayes algorithm is a well-known example. Its basic assumption is that all the variables are independent given the class label. Therefore, to estimate a class density, you can estimate the density of every dimension separately and then multiply them to get the joint density. This makes the computation much simpler.
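The "estimate each dimension separately, then multiply" step can be sketched as follows. The choice of a one-dimensional Gaussian for every marginal is an assumption made for this example (any one-dimensional density estimate could be used instead), and the function names and toy data are hypothetical. The product is computed as a sum of logs to avoid numerical underflow.

```python
import numpy as np

def fit_naive_bayes_gaussian(X, y):
    """For every class, estimate a mean and variance for each dimension separately."""
    params = {}
    for k in np.unique(y):
        Xk = X[y == k]
        params[k] = (Xk.mean(axis=0), Xk.var(axis=0))
    return params

def log_class_density(x, mean, var):
    """Log joint density of x under independence: the sum of log marginals."""
    log_marginals = -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)
    return log_marginals.sum()

# Toy data (assumed for illustration): two classes in three dimensions.
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0, 1, (40, 3)), rng.normal(2, 1, (40, 3))])
y = np.array([0] * 40 + [1] * 40)

params = fit_naive_bayes_gaussian(X, y)
x_new = np.array([0.1, -0.3, 0.2])
log_densities = {k: log_class_density(x_new, m, v) for k, (m, v) in params.items()}
```

Only one mean and one variance per dimension are estimated for each class, rather than a full covariance matrix, which is where the computational saving comes from.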
X may also be discrete rather than continuous. In that case, instead of a density we use a probability mass function: we estimate a probability mass function for every dimension and then multiply them to get the joint probability mass function.
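For the discrete case, a minimal sketch is to estimate each per-dimension probability mass function by relative frequencies within one class. The function names and the tiny dataset below are made up for illustration. Note that a value never seen in training gets probability zero under this plain counting estimate, which is why some form of smoothing is commonly added in practice.

```python
from collections import Counter

def fit_pmfs(rows):
    """Estimate one PMF per dimension from the rows of a single class."""
    n = len(rows)
    n_dims = len(rows[0])
    pmfs = []
    for j in range(n_dims):
        counts = Counter(row[j] for row in rows)
        pmfs.append({value: c / n for value, c in counts.items()})
    return pmfs

def joint_pmf(x, pmfs):
    """Joint PMF under independence: the product of per-dimension PMFs."""
    prob = 1.0
    for value, pmf in zip(x, pmfs):
        prob *= pmf.get(value, 0.0)  # unseen values get probability 0
    return prob

# Hypothetical observations from one class: (colour, size).
rows_class_a = [("red", "small"), ("red", "large"),
                ("blue", "small"), ("red", "small")]
pmfs_a = fit_pmfs(rows_class_a)
p = joint_pmf(("red", "small"), pmfs_a)  # (3/4) * (3/4)
```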
We have now swept through several prominent classification methods. They all fall under the generative modeling idea: the only essential difference between them is in how you estimate the density for every class.
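In symbols, the shared rule is Bayes' rule applied to the estimated class densities. The notation here is assumed for this summary and not defined in the section itself: G is the class label, K the number of classes, \pi_k the prior probability of class k, and f_k(x) the estimated density (or probability mass function) of X within class k.

```latex
\Pr(G = k \mid X = x) = \frac{f_k(x)\,\pi_k}{\sum_{l=1}^{K} f_l(x)\,\pi_l},
\qquad
\hat{G}(x) = \arg\max_{k} \; f_k(x)\,\pi_k .
```

LDA, QDA, nonparametric estimates, and Naive Bayes differ only in what they plug in for f_k.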