9.1.1 - Fitting Logistic Regression Models

How do we estimate the parameters? How do we fit a logistic regression model?

To choose the parameters, we need an optimization criterion.

Optimization Criterion

We want to find the parameters that maximize the conditional likelihood of the class labels G given X, using the training data. We are not interested in the distribution of X; our focus is on the conditional probabilities of the class labels given X.

Given a point \(x_i\), the posterior probability that its class is \(k\) is denoted by:

\(p_k(x_i ; \theta) = Pr(G = k | X = x_i ; \theta) \)

Given the first input \(x_1\), the posterior probability of its class, denoted as \(g_1\), is computed by:

\(Pr(G = g_1| X = x_1)\).

Since the samples in the training data set are assumed independent, the joint posterior probability that the \(N\) sample points have classes \(g_i , i =1, 2, \cdots , N\), given their inputs \(x_1, x_2, \cdots , x_N\), is:

\(\prod_{i=1}^{N} Pr(G=g_i|X=x_i)\)

In other words, the joint conditional likelihood is the product, over all data points, of the conditional probability of each point's observed class given its input.

The conditional log-likelihood of the class labels in the training data set becomes a summation:

\( \begin{align} l(\theta) &= \sum_{i=1}^{N} \log Pr(G=g_i|X=x_i)\\
& = \sum_{i=1}^{N} \log p_{g_{i}}(x_i; \theta)
\end{align} \)
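To make the criterion concrete, here is a minimal sketch in Python (using NumPy) that evaluates the conditional likelihood and log-likelihood for a small set of points. It does not fit a model: the posterior probabilities in `probs` are made-up illustrative values, the function name `conditional_log_likelihood` is hypothetical, and coding the classes as 0, ..., K-1 is an assumption made only for array indexing.

```python
import numpy as np

def conditional_log_likelihood(probs, labels):
    """Sum of log p_{g_i}(x_i; theta) over the training points.

    probs  : (N, K) array; probs[i, k] is the model's Pr(G = k | X = x_i).
    labels : (N,) integer array; labels[i] is the observed class g_i,
             coded 0..K-1 (an assumption for indexing).
    """
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels, dtype=int)
    # Pick out p_{g_i}(x_i): the probability given to each point's observed class.
    picked = probs[np.arange(len(labels)), labels]
    return np.sum(np.log(picked))

# Hypothetical posteriors for N = 3 points and K = 2 classes (illustrative only).
probs = np.array([[0.9, 0.1],
                  [0.2, 0.8],
                  [0.3, 0.7]])
labels = np.array([0, 1, 1])  # observed classes g_1, g_2, g_3

log_lik = conditional_log_likelihood(probs, labels)

# The joint conditional likelihood is the product of the same terms.
likelihood = np.prod(probs[np.arange(len(labels)), labels])  # 0.9 * 0.8 * 0.7

print(likelihood, log_lik)
```

Here the log of the product equals the sum of the logs, so maximizing either quantity over \(\theta\) selects the same parameters. The sum is usually preferred in practice because a product of many probabilities can become too small to represent in floating point.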