A typical linear regression model is presented as \(y=b_0+b_1x_1+e\). However, as we pointed out, this model will not work for predicting a yes-or-no voting outcome. As an alternative, we can think about the odds of the event happening, rather than predicting the "value" of the event (as we did with OLS regression).
When we have a binary response y, the expected value of y is \(E(y) = \pi\), where \(\pi\) denotes \(P\left(y = 1\right)\) (as a reminder, \(P\) stands for probability).
Let's think through this a bit. In the definition above, we are saying that the expected value of y is the probability of the event occurring. Say there is a 50% probability of voting for Serena; then the expected value of y is 0.5. That may make a bit more sense.
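If it helps, a quick simulation makes the point. The following is a hypothetical Python sketch (the 50% figure is just the example above, not real polling data):

```python
# Simulate 100,000 binary votes where P(y = 1) = 0.5,
# i.e., a 50% chance of a vote for Serena.
import numpy as np

rng = np.random.default_rng(0)
votes = rng.binomial(n=1, p=0.5, size=100_000)

# The mean of a 0/1 variable is its expected value,
# and it lands very close to pi = 0.5.
print(votes.mean())
```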
So now we can begin to think about using a linear model to predict probabilities instead of raw values. But how do we go from our observed data to a probability?
We need to return to the basic concept of odds. As a review, the odds of an event is the ratio of the chance that the event occurs to the chance that it does not. In our example, this would be the ratio of the count of "yes" votes for Serena to the count of "no" votes for Serena.
We also learned that an odds ratio is the ratio of two odds. Since \(\pi = P\left(y = 1\right)\), it follows that \(1 - \pi = P\left(y = 0\right)\). The ratio \(\frac{\pi}{1 - \pi} = \frac{P\left(y = 1\right)}{P\left(y = 0\right)}\) is known as the odds of the event y = 1 occurring. For example, if \(\pi = 0.8\), then the odds of y = 1 occurring are \(\frac{0.80}{0.20} = 4\), or 4 to 1.
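To make the arithmetic concrete, here is a minimal Python sketch of the probability-to-odds conversion (0.80 is just the example value above):

```python
pi = 0.80                 # P(y = 1), the example probability above
odds = pi / (1 - pi)      # odds of y = 1 occurring
print(odds)               # 4.0, i.e., 4 to 1

# Converting back from odds to a probability
print(odds / (1 + odds))  # 0.8
```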
Hopefully, the odds ratio sounds familiar to you; it is the basic principle behind what is going on in logistic regression. There are a few other modeling pieces that have to be in place for the mathematics to work. First, we have been working with linear models in this course. Because the relationship between a probability and the predictors is not linear, the software actually uses a "logit link" function. You do not need to fully understand what this is, just that it is different from the OLS method of fitting the model.
Because we use the logit link function, we also need to express the odds as a log odds (the natural log of the odds). This is actually where the name "logistic regression" comes from.
The resulting log odds model is:
\(\ln \left(\dfrac{\pi}{1-\pi}\right)=b_{0}+b_{1} x_{1}+\cdots+b_{k} x_{k}\)
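As an illustration only, here is a sketch of fitting such a model in Python with the statsmodels library. The predictor, coefficients, and data are all made-up stand-ins for the Serena polling example, not real poll results:

```python
import numpy as np
import statsmodels.api as sm

# Simulate a hypothetical predictor and yes/no votes
rng = np.random.default_rng(0)
x = rng.normal(size=200)                 # made-up predictor
log_odds = -0.5 + 1.2 * x                # assumed true b0 and b1
p = np.exp(log_odds) / (1 + np.exp(log_odds))
y = rng.binomial(n=1, p=p)               # 1 = a vote for Serena

# Fit the logistic regression (logit link, maximum likelihood)
X = sm.add_constant(x)                   # adds the intercept column
result = sm.Logit(y, X).fit()
print(result.params)                     # estimated b0 and b1, on the log odds scale
```

Notice that the estimated coefficients come back on the log odds scale, which leads directly to the interpretation issue discussed next.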
But interpreting a log odds is really hard. Fortunately, the relationship between a log odds and a probability is fairly easy to translate, so we can algebraically manipulate the log odds model into a probability form. When we do this, we can state the outcome (i.e., the "fitted" value) in terms of the probability of the event happening!
\(\pi=\dfrac{\exp \left(b_{0}+b_{1} x_{1}+\cdots+b_{k} x_{k}\right)}{1+\exp \left(b_{0}+b_{1} x_{1}+\cdots+b_{k} x_{k}\right)}\)
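As a small sketch, this conversion is easy to write as a helper function; the input is assumed to be a fitted log odds value from a model like the one above:

```python
import numpy as np

def log_odds_to_probability(log_odds):
    """Apply the formula above: pi = exp(log odds) / (1 + exp(log odds))."""
    return np.exp(log_odds) / (1 + np.exp(log_odds))

print(log_odds_to_probability(0.0))        # 0.5: log odds of 0 is a 50/50 chance
print(log_odds_to_probability(np.log(4)))  # 0.8, matching the odds-of-4 example
```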
Looking at a plot of the curve produced by the logit link function is helpful, as it comes much closer to the scatterplot we produced for the polling data.
From this graph, we can see that instead of a straight line fitting the binary response variable (in our example, whether a voter votes for Serena), we have an S-shaped curve representing the logit link function. Now our model has a minimum value of 0 and a maximum value of 1, solving the problem of predicted values below 0 or above 1 that we observed when we incorrectly applied simple linear regression to the voter prediction model. Also, notice that the limits of 0 and 1 are exactly the values appropriate for probabilities! Problem solved!
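If you want to draw a curve like this yourself, here is a minimal matplotlib sketch with hypothetical coefficients (b0 = 0, b1 = 1):

```python
import numpy as np
import matplotlib.pyplot as plt

z = np.linspace(-6, 6, 200)        # b0 + b1*x values (log odds scale)
pi = np.exp(z) / (1 + np.exp(z))   # probability form of the model

plt.plot(z, pi)
plt.axhline(0, color="gray", linestyle="--")  # lower limit of 0
plt.axhline(1, color="gray", linestyle="--")  # upper limit of 1
plt.xlabel("b0 + b1*x (log odds)")
plt.ylabel("P(y = 1)")
plt.title("The S-shaped logistic curve")
plt.show()
```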