All of the models we have discussed thus far have been linear in the parameters (i.e., linear in the beta's). For example, polynomial regression was used to model curvature in our data by using higher-order terms of the predictors. However, the final regression model was still just a linear combination of those higher-order terms.

Now we are interested in studying the **nonlinear regression** model:

\(\begin{equation*}

Y=f(\textbf{X},\beta)+\epsilon,

\end{equation*}\)

where **X** is a vector of *p* predictors, \(\beta\) is a vector of *k* parameters, \(f(\cdot)\) is some known regression function, and \(\epsilon\) is an error term whose distribution may or may not be normal. Notice that the number of parameters, *k*, is no longer necessarily one more than the number of predictors. Some examples of nonlinear regression models are:

\(\begin{align*}

y_{i}&=\frac{e^{\beta_{0}+\beta_{1}x_{i}}}{1+e^{\beta_{0}+\beta_{1}x_{i}}}+\epsilon_{i} \\

y_{i}&=\frac{\beta_{0}+\beta_{1}x_{i}}{1+\beta_{2}e^{\beta_{3}x_{i}}}+\epsilon_{i} \\

y_{i}&=\beta_{0}+(0.4-\beta_{0})e^{-\beta_{1}(x_{i}-5)}+\epsilon_{i}.

\end{align*}\)

However, some nonlinear models are called **intrinsically linear** because a simple transformation makes them linear in the parameters. For example:

\(\begin{equation*}

Y=\frac{\beta_{0}X}{\beta_{1}+X}

\end{equation*}\)

can be rewritten as

\(\begin{align*}

\frac{1}{Y}&=\frac{1}{\beta_{0}}+\frac{\beta_{1}}{\beta_{0}}\frac{1}{X}\\

&=\theta_{0}+\theta_{1}\frac{1}{X},

\end{align*}\)

which is linear in the transformed parameters \(\theta_{0}\) and \(\theta_{1}\). In such cases, transforming a model to its linear form often provides better inference procedures and confidence intervals, but one must be cognizant of the effects that the transformation has on the distribution of the errors.
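As a small sketch of this linearization (with hypothetical, noiseless data generated from \(\beta_{0}=2\) and \(\beta_{1}=3\); the slope and intercept formulas are ordinary simple linear regression), fitting the transformed model \(1/Y=\theta_{0}+\theta_{1}(1/X)\) and back-transforming recovers the original parameters:

```python
# Hypothetical noiseless data from Y = beta0*X / (beta1 + X) with beta0 = 2, beta1 = 3
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [2.0 * xi / (3.0 + xi) for xi in x]

# Transform: v = 1/Y is linear in u = 1/X, with intercept theta0 and slope theta1
u = [1.0 / xi for xi in x]
v = [1.0 / yi for yi in y]

# Ordinary least squares fit of the simple linear model v = theta0 + theta1*u
n = len(u)
u_bar = sum(u) / n
v_bar = sum(v) / n
theta1 = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / sum(
    (ui - u_bar) ** 2 for ui in u
)
theta0 = v_bar - theta1 * u_bar

# Back-transform: theta0 = 1/beta0 and theta1 = beta1/beta0
beta0_hat = 1.0 / theta0
beta1_hat = theta1 / theta0
print(beta0_hat, beta1_hat)  # approximately 2 and 3, since the data are noiseless
```

With real (noisy) data the transformed errors are no longer the original \(\epsilon\), which is exactly the caution raised above about the effect of the transformation on the error distribution.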

### Nonlinear Least Squares

Returning to cases in which it is *not* possible to transform the model to a linear form, consider the setting

\(\begin{equation*}

Y_{i}=f(\textbf{X}_{i},\beta)+\epsilon_{i},

\end{equation*}\)

where the \(\epsilon_{i}\) are *iid* normal with mean 0 and constant variance \(\sigma^{2}\). For this setting, we can rely on some of the least squares theory we have developed earlier in the course. For other, nonnormal error terms, different techniques need to be employed.

First, let

\(\begin{equation*}

Q=\sum_{i=1}^{n}(y_{i}-f(\textbf{X}_{i},\beta))^{2}.

\end{equation*}\)

In order to find

\(\begin{equation*}

\hat{\beta}=\arg\min_{\beta}Q,

\end{equation*}\)

we first find each of the partial derivatives of *Q* with respect to \(\beta_{j}\). Then we set each partial derivative equal to 0 and replace each parameter \(\beta_{j}\) by its estimate \(\hat{\beta}_{j}\). The resulting equations are nonlinear in the parameter estimates \(\hat{\beta}_{j}\) and are often difficult to solve, even in the simplest cases, so iterative numerical methods are often employed. A further difficulty is that multiple solutions may be possible!
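To make the criterion *Q* concrete, here is a minimal sketch (hypothetical, noiseless data; plain Python) using the third example model above, \(y_{i}=\beta_{0}+(0.4-\beta_{0})e^{-\beta_{1}(x_{i}-5)}\), with true values \(\beta_{0}=0.3\) and \(\beta_{1}=0.2\) chosen purely for illustration. *Q* is zero at the true parameters and strictly positive elsewhere:

```python
import math

# Hypothetical noiseless data from y = b0 + (0.4 - b0)*exp(-b1*(x - 5))
# with true parameters b0 = 0.3, b1 = 0.2
def f(x, b0, b1):
    return b0 + (0.4 - b0) * math.exp(-b1 * (x - 5.0))

xs = [float(i) for i in range(1, 11)]
ys = [f(xi, 0.3, 0.2) for xi in xs]

# The least squares criterion Q(beta): sum of squared residuals
def Q(b0, b1):
    return sum((yi - f(xi, b0, b1)) ** 2 for xi, yi in zip(xs, ys))

print(Q(0.3, 0.2))  # 0 at the true parameters, since the data are noiseless
print(Q(0.5, 0.1))  # strictly positive away from the minimum
```

With noisy data the minimum of *Q* is no longer zero or exactly at the true parameters, and the surface can have multiple local minima, which is why the choice of algorithm and starting values matters.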

Algorithms for nonlinear least squares estimation include:

- **Newton's method**, a classical method based on a gradient approach, but one which can be computationally challenging and heavily dependent on good starting values.
- The **Gauss-Newton algorithm**, a modification of Newton's method that gives a good approximation of the solution that Newton's method should have arrived at, but which is not guaranteed to converge.
- The **Levenberg-Marquardt method**, which can take care of computational difficulties arising with the other methods but can require a tedious search for the optimal value of a tuning parameter.
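As a sketch of the Gauss-Newton idea (hypothetical model, noiseless data, and starting values chosen purely for illustration): at each iteration, linearize \(f\) around the current estimate, solve the normal equations \(J^{T}J\,\delta=J^{T}r\) for the step \(\delta\), where \(J\) is the Jacobian of \(f\) and \(r\) the residual vector, and update the estimate:

```python
import math

# Hypothetical model f(x; b0, b1) = b0 * exp(b1 * x); noiseless data from b0 = 2, b1 = 0.3
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * math.exp(0.3 * xi) for xi in xs]

b0, b1 = 1.5, 0.25  # starting values: Gauss-Newton needs a reasonable start
for _ in range(30):
    # Residuals and Jacobian of f with respect to (b0, b1)
    r = [yi - b0 * math.exp(b1 * xi) for xi, yi in zip(xs, ys)]
    j0 = [math.exp(b1 * xi) for xi in xs]            # df/db0
    j1 = [b0 * xi * math.exp(b1 * xi) for xi in xs]  # df/db1
    # Normal equations (J^T J) delta = J^T r, solved by Cramer's rule (2x2)
    a = sum(v * v for v in j0)
    b = sum(v * w for v, w in zip(j0, j1))
    c = sum(v * v for v in j1)
    g0 = sum(v * ri for v, ri in zip(j0, r))
    g1 = sum(v * ri for v, ri in zip(j1, r))
    det = a * c - b * b
    d0 = (c * g0 - b * g1) / det
    d1 = (a * g1 - b * g0) / det
    b0, b1 = b0 + d0, b1 + d1

print(b0, b1)  # converges to the true values 2 and 0.3
```

The Levenberg-Marquardt modification replaces \(J^{T}J\) with \(J^{T}J+\lambda I\); the tuning parameter \(\lambda\) mentioned above damps the step when the Gauss-Newton linearization is poor.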