# 7.5 - Further Transformation Advice and Box-Cox

Transformations of the variables are used in regression to describe curvature, and sometimes also to adjust for nonconstant variance in the errors (and hence in the *y*-variable).

#### What to Try?

When there is curvature in the data, theory in the subject-matter literature may suggest an appropriate equation. Otherwise, you might have to use trial-and-error data exploration to find a model that fits the data. In the trial-and-error approach, you might try polynomial models or transformations of the *x*-variable(s) and/or *y*-variable, such as square root, logarithmic, or reciprocal transformations. Often one of these is enough to produce an adequate fit.

#### Transform Predictors or Response?

In the data exploration approach, remember the following: If you transform the *y*-variable, you will change the variance of the *y*-variable and the errors. You may wish to try transformations of the *y*-variable (e.g., \(\ln(y)\), \(\sqrt{y}\), \(y^{-1}\)) when there is evidence of nonnormality and/or nonconstant variance problems in one or more residual plots. Try transformations of the *x*-variable(s) (e.g., \(x^{-1}\), \(x^{2}\), \(\ln(x)\)) when there are strong nonlinear trends in one or more residual plots. However, these guidelines don't always work. Sometimes it will be just as much art as it is science!

#### Why Might Logarithms Work?

If you know about the algebra of logarithms, you can verify the relationships in this section. If you don't know about the algebra of logarithms, take a leap of faith by accepting what is here as truth.

Logarithms are often used because they are connected to common exponential growth and power curve relationships. Note that here *log* refers to the natural logarithm. In statistics, *log* and *ln* are used interchangeably. If any other base is used, it is indicated with a subscript (e.g., \(\log_{10}\)). The **exponential growth equation** for variables *y* and *x* may be written as

\[\begin{equation*} y=a\times e^{bx}, \end{equation*}\]

where *a* and *b* are parameters to be estimated. Taking natural logarithms on both sides of the exponential growth equation gives

\[\begin{equation*} \log(y)= \log(a)+bx. \end{equation*}\]

Thus, an equivalent way to express exponential growth is that the logarithm of *y* is a straight-line function of *x*.
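
As a rough illustration of this linearization, the sketch below regresses \(\log(y)\) on *x* and back-transforms the intercept to recover *a*. The data are made up and noise-free (generated from \(y=2e^{0.5x}\)), and the helper `fit_line` is a hypothetical name for an ordinary least squares fit written for this example, not a function from any library.

```python
import math

def fit_line(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

# Made-up, noise-free data generated from y = 2 * exp(0.5 * x)
x = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [2.0 * math.exp(0.5 * xi) for xi in x]

# Regress log(y) on x: the intercept estimates log(a), the slope estimates b
log_a, b = fit_line(x, [math.log(yi) for yi in y])
a = math.exp(log_a)

# Each one-unit increase in x multiplies y by exp(b)
growth_factor = math.exp(b)
print(round(a, 6), round(b, 6), round(growth_factor, 6))
```

With real data the points would not fall exactly on the line, and the residuals of the fit on the log scale should be checked in the usual way.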

A general **power curve** equation is

\[\begin{equation*} y=a\times x^{b}, \end{equation*}\]

where again *a* and *b* are parameters to be estimated. Taking logarithms on both sides of the power curve equation gives

\[\begin{equation*} \log(y)=\log(a)+b\log(x). \end{equation*}\]

Thus an equivalent way to write a power curve equation is that the logarithm of *y* is a straight-line function of the logarithm of *x*. This regression equation is sometimes referred to as a **log-log regression equation**.
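
A log-log fit can be sketched in the same spirit: regress \(\log(y)\) on \(\log(x)\), read *b* off the slope, and back-transform the intercept to get *a*. The data here are made up and noise-free (generated from \(y=3x^{1.5}\)), and `fit_loglog` is a hypothetical helper written only for this illustration.

```python
import math

def fit_loglog(x, y):
    """Fit log(y) = log(a) + b * log(x) by least squares; return (a, b)."""
    u = [math.log(xi) for xi in x]  # log(x), requires x > 0
    v = [math.log(yi) for yi in y]  # log(y), requires y > 0
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suu = sum((ui - u_bar) ** 2 for ui in u)
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    b = suv / suu
    a = math.exp(v_bar - b * u_bar)
    return a, b

# Made-up, noise-free data generated from y = 3 * x ** 1.5
x = [1.0, 2.0, 4.0, 8.0, 16.0]
y = [3.0 * xi ** 1.5 for xi in x]

a, b = fit_loglog(x, y)

# Under a power curve, doubling x multiplies y by 2 ** b
doubling_factor = 2.0 ** b
print(round(a, 6), round(b, 6), round(doubling_factor, 6))
```

Because both variables are logged, this approach only applies when all *x* and *y* values are positive.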

#### Box-Cox Transformation

It is often difficult to determine which transformation on *Y* to use. **Box-Cox transformations** are a family of power transformations on *Y* such that \(Y'=Y^{\lambda}\), where \(\lambda\) is a parameter to be determined using the data. The normal error regression model with a Box-Cox transformation is

\[\begin{equation*} Y_{i}^{\lambda}=\beta_{0}+\beta _{1}X_{i}+\epsilon_{i}. \end{equation*}\]

Maximum likelihood can be used to estimate \(\lambda\), or a simple search over a range of candidate values may be performed (e.g., \(\lambda=-4.0,-3.5,-3.0,\ldots,3.0,3.5,4.0\)). For each \(\lambda\) value, the \(Y_{i}^{\lambda}\) observations are standardized so that the SSEs are comparable across different values of \(\lambda\). The standardization is

\[\begin{equation*} W_{i}=\left\{\begin{array}{ll} K_{1}(Y_{i}^{\lambda}-1), & \hbox{$\lambda\neq 0$;} \\ K_{2}(\log Y_{i}), & \hbox{$\lambda=0$,} \end{array} \right. \end{equation*}\]

where \(K_{2}=\left(\prod_{i=1}^{n}Y_{i}\right)^{1/n}\) is the geometric mean of the \(Y_{i}\) and \(K_{1}=\frac{1}{\lambda K_{2}^{\lambda-1}}\). Once the \(W_{i}\) have been calculated for a given \(\lambda\), they are regressed on the \(X_{i}\) and the SSE is retained. The maximum likelihood estimate \(\hat{\lambda}\) is the value of \(\lambda\) for which the SSE is a minimum.
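
The grid search can be sketched as follows, taking \(K_{2}\) to be the geometric mean of the \(Y_{i}\) and \(K_{1}=1/(\lambda K_{2}^{\lambda-1})\). This is a minimal illustration for a single predictor, not a definitive implementation: the function names are hypothetical, and the data are made up so that \(\sqrt{Y}\) is exactly linear in *X*, which means the search should select \(\lambda=0.5\).

```python
import math

def standardize(y, lam):
    """Return the standardized Box-Cox values W_i for a given lambda."""
    n = len(y)
    # K2: geometric mean of the Y_i, computed on the log scale for stability
    k2 = math.exp(sum(math.log(yi) for yi in y) / n)
    if lam == 0:
        return [k2 * math.log(yi) for yi in y]
    k1 = 1.0 / (lam * k2 ** (lam - 1))
    return [k1 * (yi ** lam - 1) for yi in y]

def sse_simple_regression(x, w):
    """SSE from the least squares regression of w on a single predictor x."""
    n = len(x)
    x_bar = sum(x) / n
    w_bar = sum(w) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxw = sum((xi - x_bar) * (wi - w_bar) for xi, wi in zip(x, w))
    slope = sxw / sxx
    intercept = w_bar - slope * x_bar
    return sum((wi - intercept - slope * xi) ** 2 for xi, wi in zip(x, w))

def box_cox_search(x, y, lambdas):
    """Return the lambda with the smallest SSE, plus all SSEs by lambda."""
    sse_by_lambda = {
        lam: sse_simple_regression(x, standardize(y, lam)) for lam in lambdas
    }
    best = min(sse_by_lambda, key=sse_by_lambda.get)
    return best, sse_by_lambda

# Made-up, noise-free data: sqrt(y) = 1 + 2x exactly
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [(1.0 + 2.0 * xi) ** 2 for xi in x]

# Candidate values -4.0, -3.5, ..., 3.5, 4.0
lambdas = [k / 2 for k in range(-8, 9)]

best_lambda, sse_by_lambda = box_cox_search(x, y, lambdas)
print(best_lambda)
```

All \(Y_{i}\) must be positive for the transformation to be defined. In practice the SSE curve is often fairly flat near its minimum, so a nearby convenient value (such as \(\lambda=0.5\) or \(\lambda=0\)) is commonly used in place of the exact minimizer.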