We have discussed the notion of ordering data (e.g., ordering the residuals). The applications of ordered data presented so far have all concerned univariate data sets. However, there are also techniques for ordering multivariate data sets. **Statistical depth functions** provide a center-outward ordering of multivariate observations, which allows one to define reasonable analogues of univariate order statistics. There are numerous depth functions, which we do not discuss here. However, the notion of statistical depth is also used in the regression setting. Specifically, there is the notion of **regression depth**, a quality measure for robust linear regression.

Statistically speaking, the regression depth of a hyperplane \(\mathcal{H}\) is the smallest number of residuals that need to change sign to make \(\mathcal{H}\) a **nonfit**. A regression hyperplane is called a nonfit if it can be rotated to horizontal (i.e., parallel to the axis of any of the predictor variables) without passing through any data points; points lying exactly on the hyperplane count as "passed through." A nonfit is a very poor regression hyperplane, because it is combinatorially equivalent to a horizontal hyperplane, which posits no relationship between the predictor and response variables. Equivalently, the regression depth of \(\mathcal{H}\) is the minimum number of points whose removal makes \(\mathcal{H}\) a nonfit. Regression depth also has convenient statistical properties, such as invariance under affine transformations, which we do not discuss in greater detail. For example, consider the data in the figure below.
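To make the definition concrete, here is a brute-force sketch for simple linear regression (one predictor), using the combinatorial characterization due to Rousseeuw and Hubert: a line is a nonfit exactly when, for some vertical split of the x-axis, the residuals are all of one sign on one side and the opposite sign on the other, so the depth is the fewest residual sign changes needed to reach such a configuration. The function name and the O(n²) scan over splits are ours, for illustration only, not a reference implementation.

```python
def regression_depth_line(x, y, intercept, slope):
    """Regression depth of the line y = intercept + slope * x (illustrative).

    Uses the sign-separation characterization for simple regression
    (Rousseeuw & Hubert): the line is a nonfit when some vertical split
    of the x-axis leaves only positive residuals on one side and only
    negative residuals on the other; the depth is the fewest residual
    sign changes needed to reach such a configuration.  Zero residuals
    (points exactly on the line) always count, since those points are
    "passed through" no matter how the line is rotated.
    """
    # Residuals, ordered by x, so a split is just an index k.
    resid = [yi - (intercept + slope * xi)
             for xi, yi in sorted(zip(x, y))]
    n = len(resid)
    best = n
    for k in range(n + 1):                 # k points fall left of the split
        left, right = resid[:k], resid[k:]
        # Sign changes for "positive on the left, negative on the right" ...
        cost_pos_neg = sum(r <= 0 for r in left) + sum(r >= 0 for r in right)
        # ... and for the mirror-image orientation.
        cost_neg_pos = sum(r >= 0 for r in left) + sum(r <= 0 for r in right)
        best = min(best, cost_pos_neg, cost_neg_pos)
    return best
```

For instance, a line lying strictly above every data point has depth 0 (it is already a nonfit), while a line threading through the middle of the data requires several sign changes before it becomes one.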

Removing the red circles and rotating the regression line until it is horizontal (i.e., the dashed blue line) demonstrates that the black line has regression depth 3. Hyperplanes with high regression depth behave well under general error models, including skewed and heteroscedastic error distributions.

For *n* points in *p* dimensions, where *p* is the number of variables (i.e., the number of responses plus the number of predictors), the highest regression depth attainable by any hyperplane is upper bounded by \(\lceil n/(p+1)\rceil\) in the worst case: there exist point sets for which no hyperplane has regression depth larger than this bound. Conversely, every point set admits a hyperplane whose regression depth is at least \(\lceil n/(p+1)\rceil\). For the simple linear regression example in the plot above (\(p = 2\)), this means there is always a line with a regression depth of at least \(\lceil n/3\rceil\).
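The arithmetic of the bound can be checked with a one-line helper (the function name is ours, for illustration):

```python
import math

def worst_case_max_depth(n, p):
    """Tight worst-case bound on the best achievable regression depth
    for n points in p dimensions (responses plus predictors)."""
    return math.ceil(n / (p + 1))

# Simple linear regression (p = 2): some line always has depth >= ceil(n/3).
print(worst_case_max_depth(30, 2))   # -> 10
print(worst_case_max_depth(10, 2))   # -> 4
```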

When your data contain outliers, you may need to consider regression lines or hyperplanes other than the traditional ordinary least squares fit, which outliers can bias substantially. In such cases, regression depth provides a measure for selecting a fitted line that is resistant to the effects of the outliers.