10.3 - Orthongonal Polynomials: Fungus Example

Fungus Example Section

Consider a treatment design consisting of five humidity level percentages (30, 40, 50, 60, and 70). Each of the five treatments is assigned randomly to three climate-controlled grow rooms in a completely randomized experimental design. The resulting fungus crop yields (lbs) are shown in the table below (Fungus Data):

	30	40	50	60	70
Means (\(\bar{y}_i\))	Humidity (x)
	13.1	17.1	19.5	18.7	18.9
	12.3	16.6	21.1	20.1	17.3
	13.3	17.4	19.3	18.2	17.5
	12.90	17.03	19.97	19.00	17.90

We can see that the factor humidity has levels that are equally spaced. Therefore, we can use the orthogonal contrast coefficients to fit a polynomial to the response, fungus yields. With \(k=5\) levels, we can only fit up to a quartic term. The orthogonal polynomial contrast coefficients for the example are shown in Table 10.1. Here, \(r=3\) denotes the number of replicates, \(p=1,\dots,4\) denotes the term order, and \(i = 1,\dots,5\) denotes the factor level.

Table 10.1 - Computations for orthogonal polynomial contrasts and sums of squares
Humidity(x)	\(\bar{y}_i\)	Orthogonal Polynomial Coefficients \((g_{pi})\)
Humidity(x)	\(\bar{y}_i\)	Mean	Linear	Quadratic	Cubic	Quartic
30	12.90	1	-2	2	-1	1
40	17.03	1	-1	-1	2	-4
50	19.97	1	0	-2	0	6
60	19.00	1	1	-1	-2	-4
70	17.90	1	2	2	1	1
\(\lambda_p\)		-	1	1	5/6	35/12
Sum = \(\sum g_{pi}\bar{y}_{i.}\)		86.80	11.97	-14.37	1.07	6.47
Divisor = \(\sum g_{pi}^{2}\)		5	10	14	10	70
\(SSP_p=r(\sum g_{pi}\bar{y}_{i.})^2/\sum g_{pi}^{2}\)		-	42.96	44.23	0.34	1.79
\(\hat{\alpha}_p=\sum g_{pi}\bar{y}_{i.}/\sum g_{pi}^{2}\)		17.36	1.20	-1.03	0.11	0.09

As mentioned before, one can easily find the orthogonal polynomial coefficients \((g_{pi})\) for a different order of polynomials using pre-documented tables for equally spaced intervals. However, let us try to understand how the coefficients are obtained.

First note that the five values of \(x\) are 30, 40, 50, 60, and 70. Therefore, \(\bar{x} = 50\) and the spacing \(d = 10\). This means that the five values of \(\dfrac{x - \bar{x}}{d}\) are -2, -1, 0, 1, and 2.

Linear coefficients:

Using the linear orthogonal polynomial formula seen in Section 10.2, the polynomial \(g_{1i}\) for linear coefficients can be calculated as follows.

Linear Coefficient Polynomials
\(x\)	30	40	50	60	70
\((x-50)\)	-20	-10	0	10	20
\(\frac{(x-50)}{10}\)	-2	-1	0	1	2
Linear orthogonal polynomial	\((-2)\lambda_1\)	\((-1)\lambda_1\)	\((0)\lambda_1\)	\((1)\lambda_1\)	\((2)\lambda_1\)

To obtain the final set of coefficients we choose \(\lambda_1\) so that the coefficients are integers. Therefore, we set \(\lambda_1 = 1\) and obtain the coefficient values \((g_{1i})\) seen in the "Linear" column of Table 10.1.

Quadratic coefficients:

Using the quadratic orthogonal polynomial formula seen in Section 10.2, the polynomial \(g_{2i}\) for linear coefficients turns out to be:

\(\begin{align*}
&\left( (-2)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\left( (-1)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\\& \left( (0)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\left( (1)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\\& \left( (2)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2
\end{align*}\)

which simplified gives:

\((2)\lambda_2, (-1)\lambda_2, (-2)\lambda_2, (-1)\lambda_2, (2)\lambda_2\)

To obtain the final set of coefficients we choose \(\lambda_2\) so that the coefficients are integers. Therefore, we set \(\lambda_2 = 1\) and obtain the coefficient values \((g_{2i})\) seen in the "Quadratic" column of Table 10.1.

Cubic coefficients:

The polynomial \(g_{3i}\) for cubic coefficients turns out to be:

\(\begin{align*}
&\left( (-2)^3 - (-2) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\left( (-1)^3 - (-1) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\\& \left( (0)^3 - (0) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\left( (1)^3 - (1) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\\& \left( (2)^3 - (2) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3
\end{align*}\)

which simplified gives:

\((-\dfrac{6}{5})\lambda_3, (\dfrac{12}{5})\lambda_3, (0)\lambda_3, (\dfrac{-12}{5})\lambda_3, (\dfrac{6}{5})\lambda_3\)

To obtain the final set of coefficients we choose \(\lambda_3\) so that the coefficients are integers. Therefore, we set \(\lambda_3 = 5/6\) and obtain the coefficient values \((g_{3i})\) seen in the "Cubic" column of Table 10.1.

Quartic coefficients:

The polynomial \(g_{4i}\) can be used to obtain the quartic coefficients in the same way as above.

Each set of coefficients corresponds to a contrast among the treatments. Notice the sum of coefficients is equal to zero (e.g. the quartic coefficients \((1, -4, 6, -4, 1)\) sum to zero, etc). Using orthogonal polynomial contrasts, we can partition the treatment sums of squares into a set of additive sums of squares corresponding to orthogonal polynomial contrasts. The contrast computations are identical to those we learned in Lesson 2.5. The resulting partitions can be used to sequentially test the significance of the linear, quadratic, cubic, and quartic terms in the model and find the polynomial order appropriate for the data.

Table 10.1 shows how to obtain the sums of squares for each component \( (SSP_p) \) and compute the estimates of the \(\alpha_p\) coefficients for the orthogonal polynomial equation. Using the results in Table 10.1, we have estimated the orthogonal polynomial equation as:

\(\hat{y}_i = 17.36 + 1.20 g_{1i} - 1.03 g_{2i} + 0.11 g_{3i} + 0.09g_{4i}\)

Table 10.2 summarizes how the treatment sums of squares are partitioned and their test results.

Table 10.2 - Analysis of variance for the orthogonal polynomial model
relationship between percent humidity and fungus yield.
Source of Variation	Degrees of Freedom	Sum of Squares	Mean Square	F	Pr > F
Density	4	89.32	22.331	35.48	0.000
Error	10	6.29	0.629

Contrast	DF	Contrast SS	Mean Square	F	Pr > F
Linear	1	42.96	42.96	68.26	0.000
Quadratic	1	44.23	44.23	70.28	0.000
Cubic	1	0.34	0.34	0.54	0.478
Quartic	1	1.79	1.79	2.85	0.122

To test whether any of the polynomials are significant (i.e. \(H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0\)), we can use the global F-test where the test statistic is equal to 35.48. We see that the p-value is almost zero and therefore we can conclude that at the 5% level, at least one of the polynomials is significant. Using the orthogonal polynomial contrasts we can determine which of the polynomials are useful. Table 10.2 shows that for this example only the linear and quadratic terms are useful. Removing the insignificant terms, the estimated orthogonal polynomial equation becomes:

\(\hat{y}_i = 17.36 + 1.20 g_{1i} - 1.03 g_{2i}\)

The polynomial relationship expressed as a function of y and x in actual units of the observed variables is more informative than when expressed in units of the orthogonal polynomial.

We can obtain the polynomial relationship using the actual units of observed variables by back-transforming using the formulas seen in Section 10.2. The necessary quantities to back-transform are \(\lambda_1=\lambda_2=1, d=10, \bar{x}=50\) and t = 5. Substituting these values we obtain

\(\begin{align}
\hat{y}&=17.36+1.20g_{1i}-1.03g_{2i}\\
&=17.36+1.20(1)\left( \dfrac{x-50}{10} \right)-1.03(1)\left( \left( \dfrac{x-50}{10} \right)^2-\dfrac{5^2-1}{12} \right)\\
\end{align}\)

which simplifies to

\(\hat{y}=-12.23+1.15x-0.01x^2\)

Below is the code for generating polynomials from the IML procedure in SAS:

/* read the fungus data set*/
/* Generating Ortho_Polynomials from IML */
proc iml;
  x={30 40 50 60 70}; 
  xpoly=orpol(x,4); /* the '4' is the df for the quantitative factor */
  humidity=x`; new=humidity || xpoly;
  create out1 from new[colname={"humidity" "xp0" "xp1" "xp2" "xp3" "xp4"}];
  append from new; close out1; 
quit;
proc print data=out1; 
run;

/* Here data is sorted and then merged with the original dataset */
proc sort data=fungal; 
  by humidity; 
run;
data ortho_poly; merge out1 fungal; 
  by humidity; 
run;
proc print data=ortho_poly; 
run;

/* The following code will then generate the results shown in the
	notes*/
proc mixed data=ortho_poly method=type1;
  class;
  model yield=xp1 xp2 xp3 xp4/ solution;
  title 'Using Orthog polynomials from IML';
  ods select SolutionF;
run;
 
/*We can use proc glm to obtain the same results without using 
   IML codings. Proc glm will use the orthogonal contrast coefficients directly*/
proc glm data=fungal;
 class humidity;
 model yield = humidity;
 contrast 'linear' humidity -2 -1 0 1 2;
 contrast 'quadratic' humidity 2 -1 -2 -1 2;
 contrast 'cubic' humidity -1 2 0 -2 1;
 contrast 'quatic' humidity 1 -4 6 -4 1;
 lsmeans humidity;
run;

Analysis of Variance
Source	DF	Sum of Squares	Mean Square	ExpectedMean Square	Error Term	Error DF	F Value	Pr > F
xp1	1	43.200000	43.200000	Var(Residual) + Q(xp1)	MS(Residual)	10	57.75	< .0001
xp2	1	42.000000	42.000000	Var(Residual) + Q(xp2)	MS(Residual)	10	56.15	< .0001
xp3	1	0.300000	0.300000	Var(Residual) + Q(xp2)	MS(Residual)	10	0.40	0.5407
xp4	1	2.100000	2.100000	Var(Residual) + Q(xp4)	MS(Residual)	10	2.81	0.1248
Residual	10	7.480000	7.480000	Var(Residual)

Fitting a Quadratic Model with Proc Mixed

Often we can see that only a quadratic curvature is of interest in a set of data. In this case, we can plan to simply run an order 2 (quadratic) polynomial and can easily use proc mixed (the general linear model). This method requires centering the quantitative variable levels by subtracting the mean of the levels (50) and then creating the quadratic polynomial terms.

data fungal; 
  set fungal;
  x=humidity-50;
  x2=x**2;
run;
proc mixed data=fungal method=type1;
  class; 
  model yield = x x2;
run;

The output is:

Type 1 Analysis of Variance
Source	DF	Sum of Squares	Mean Square	Expected Mean Square	Error Term	Error DF	F Value	Pr > F
x	1	42.960333	42.960333	Var(Residual) + Q(x)	MS(residual)	12	61.18	<.0001
x2	1	44.228810	44.228810	Var(Residual) + Q(x2)	MS(residual)	12	62.98	<.0001
Residual	12	8.426857	0.702238	Var(Residual)

We can also generate the solutions (coefficients) for the model with the following code

proc mixed data=fungal method=type1;
  class;
  model yield = x x2 / solution;
run;

which gives the following output for the regression coefficients.

Solution for Fixed Effects
Effect	Estimate	Standard Error	DF	t Value	Pr > \|t\|
Intercept	19.4124	0.3372	12	57.57	<.0001
x	0.1197	0.01530	12	7.82	<.0001
x2	-0.01026	0.001293	12	-7.94	<.0001

Here we need to remember that the regression was based on centered values for the predictor, so we have to back-transform to get the coefficients in terms of the original variables. For example, the fitted second-order model for one predictor variable that is expressed in terms of centered values \(x = X-\bar{X}\):

\(\hat{Y}=b_0+b_1x+b_{2}x^2\)

becomes in terms of the original X variable:

\(\hat{Y}=b'_0+b'_1X+b'_{2}X^2\)

where:

\(\begin{align}
b'_0&=b_0-b_1\bar{X}+b_{2}\bar{X}^2\\
b'_1&=b_1-2b_{2}\bar{X}\\
b'_{2}&=b_{2}\\
\end{align}\)

In the example above, this back-transformation uses the estimates from the Solutions for Fixed Effects table as follows.

data backtransform;
  bprime0=19.4124-(0.1197*50)+(-0.01026*(50**2));
  bprime1=0.1197-(2*-0.01026*50);
  bprime2=-0.01026;
  title 'bprime0=b0-(b1*meanX)+(b2*(meanX)2)';
  title2 'bprime1=b1-2*b2*meanX';
  title3 'bprime2=b2';
run;
proc print data=backtransform; 
 var bprime0 bprime1 bprime2; 
run;

The output is then:

Obs	bprime0	bprime1	bprime2
1	-12.2226	1.1457	-0.01026

Note! The ANOVA results and the final quadratic regression equation here are identical to the results from the orthogonal polynomial coding approach.

First, load and attach the Fungus data.

setwd("~/path-to-folder/")
fungal_data <- read.table("fungus_data.txt",header=T)
attach(fungal_data)

R has a built-in function for generating orthogonal polynomials which we utilize to obtain the Type 1 ANOVA table.

poly_lm = lm(yield ~ poly(humidity,4))
poly_aov1 = aov(poly_lm)
summary(poly_aov1)

                 Df Sum Sq Mean Sq F value   Pr(>F)    
poly(humidity, 4)  4  89.32  22.331   35.48  7e-06 ***
Residuals         10   6.29   0.629                              
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

It is clear from the F-test that at least one of the polynomial terms is significant. We can examine the t-tests of the individual terms using the summary function.

summary(poly_lm)

Call:
lm(formula = yield ~ poly(humidity, 4))

Residuals:
    Min      1Q  Median      3Q     Max 
-0.8000 -0.5333 -0.3000  0.3833  1.1333

Coefficients:
                   Estimate Std. Error t value Pr(>|t|)     
(Intercept)         17.3600     0.2048  84.753 1.28e-15 ***
poly(humidity, 4)1   6.5544     0.7933   8.262 8.87e-06 ***
poly(humidity, 4)2  -6.6505     0.7933  -8.383 7.80e-06 ***
poly(humidity, 4)3   0.5842     0.7933   0.736    0.478       
poly(humidity, 4)4   1.3387     0.7933   1.688    0.122   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 0.7933 on 10 degrees of freedom
Multiple R-squared:  0.9342,    Adjusted R-squared:  0.9079 
F-statistic: 35.48 on 4 and 10 DF,  p-value: 7.005e-06

As is commonly found, only the quadratic curvature is of interest. We can then fit a standard quadratic model, first recentering the response as seen in previous sections.

humidity_center = humidity-50
humidity_center2 = humidity_center^2
new_data = data.frame(fungal_data,humidity_center,humidity_center2)
detach(fungal_data); attach(new_data)

quad_lm = lm(yield ~ humidity_center + humidity_center2)
aov1_quad = aov(quad_lm)
summary(aov1_quad)

                 Df Sum Sq Mean Sq F value   Pr(>F)    
humidity_center   1  42.96   42.96   61.18 4.73e-06 ***
humidity_center2  1  44.23   44.23   62.98 4.08e-06 ***
Residuals        12   8.43    0.70                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The following coefficients are produced.

quad_lm$coefficients

     (Intercept)  humidity_center humidity_center2 
      19.4123810        0.1196667       -0.0102619

To obtain the coefficients in terms of the original variables, we can back-transform these coefficients (output not shown).

b_0_prime = 19.4123810-0.1196667*50-0.0102619*50^2
b_0_prime
b_1_prime = 0.1196667-0.0102619*(-2*50)
b_1_prime
b_2_prime = -0.0102619
b_2_prime