10.3 - Orthongonal Polynomials: Fungus Example

Fungus Example Section

Consider a treatment design consisting of five humidity level percentages (30, 40, 50, 60, and 70). Each of the five treatments is assigned randomly to three climate-controlled grow rooms in a completely randomized experimental design. The resulting fungus crop yields (lbs) are shown in the table below (Fungus Data):

Means (\(\bar{y}_i\))

Humidity (x)

30

40

50

60

70

13.1

17.1

19.5

18.7

18.9

12.3

16.6

21.1

20.1

17.3

13.3

17.4

19.3

18.2

17.5

12.90

17.03

19.97

19.00

17.90

We can see that the factor humidity has levels that are equally spaced. Therefore, we can use the orthogonal contrast coefficients to fit a polynomial to the response, fungus yields.  With \(k=5\) levels, we can only fit up to a quartic term. The orthogonal polynomial contrast coefficients for the example are shown in Table 10.1. Here, \(r=3\) denotes the number of replicates, \(p=1,\dots,4\) denotes the term order, and \(i = 1,\dots,5\) denotes the factor level. 

Table 10.1 - Computations for orthogonal polynomial contrasts and sums of squares

Humidity(x)

\(\bar{y}_i\)

Orthogonal Polynomial Coefficients \((g_{pi})\)

Mean

Linear

Quadratic

Cubic

Quartic

30

12.90

1

-2

2

-1

1

40

17.03

1

-1

-1

2

-4

50

19.97

1

0

-2

0

6

60

19.00

1

1

-1

-2

-4

70

17.90

1

2

2

1

1

\(\lambda_p\)

-

1

1

5/6

35/12

Sum = \(\sum g_{pi}\bar{y}_{i.}\)

86.80

11.97

-14.37

1.07

6.47

Divisor = \(\sum g_{pi}^{2}\)

5

10

14

10

70

\(SSP_p=r(\sum g_{pi}\bar{y}_{i.})^2/\sum g_{pi}^{2}\)

-

42.96

44.23

0.34

1.79

\(\hat{\alpha}_p=\sum g_{pi}\bar{y}_{i.}/\sum g_{pi}^{2}\)

17.36

1.20

-1.03

0.11

0.09

 

As mentioned before, one can easily find the orthogonal polynomial coefficients \((g_{pi})\) for a different order of polynomials using pre-documented tables for equally spaced intervals. However, let us try to understand how the coefficients are obtained. 

First note that the five values of \(x\) are 30, 40, 50, 60, and 70. Therefore, \(\bar{x} = 50\) and the spacing \(d = 10\). This means that the five values of \(\dfrac{x - \bar{x}}{d}\) are -2, -1, 0, 1, and 2.


Linear coefficients:

Using the linear orthogonal polynomial formula seen in Section 10.2, the polynomial \(g_{1i}\) for linear coefficients can be calculated as follows.

Linear Coefficient Polynomials

\(x\)

30

40

50

60

70

\((x-50)\)

-20

-10

0

10

20

\(\frac{(x-50)}{10}\)

-2

-1

0

1

2

Linear orthogonal polynomial

\((-2)\lambda_1\)

\((-1)\lambda_1\)

\((0)\lambda_1\)

\((1)\lambda_1\)

\((2)\lambda_1\)

To obtain the final set of coefficients we choose \(\lambda_1\) so that the coefficients are integers. Therefore, we set \(\lambda_1 = 1\) and obtain the coefficient values \((g_{1i})\) seen in the "Linear" column of Table 10.1.


Quadratic coefficients:

Using the quadratic orthogonal polynomial formula seen in Section 10.2, the polynomial \(g_{2i}\) for linear coefficients turns out to be:

\(\begin{align*}  
&\left( (-2)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,  
\left( (-1)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,  
\\& \left( (0)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,  
\left( (1)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,  
\\& \left( (2)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2  
\end{align*}\)

which simplified gives:

\((2)\lambda_2, (-1)\lambda_2, (-2)\lambda_2, (-1)\lambda_2, (2)\lambda_2\)

To obtain the final set of coefficients we choose \(\lambda_2\) so that the coefficients are integers. Therefore, we set \(\lambda_2 = 1\) and obtain the coefficient values \((g_{2i})\) seen in the "Quadratic" column of Table 10.1.


Cubic coefficients:

The polynomial \(g_{3i}\) for cubic coefficients turns out to be:

\(\begin{align*}  
&\left( (-2)^3 - (-2) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,  
\left( (-1)^3 - (-1) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,  
\\& \left( (0)^3 - (0) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,  
\left( (1)^3 - (1) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,  
\\& \left( (2)^3 - (2) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3  
\end{align*}\)

which simplified gives:

\((-\dfrac{6}{5})\lambda_3, (\dfrac{12}{5})\lambda_3, (0)\lambda_3, (\dfrac{-12}{5})\lambda_3, (\dfrac{6}{5})\lambda_3\)

To obtain the final set of coefficients we choose \(\lambda_3\) so that the coefficients are integers. Therefore, we set \(\lambda_3 = 5/6\) and obtain the coefficient values \((g_{3i})\) seen in the "Cubic" column of Table 10.1.


Quartic coefficients:

The polynomial \(g_{4i}\) can be used to obtain the quartic coefficients in the same way as above.

Each set of coefficients corresponds to a contrast among the treatments. Notice the sum of coefficients is equal to zero (e.g. the quartic coefficients \((1, -4, 6, -4, 1)\) sum to zero, etc). Using orthogonal polynomial contrasts, we can partition the treatment sums of squares into a set of additive sums of squares corresponding to orthogonal polynomial contrasts. The contrast computations are identical to those we learned in Lesson 2.5. The resulting partitions can be used to sequentially test the significance of the linear, quadratic, cubic, and quartic terms in the model and find the polynomial order appropriate for the data. 

Table 10.1 shows how to obtain the sums of squares for each component \( (SSP_p) \) and compute the estimates of the \(\alpha_p\) coefficients for the orthogonal polynomial equation. Using the results in Table 10.1, we have estimated the orthogonal polynomial equation as:

\(\hat{y}_i = 17.36 + 1.20 g_{1i} - 1.03 g_{2i} + 0.11 g_{3i} + 0.09g_{4i}\)

Table 10.2 summarizes how the treatment sums of squares are partitioned and their test results. 

Table 10.2 - Analysis of variance for the orthogonal polynomial model  
relationship between percent humidity and fungus yield.

Source of Variation

Degrees of Freedom

Sum of Squares

Mean Square

F

Pr > F

Density

4

89.32

22.331

35.48

0.000

Error

10

6.29

0.629

  

Contrast

DF

Contrast SS

Mean Square

F

Pr > F

Linear

1

42.96

42.96

68.26

0.000

Quadratic

1

44.23

44.23

70.28

0.000

Cubic

1

0.34

0.34

0.54

0.478

Quartic

1

1.79

1.79

2.85

0.122

To test whether any of the polynomials are significant (i.e. \(H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0\)), we can use the global F-test where the test statistic is equal to 35.48. We see that the p-value is almost zero and therefore we can conclude that at the 5% level, at least one of the polynomials is significant. Using the orthogonal polynomial contrasts we can determine which of the polynomials are useful. Table 10.2 shows that for this example only the linear and quadratic terms are useful. Removing the insignificant terms, the estimated orthogonal polynomial equation becomes:

\(\hat{y}_i = 17.36 + 1.20 g_{1i} - 1.03 g_{2i}\)

The polynomial relationship expressed as a function of y and x in actual units of the observed variables is more informative than when expressed in units of the orthogonal polynomial.

We can obtain the polynomial relationship using the actual units of observed variables by back-transforming using the formulas seen in Section 10.2. The necessary quantities to back-transform are \(\lambda_1=\lambda_2=1, d=10, \bar{x}=50\) and t = 5. Substituting these values we obtain

\(\begin{align}  
\hat{y}&=17.36+1.20g_{1i}-1.03g_{2i}\\  
&=17.36+1.20(1)\left( \dfrac{x-50}{10} \right)-1.03(1)\left( \left( \dfrac{x-50}{10} \right)^2-\dfrac{5^2-1}{12} \right)\\  
\end{align}\)

which simplifies to

\(\hat{y}=-12.23+1.15x-0.01x^2\)

Using Technology

Below is the code for generating polynomials from the IML procedure in SAS:

/* read the fungus data set*/
/* Generating Ortho_Polynomials from IML */
proc iml;
  x={30 40 50 60 70}; 
  xpoly=orpol(x,4); /* the '4' is the df for the quantitative factor */
  humidity=x`; new=humidity || xpoly;
  create out1 from new[colname={"humidity" "xp0" "xp1" "xp2" "xp3" "xp4"}];
  append from new; close out1; 
quit;
proc print data=out1; 
run;

/* Here data is sorted and then merged with the original dataset */
proc sort data=fungal; 
  by humidity; 
run;
data ortho_poly; merge out1 fungal; 
  by humidity; 
run;
proc print data=ortho_poly; 
run;

/* The following code will then generate the results shown in the
	notes*/
proc mixed data=ortho_poly method=type1;
  class;
  model yield=xp1 xp2 xp3 xp4/ solution;
  title 'Using Orthog polynomials from IML';
  ods select SolutionF;
run;
 
/*We can use proc glm to obtain the same results without using 
   IML codings. Proc glm will use the orthogonal contrast coefficients directly*/
proc glm data=fungal;
 class humidity;
 model yield = humidity;
 contrast 'linear' humidity -2 -1 0 1 2;
 contrast 'quadratic' humidity 2 -1 -2 -1 2;
 contrast 'cubic' humidity -1 2 0 -2 1;
 contrast 'quatic' humidity 1 -4 6 -4 1;
 lsmeans humidity;
run;                    

Analysis of Variance

Source

DF

Sum of Squares

Mean Square

ExpectedMean Square

Error Term

Error DF

F Value

Pr > F

xp1

1

43.200000

43.200000

Var(Residual) + Q(xp1)

MS(Residual)

10

57.75

< .0001

xp2

1

42.000000

42.000000

Var(Residual) + Q(xp2)

MS(Residual)

10

56.15

< .0001

xp3

1

0.300000

0.300000

Var(Residual) + Q(xp2)

MS(Residual)

10

0.40

0.5407

xp4

1

2.100000

2.100000

Var(Residual) + Q(xp4)

MS(Residual)

10

2.81

0.1248

Residual

10

7.480000

7.480000

Var(Residual)

    

Fitting a Quadratic Model with Proc Mixed

Often we can see that only a quadratic curvature is of interest in a set of data. In this case, we can plan to simply run an order 2 (quadratic) polynomial and can easily use proc mixed (the general linear model). This method requires centering the quantitative variable levels by subtracting the mean of the levels (50) and then creating the quadratic polynomial terms.

data fungal; 
  set fungal;
  x=humidity-50;
  x2=x**2;
run;
proc mixed data=fungal method=type1;
  class; 
  model yield = x x2;
run;

The output is:

Type 1 Analysis of Variance

Source

DF

Sum of Squares

Mean Square

Expected Mean Square

Error Term

Error DF

F Value

Pr > F

x

1

42.960333

42.960333

Var(Residual) + Q(x)

MS(residual)

12

61.18

<.0001

x2

1

44.228810

44.228810

Var(Residual) + Q(x2)

MS(residual)

12

62.98

<.0001

Residual

12

8.426857

0.702238

Var(Residual)

    

We can also generate the solutions (coefficients) for the model with the following code

proc mixed data=fungal method=type1;
  class;
  model yield = x x2 / solution;
run;

which gives the following output for the regression coefficients.

Solution for Fixed Effects

Effect

Estimate

Standard Error

DF

t Value

Pr > |t|

Intercept

19.4124

0.3372

12

57.57

<.0001

x

0.1197

0.01530

12

7.82

<.0001

x2

-0.01026

0.001293

12

-7.94

<.0001

Here we need to remember that the regression was based on centered values for the predictor, so we have to back-transform to get the coefficients in terms of the original variables. For example, the fitted second-order model for one predictor variable that is expressed in terms of centered values \(x = X-\bar{X}\):

\(\hat{Y}=b_0+b_1x+b_{2}x^2\)

becomes in terms of the original X variable:

\(\hat{Y}=b'_0+b'_1X+b'_{2}X^2\)

where:

\(\begin{align} 
b'_0&=b_0-b_1\bar{X}+b_{2}\bar{X}^2\\ 
b'_1&=b_1-2b_{2}\bar{X}\\ 
b'_{2}&=b_{2}\\ 
\end{align}\)

In the example above, this back-transformation uses the estimates from the Solutions for Fixed Effects table as follows.

data backtransform;
  bprime0=19.4124-(0.1197*50)+(-0.01026*(50**2));
  bprime1=0.1197-(2*-0.01026*50);
  bprime2=-0.01026;
  title 'bprime0=b0-(b1*meanX)+(b2*(meanX)2)';
  title2 'bprime1=b1-2*b2*meanX';
  title3 'bprime2=b2';
run;
proc print data=backtransform; 
 var bprime0 bprime1 bprime2; 
run;               

The output is then:

Obs

bprime0

bprime1

bprime2

1

-12.2226

1.1457

-0.01026

Note! The ANOVA results and the final quadratic regression equation here are identical to the results from the orthogonal polynomial coding approach.

First, load and attach the Fungus data.

setwd("~/path-to-folder/")
fungal_data <- read.table("fungus_data.txt",header=T)
attach(fungal_data)

R has a built-in function for generating orthogonal polynomials which we utilize to obtain the Type 1 ANOVA table.

poly_lm = lm(yield ~ poly(humidity,4))
poly_aov1 = aov(poly_lm)
summary(poly_aov1)
                 Df Sum Sq Mean Sq F value   Pr(>F)    
poly(humidity, 4)  4  89.32  22.331   35.48  7e-06 ***
Residuals         10   6.29   0.629                              
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

It is clear from the F-test that at least one of the polynomial terms is significant. We can examine the t-tests of the individual terms using the summary function.

summary(poly_lm)
Call:
lm(formula = yield ~ poly(humidity, 4))
Residuals:
    Min      1Q  Median      3Q     Max 
-0.8000 -0.5333 -0.3000  0.3833  1.1333 
Coefficients:
                   Estimate Std. Error t value Pr(>|t|)     
(Intercept)         17.3600     0.2048  84.753 1.28e-15 ***
poly(humidity, 4)1   6.5544     0.7933   8.262 8.87e-06 ***
poly(humidity, 4)2  -6.6505     0.7933  -8.383 7.80e-06 ***
poly(humidity, 4)3   0.5842     0.7933   0.736    0.478       
poly(humidity, 4)4   1.3387     0.7933   1.688    0.122   
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7933 on 10 degrees of freedom
Multiple R-squared:  0.9342,    Adjusted R-squared:  0.9079 
F-statistic: 35.48 on 4 and 10 DF,  p-value: 7.005e-06

As is commonly found, only the quadratic curvature is of interest. We can then fit a standard quadratic model, first recentering the response as seen in previous sections.

humidity_center = humidity-50
humidity_center2 = humidity_center^2
new_data = data.frame(fungal_data,humidity_center,humidity_center2)
detach(fungal_data); attach(new_data)

quad_lm = lm(yield ~ humidity_center + humidity_center2)
aov1_quad = aov(quad_lm)
summary(aov1_quad)
                 Df Sum Sq Mean Sq F value   Pr(>F)    
humidity_center   1  42.96   42.96   61.18 4.73e-06 ***
humidity_center2  1  44.23   44.23   62.98 4.08e-06 ***
Residuals        12   8.43    0.70                         
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

The following coefficients are produced.

quad_lm$coefficients
     (Intercept)  humidity_center humidity_center2 
      19.4123810        0.1196667       -0.0102619 

To obtain the coefficients in terms of the original variables, we can back-transform these coefficients (output not shown).

b_0_prime = 19.4123810-0.1196667*50-0.0102619*50^2
b_0_prime
b_1_prime = 0.1196667-0.0102619*(-2*50)
b_1_prime
b_2_prime = -0.0102619
b_2_prime