Fungus Example Section
Consider a treatment design consisting of five humidity level percentages (30, 40, 50, 60, and 70). Each of the five treatments is assigned randomly to three climate-controlled grow rooms in a completely randomized experimental design. The resulting fungus crop yields (lbs) are shown in the table below (Fungus Data):
Means (\(\bar{y}_i\)) | Humidity (x) | ||||
---|---|---|---|---|---|
30 | 40 | 50 | 60 | 70 | |
13.1 | 17.1 | 19.5 | 18.7 | 18.9 | |
12.3 | 16.6 | 21.1 | 20.1 | 17.3 | |
13.3 | 17.4 | 19.3 | 18.2 | 17.5 | |
12.90 | 17.03 | 19.97 | 19.00 | 17.90 |
We can see that the factor humidity has levels that are equally spaced. Therefore, we can use the orthogonal contrast coefficients to fit a polynomial to the response, fungus yields. With \(k=5\) levels, we can only fit up to a quartic term. The orthogonal polynomial contrast coefficients for the example are shown in Table 10.1. Here, \(r=3\) denotes the number of replicates, \(p=1,\dots,4\) denotes the term order, and \(i = 1,\dots,5\) denotes the factor level.
Humidity(x) | \(\bar{y}_i\) | Orthogonal Polynomial Coefficients \((g_{pi})\) | ||||
---|---|---|---|---|---|---|
Mean | Linear | Quadratic | Cubic | Quartic | ||
30 | 12.90 | 1 | -2 | 2 | -1 | 1 |
40 | 17.03 | 1 | -1 | -1 | 2 | -4 |
50 | 19.97 | 1 | 0 | -2 | 0 | 6 |
60 | 19.00 | 1 | 1 | -1 | -2 | -4 |
70 | 17.90 | 1 | 2 | 2 | 1 | 1 |
\(\lambda_p\) | - | 1 | 1 | 5/6 | 35/12 | |
Sum = \(\sum g_{pi}\bar{y}_{i.}\) | 86.80 | 11.97 | -14.37 | 1.07 | 6.47 | |
Divisor = \(\sum g_{pi}^{2}\) | 5 | 10 | 14 | 10 | 70 | |
\(SSP_p=r(\sum g_{pi}\bar{y}_{i.})^2/\sum g_{pi}^{2}\) | - | 42.96 | 44.23 | 0.34 | 1.79 | |
\(\hat{\alpha}_p=\sum g_{pi}\bar{y}_{i.}/\sum g_{pi}^{2}\) | 17.36 | 1.20 | -1.03 | 0.11 | 0.09 |
As mentioned before, one can easily find the orthogonal polynomial coefficients \((g_{pi})\) for a different order of polynomials using pre-documented tables for equally spaced intervals. However, let us try to understand how the coefficients are obtained.
First note that the five values of \(x\) are 30, 40, 50, 60, and 70. Therefore, \(\bar{x} = 50\) and the spacing \(d = 10\). This means that the five values of \(\dfrac{x - \bar{x}}{d}\) are -2, -1, 0, 1, and 2.
Linear coefficients:
Using the linear orthogonal polynomial formula seen in Section 10.2, the polynomial \(g_{1i}\) for linear coefficients can be calculated as follows.
Linear Coefficient Polynomials | |||||
---|---|---|---|---|---|
\(x\) | 30 | 40 | 50 | 60 | 70 |
\((x-50)\) | -20 | -10 | 0 | 10 | 20 |
\(\frac{(x-50)}{10}\) | -2 | -1 | 0 | 1 | 2 |
Linear orthogonal polynomial | \((-2)\lambda_1\) | \((-1)\lambda_1\) | \((0)\lambda_1\) | \((1)\lambda_1\) | \((2)\lambda_1\) |
To obtain the final set of coefficients we choose \(\lambda_1\) so that the coefficients are integers. Therefore, we set \(\lambda_1 = 1\) and obtain the coefficient values \((g_{1i})\) seen in the "Linear" column of Table 10.1.
Quadratic coefficients:
Using the quadratic orthogonal polynomial formula seen in Section 10.2, the polynomial \(g_{2i}\) for linear coefficients turns out to be:
\(\begin{align*}
&\left( (-2)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\left( (-1)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\\& \left( (0)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\left( (1)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2,
\\& \left( (2)^2 - \left( \dfrac{5^2 - 1}{12}\right)\right)\lambda_2
\end{align*}\)
which simplified gives:
\((2)\lambda_2, (-1)\lambda_2, (-2)\lambda_2, (-1)\lambda_2, (2)\lambda_2\)
To obtain the final set of coefficients we choose \(\lambda_2\) so that the coefficients are integers. Therefore, we set \(\lambda_2 = 1\) and obtain the coefficient values \((g_{2i})\) seen in the "Quadratic" column of Table 10.1.
Cubic coefficients:
The polynomial \(g_{3i}\) for cubic coefficients turns out to be:
\(\begin{align*}
&\left( (-2)^3 - (-2) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\left( (-1)^3 - (-1) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\\& \left( (0)^3 - (0) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\left( (1)^3 - (1) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3,
\\& \left( (2)^3 - (2) \left( \dfrac{3(5^2) - 7}{20} \right)\right)\lambda_3
\end{align*}\)
which simplified gives:
\((-\dfrac{6}{5})\lambda_3, (\dfrac{12}{5})\lambda_3, (0)\lambda_3, (\dfrac{-12}{5})\lambda_3, (\dfrac{6}{5})\lambda_3\)
To obtain the final set of coefficients we choose \(\lambda_3\) so that the coefficients are integers. Therefore, we set \(\lambda_3 = 5/6\) and obtain the coefficient values \((g_{3i})\) seen in the "Cubic" column of Table 10.1.
Quartic coefficients:
The polynomial \(g_{4i}\) can be used to obtain the quartic coefficients in the same way as above.
Each set of coefficients corresponds to a contrast among the treatments. Notice the sum of coefficients is equal to zero (e.g. the quartic coefficients \((1, -4, 6, -4, 1)\) sum to zero, etc). Using orthogonal polynomial contrasts, we can partition the treatment sums of squares into a set of additive sums of squares corresponding to orthogonal polynomial contrasts. The contrast computations are identical to those we learned in Lesson 2.5. The resulting partitions can be used to sequentially test the significance of the linear, quadratic, cubic, and quartic terms in the model and find the polynomial order appropriate for the data.
Table 10.1 shows how to obtain the sums of squares for each component \( (SSP_p) \) and compute the estimates of the \(\alpha_p\) coefficients for the orthogonal polynomial equation. Using the results in Table 10.1, we have estimated the orthogonal polynomial equation as:
\(\hat{y}_i = 17.36 + 1.20 g_{1i} - 1.03 g_{2i} + 0.11 g_{3i} + 0.09g_{4i}\)
Table 10.2 summarizes how the treatment sums of squares are partitioned and their test results.
Source of Variation | Degrees of Freedom | Sum of Squares | Mean Square | F | Pr > F |
---|---|---|---|---|---|
Density | 4 | 89.32 | 22.331 | 35.48 | 0.000 |
Error | 10 | 6.29 | 0.629 |
Contrast | DF | Contrast SS | Mean Square | F | Pr > F |
---|---|---|---|---|---|
Linear | 1 | 42.96 | 42.96 | 68.26 | 0.000 |
Quadratic | 1 | 44.23 | 44.23 | 70.28 | 0.000 |
Cubic | 1 | 0.34 | 0.34 | 0.54 | 0.478 |
Quartic | 1 | 1.79 | 1.79 | 2.85 | 0.122 |
To test whether any of the polynomials are significant (i.e. \(H_0: \alpha_1 = \alpha_2 = \alpha_3 = \alpha_4 = 0\)), we can use the global F-test where the test statistic is equal to 35.48. We see that the p-value is almost zero and therefore we can conclude that at the 5% level, at least one of the polynomials is significant. Using the orthogonal polynomial contrasts we can determine which of the polynomials are useful. Table 10.2 shows that for this example only the linear and quadratic terms are useful. Removing the insignificant terms, the estimated orthogonal polynomial equation becomes:
\(\hat{y}_i = 17.36 + 1.20 g_{1i} - 1.03 g_{2i}\)
The polynomial relationship expressed as a function of y and x in actual units of the observed variables is more informative than when expressed in units of the orthogonal polynomial.
We can obtain the polynomial relationship using the actual units of observed variables by back-transforming using the formulas seen in Section 10.2. The necessary quantities to back-transform are \(\lambda_1=\lambda_2=1, d=10, \bar{x}=50\) and t = 5. Substituting these values we obtain
\(\begin{align}
\hat{y}&=17.36+1.20g_{1i}-1.03g_{2i}\\
&=17.36+1.20(1)\left( \dfrac{x-50}{10} \right)-1.03(1)\left( \left( \dfrac{x-50}{10} \right)^2-\dfrac{5^2-1}{12} \right)\\
\end{align}\)
which simplifies to
\(\hat{y}=-12.23+1.15x-0.01x^2\)
Using Technology
Below is the code for generating polynomials from the IML procedure in SAS:
/* read the fungus data set*/
/* Generating Ortho_Polynomials from IML */
proc iml;
x={30 40 50 60 70};
xpoly=orpol(x,4); /* the '4' is the df for the quantitative factor */
humidity=x`; new=humidity || xpoly;
create out1 from new[colname={"humidity" "xp0" "xp1" "xp2" "xp3" "xp4"}];
append from new; close out1;
quit;
proc print data=out1;
run;
/* Here data is sorted and then merged with the original dataset */
proc sort data=fungal;
by humidity;
run;
data ortho_poly; merge out1 fungal;
by humidity;
run;
proc print data=ortho_poly;
run;
/* The following code will then generate the results shown in the
notes*/
proc mixed data=ortho_poly method=type1;
class;
model yield=xp1 xp2 xp3 xp4/ solution;
title 'Using Orthog polynomials from IML';
ods select SolutionF;
run;
/*We can use proc glm to obtain the same results without using
IML codings. Proc glm will use the orthogonal contrast coefficients directly*/
proc glm data=fungal;
class humidity;
model yield = humidity;
contrast 'linear' humidity -2 -1 0 1 2;
contrast 'quadratic' humidity 2 -1 -2 -1 2;
contrast 'cubic' humidity -1 2 0 -2 1;
contrast 'quatic' humidity 1 -4 6 -4 1;
lsmeans humidity;
run;
Analysis of Variance | ||||||||
---|---|---|---|---|---|---|---|---|
Source | DF | Sum of Squares | Mean Square | ExpectedMean Square | Error Term | Error DF | F Value | Pr > F |
xp1 | 1 | 43.200000 | 43.200000 | Var(Residual) + Q(xp1) | MS(Residual) | 10 | 57.75 | < .0001 |
xp2 | 1 | 42.000000 | 42.000000 | Var(Residual) + Q(xp2) | MS(Residual) | 10 | 56.15 | < .0001 |
xp3 | 1 | 0.300000 | 0.300000 | Var(Residual) + Q(xp2) | MS(Residual) | 10 | 0.40 | 0.5407 |
xp4 | 1 | 2.100000 | 2.100000 | Var(Residual) + Q(xp4) | MS(Residual) | 10 | 2.81 | 0.1248 |
Residual | 10 | 7.480000 | 7.480000 | Var(Residual) |
Fitting a Quadratic Model with Proc Mixed
Often we can see that only a quadratic curvature is of interest in a set of data. In this case, we can plan to simply run an order 2 (quadratic) polynomial and can easily use proc mixed (the general linear model). This method requires centering the quantitative variable levels by subtracting the mean of the levels (50) and then creating the quadratic polynomial terms.
data fungal;
set fungal;
x=humidity-50;
x2=x**2;
run;
proc mixed data=fungal method=type1;
class;
model yield = x x2;
run;
The output is:
Type 1 Analysis of Variance | ||||||||
---|---|---|---|---|---|---|---|---|
Source | DF | Sum of Squares | Mean Square | Expected Mean Square | Error Term | Error DF | F Value | Pr > F |
x | 1 | 42.960333 | 42.960333 | Var(Residual) + Q(x) | MS(residual) | 12 | 61.18 | <.0001 |
x2 | 1 | 44.228810 | 44.228810 | Var(Residual) + Q(x2) | MS(residual) | 12 | 62.98 | <.0001 |
Residual | 12 | 8.426857 | 0.702238 | Var(Residual) |
We can also generate the solutions (coefficients) for the model with the following code
proc mixed data=fungal method=type1;
class;
model yield = x x2 / solution;
run;
which gives the following output for the regression coefficients.
Solution for Fixed Effects | |||||
---|---|---|---|---|---|
Effect | Estimate | Standard Error | DF | t Value | Pr > |t| |
Intercept | 19.4124 | 0.3372 | 12 | 57.57 | <.0001 |
x | 0.1197 | 0.01530 | 12 | 7.82 | <.0001 |
x2 | -0.01026 | 0.001293 | 12 | -7.94 | <.0001 |
Here we need to remember that the regression was based on centered values for the predictor, so we have to back-transform to get the coefficients in terms of the original variables. For example, the fitted second-order model for one predictor variable that is expressed in terms of centered values \(x = X-\bar{X}\):
\(\hat{Y}=b_0+b_1x+b_{2}x^2\)
becomes in terms of the original X variable:
\(\hat{Y}=b'_0+b'_1X+b'_{2}X^2\)
where:
\(\begin{align}
b'_0&=b_0-b_1\bar{X}+b_{2}\bar{X}^2\\
b'_1&=b_1-2b_{2}\bar{X}\\
b'_{2}&=b_{2}\\
\end{align}\)
In the example above, this back-transformation uses the estimates from the Solutions for Fixed Effects table as follows.
data backtransform;
bprime0=19.4124-(0.1197*50)+(-0.01026*(50**2));
bprime1=0.1197-(2*-0.01026*50);
bprime2=-0.01026;
title 'bprime0=b0-(b1*meanX)+(b2*(meanX)2)';
title2 'bprime1=b1-2*b2*meanX';
title3 'bprime2=b2';
run;
proc print data=backtransform;
var bprime0 bprime1 bprime2;
run;
The output is then:
Obs | bprime0 | bprime1 | bprime2 |
---|---|---|---|
1 | -12.2226 | 1.1457 | -0.01026 |
First, load and attach the Fungus data.
setwd("~/path-to-folder/")
fungal_data <- read.table("fungus_data.txt",header=T)
attach(fungal_data)
R has a built-in function for generating orthogonal polynomials which we utilize to obtain the Type 1 ANOVA table.
poly_lm = lm(yield ~ poly(humidity,4))
poly_aov1 = aov(poly_lm)
summary(poly_aov1)
Df Sum Sq Mean Sq F value Pr(>F) poly(humidity, 4) 4 89.32 22.331 35.48 7e-06 *** Residuals 10 6.29 0.629 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
It is clear from the F-test that at least one of the polynomial terms is significant. We can examine the t-tests of the individual terms using the summary function.
summary(poly_lm)
Call: lm(formula = yield ~ poly(humidity, 4))
Residuals: Min 1Q Median 3Q Max -0.8000 -0.5333 -0.3000 0.3833 1.1333
Coefficients: Estimate Std. Error t value Pr(>|t|) (Intercept) 17.3600 0.2048 84.753 1.28e-15 *** poly(humidity, 4)1 6.5544 0.7933 8.262 8.87e-06 *** poly(humidity, 4)2 -6.6505 0.7933 -8.383 7.80e-06 *** poly(humidity, 4)3 0.5842 0.7933 0.736 0.478 poly(humidity, 4)4 1.3387 0.7933 1.688 0.122 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 0.7933 on 10 degrees of freedom Multiple R-squared: 0.9342, Adjusted R-squared: 0.9079 F-statistic: 35.48 on 4 and 10 DF, p-value: 7.005e-06
As is commonly found, only the quadratic curvature is of interest. We can then fit a standard quadratic model, first recentering the response as seen in previous sections.
humidity_center = humidity-50
humidity_center2 = humidity_center^2
new_data = data.frame(fungal_data,humidity_center,humidity_center2)
detach(fungal_data); attach(new_data)
quad_lm = lm(yield ~ humidity_center + humidity_center2)
aov1_quad = aov(quad_lm)
summary(aov1_quad)
Df Sum Sq Mean Sq F value Pr(>F) humidity_center 1 42.96 42.96 61.18 4.73e-06 *** humidity_center2 1 44.23 44.23 62.98 4.08e-06 *** Residuals 12 8.43 0.70 --- Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
The following coefficients are produced.
quad_lm$coefficients
(Intercept) humidity_center humidity_center2 19.4123810 0.1196667 -0.0102619
To obtain the coefficients in terms of the original variables, we can back-transform these coefficients (output not shown).
b_0_prime = 19.4123810-0.1196667*50-0.0102619*50^2
b_0_prime
b_1_prime = 0.1196667-0.0102619*(-2*50)
b_1_prime
b_2_prime = -0.0102619
b_2_prime