We reformulate the theory of elementary One-Way Analysis of Variance in terms of factors:
We have one factor, $F$ say, with $k$ levels, and the corresponding linear space $L_F$ as we have seen. In the theory of Linear Normal Models it is assumed that the observations $y_1, \dots, y_n$ are from independent normally distributed random variables $Y_1, \dots, Y_n$ which have the same variance but possibly different means: $Y_i \sim N(\mu_i, \sigma^2)$. The model for one-way analysis of variance is that the mean vector $\boldsymbol{\mu} = (\mu_1, \dots, \mu_n)$ lies in the space $L_F$, or equivalently that $\boldsymbol{\mu} = X_F \boldsymbol{\alpha}$ for some (unspecified) $\boldsymbol{\alpha} \in \mathbb{R}^k$.
Once we have established this model we can test hypotheses on it, for example the hypothesis of a uniform mean, i.e. that the factor has no effect on the observations. This is equivalent to saying $\boldsymbol{\mu} \in L_O$, the space corresponding to the null factor.
So the design for this example is $\Delta = \{\,I, F, O\,\}$. To test for a uniform mean we calculated the quantity

$$\frac{\mathrm{ESS}/(k-1)}{\mathrm{RSS}/(n-k)},$$

where

$$\mathrm{ESS} \;=\; \|P_F \mathbf{y} - P_O \mathbf{y}\|^2, \qquad \mathrm{RSS} \;=\; \|\mathbf{y} - P_F \mathbf{y}\|^2.$$
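To make the recap concrete, here is a minimal numerical sketch in Python (assuming NumPy; the function names and the toy data are illustrative, not part of the original development). It builds the indicator matrix $X_F$, the projections onto $L_F$ and $L_O$, and the $F$ statistic above:

```python
import numpy as np

def projection(X):
    """Orthogonal projection matrix onto the column space of X."""
    return X @ np.linalg.pinv(X)

def one_way_f(y, levels):
    """F statistic (ESS/(k-1)) / (RSS/(n-k)) for a one-way layout.

    y      : observation vector of length n
    levels : integer label in {0, ..., k-1} for each observation
    """
    y = np.asarray(y, dtype=float)
    n, k = len(y), len(set(levels))
    X_F = np.eye(k)[levels]          # indicator (design) matrix of the factor F
    P_F = projection(X_F)            # projection onto L_F
    P_O = np.full((n, n), 1.0 / n)   # projection onto L_O (constant vectors)
    ess = np.sum((P_F @ y - P_O @ y) ** 2)
    rss = np.sum((y - P_F @ y) ** 2)
    return (ess / (k - 1)) / (rss / (n - k))

# Toy data: three groups (k = 3, n = 7)
y = [4.1, 3.9, 4.3, 5.0, 5.2, 6.1, 5.8]
levels = [0, 0, 0, 1, 1, 2, 2]
print(one_way_f(y, levels))
```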
Until now we have looked at factors simply in terms of their abstract qualities: as mappings between finite sets, together with the associated linear spaces and projections, culminating in the orthogonal decomposition of $\mathbb{R}^n$ determined by the factors in an orthogonal design. So far we have not considered the observations which are categorized by the factors. This we now do, by considering some important statistics on a set of observations. For the moment we make no assumptions about the nature of the variables (distribution, mean, etc.), nor do we make any hypotheses about them.
Let $\mathbf{y} = (y_1, \dots, y_n)$ be a set of observations, and let $\Delta$ be an orthogonal design on $\{1, \dots, n\}$. For any factor $F \in \Delta$ there is a linear space $V_F$ with an orthogonal projection $Q_F$ (as discussed in Factorial Design).
We define the following quantities:
$$\mathrm{SSD}_F \;=\; \|Q_F \mathbf{y}\|^2, \qquad \mathrm{SS}_F \;=\; \|P_F \mathbf{y}\|^2$$
$\mathrm{SSD}_F$ is known as the sum of squares of the deviations, and $\mathrm{SS}_F$ as the sum of squares, of the factor $F$.
The quantity $d_F = \dim V_F$ is the degrees of freedom corresponding to the SSD.
$$d_F \;=\; \dim V_F, \qquad |F| \;=\; \dim L_F$$
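As an illustration, continuing the Python sketch above in the one-way design (where $V_F = L_F \ominus L_O$ and hence $Q_F = P_F - P_O$; the variable names are again ours):

```python
# SS_F, SSD_F and d_F for the factor F of the toy one-way example.
y_vec = np.asarray(y, dtype=float)
n, k = len(y_vec), 3
X_F = np.eye(k)[levels]
P_F = projection(X_F)            # projection onto L_F
P_O = np.full((n, n), 1.0 / n)   # projection onto L_O
Q_F = P_F - P_O                  # projection onto V_F = L_F minus L_O

SS_F  = np.sum((P_F @ y_vec) ** 2)   # ||P_F y||^2
SSD_F = np.sum((Q_F @ y_vec) ** 2)   # ||Q_F y||^2
d_F   = round(np.trace(Q_F))         # dim V_F = tr(Q_F) = k - 1 here
```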
If $\Delta$ is an orthogonal design on the set of observations (i.e. $\Delta$ consists of all the factors under consideration), then we can draw up a table containing the values of the SSD and the corresponding degrees of freedom for every factor in the design. We will call this the Table of Variances.
For the time being we concentrate on deriving the table, without reference to its use in estimation and hypothesis testing.
From the theory of one-way ANOVA we know that:
$$\mathrm{SS}_F \;=\; \|P_F \mathbf{y}\|^2 \;=\; \sum_{f \in F} \frac{S_f^2}{n_f}$$

where

$$S_f \;=\; \sum_{i \,:\, F(i) = f} y_i, \qquad n_f \;=\; \#\{\, i \mid F(i) = f \,\}$$
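So $\mathrm{SS}_F$ can be computed directly from the group sums $S_f$ and group sizes $n_f$, with no projection matrix at all. A small sketch (reusing `y`, `levels` and `SS_F` from the sketches above):

```python
from collections import defaultdict

def ss_group_sums(y, factor):
    """SS_F = sum over f of S_f^2 / n_f, computed from group sums."""
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, f in zip(y, factor):
        sums[f] += yi
        counts[f] += 1
    return sum(sums[f] ** 2 / counts[f] for f in sums)

assert np.isclose(ss_group_sums(y, levels), SS_F)  # matches ||P_F y||^2
```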
We know that
$$Q_F \;=\; \sum_{G \le F} \alpha_F^G \, P_G$$
so
$$\mathrm{SSD}_F \;=\; \|Q_F \mathbf{y}\|^2 \;=\; \mathbf{y}^{\mathsf T} Q_F \mathbf{y} \;=\; \sum_{G \le F} \alpha_F^G \, \mathbf{y}^{\mathsf T} P_G \mathbf{y} \;=\; \sum_{G \le F} \alpha_F^G \, \|P_G \mathbf{y}\|^2 \;=\; \sum_{G \le F} \alpha_F^G \, \mathrm{SS}_G$$

(using the fact that $Q_F$ and the $P_G$ are symmetric and idempotent),
and similarly
$$d_F \;=\; \operatorname{tr}(Q_F) \;=\; \operatorname{tr}\!\Big(\sum_{G \le F} \alpha_F^G \, P_G\Big) \;=\; \sum_{G \le F} \alpha_F^G \operatorname{tr}(P_G) \;=\; \sum_{G \le F} \alpha_F^G \, |G|$$
$$\mathrm{SSD}_F \;=\; \sum_{G \le F} \alpha_F^G \, \mathrm{SS}_G, \qquad d_F \;=\; \sum_{G \le F} \alpha_F^G \, |G|$$
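As a consistency check, in the one-way design $\Delta = \{I, F, O\}$ above we have $V_F = L_F \ominus L_O$, so $Q_F = P_F - P_O$, i.e. $\alpha_F^F = 1$ and $\alpha_F^O = -1$, and the formulae give

$$\mathrm{SSD}_F \;=\; \mathrm{SS}_F - \mathrm{SS}_O \;=\; \|P_F \mathbf{y}\|^2 - \|P_O \mathbf{y}\|^2 \;=\; \|P_F \mathbf{y} - P_O \mathbf{y}\|^2 \;=\; \mathrm{ESS}, \qquad d_F \;=\; |F| - |O| \;=\; k - 1,$$

recovering the numerator of the one-way $F$ statistic (the third equality holds since $L_O \subseteq L_F$, so $P_O \mathbf{y} \perp P_F \mathbf{y} - P_O \mathbf{y}$).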
The above formulae are explicit but not very useful for computation, since a priori we do not know the values of the $\alpha_F^G$. To find a more useful algorithm we use the formula for $P_F$ in terms of the $Q_G$ derived previously:
$$\mathrm{SS}_F \;=\; \|P_F \mathbf{y}\|^2 \;=\; \Big\| \sum_{G \le F} Q_G \mathbf{y} \Big\|^2 \;=\; \sum_{G \le F} \|Q_G \mathbf{y}\|^2 \;=\; \sum_{G \le F} \mathrm{SSD}_G$$

(the cross terms vanish because the spaces $V_G$ are mutually orthogonal). Rearranging gives the recursion:
$$\mathrm{SS}_F \;=\; \sum_{G \le F} \mathrm{SSD}_G, \qquad \mathrm{SSD}_F \;=\; \mathrm{SS}_F \,-\, \sum_{G < F} \mathrm{SSD}_G$$
Similarly
$$|F| \;=\; \dim L_F \;=\; \sum_{G \le F} \dim V_G \;=\; \sum_{G \le F} d_G$$
$$|F| \;=\; \sum_{G \le F} d_G, \qquad d_F \;=\; |F| \,-\, \sum_{G < F} d_G$$
With the above formulae it is possible to work recursively, starting from the coarsest factor, $C$ say (which will be the null factor in a balanced design): there is no factor strictly below $C$, so $\mathrm{SSD}_C = \mathrm{SS}_C$ and $d_C = |C|$.
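The recursion is straightforward to implement. The following is a sketch under our own conventions (each factor given as a vector of level labels, and $G \le F$ recovered by checking that $G$ is constant on the classes of $F$), not a prescribed algorithm:

```python
from collections import defaultdict

def ss_and_size(y, labels):
    """SS_F = sum over f of S_f^2 / n_f, and |F| = number of levels."""
    sums, counts = defaultdict(float), defaultdict(int)
    for yi, f in zip(y, labels):
        sums[f] += yi
        counts[f] += 1
    return sum(sums[f] ** 2 / counts[f] for f in sums), len(sums)

def is_coarser(g, f):
    """True if G <= F, i.e. G is constant on the classes of F."""
    seen = {}
    for gi, fi in zip(g, f):
        if fi in seen and seen[fi] != gi:
            return False
        seen[fi] = gi
    return True

def table_of_variances(y, design):
    """SSD_F and d_F for every factor in an orthogonal design.

    design maps factor name -> label vector of length n.  Uses
    SSD_F = SS_F - sum of SSD_G over G < F   and
    d_F   = |F|  - sum of d_G   over G < F,
    working from the coarsest factor upwards.
    """
    ss, size = {}, {}
    for name, labels in design.items():
        ss[name], size[name] = ss_and_size(y, labels)
    below = {F: [G for G in design
                 if G != F and is_coarser(design[G], design[F])]
             for F in design}
    ssd, d = {}, {}
    # Sorting by the number of strictly coarser factors handles every
    # G < F before F itself; the coarsest factor C comes first, where
    # SSD_C = SS_C and d_C = |C| fall out of the recursion automatically.
    for F in sorted(design, key=lambda name: len(below[name])):
        ssd[F] = ss[F] - sum(ssd[G] for G in below[F])
        d[F]   = size[F] - sum(d[G] for G in below[F])
    return ssd, d

# The toy one-way design {I, F, O} from above:
y = [4.1, 3.9, 4.3, 5.0, 5.2, 6.1, 5.8]
design = {
    "O": [0] * 7,                  # null factor: a single class
    "F": [0, 0, 0, 1, 1, 2, 2],    # treatment factor, k = 3 levels
    "I": list(range(7)),           # identity factor: one class per observation
}
ssd, d = table_of_variances(y, design)
print(ssd, d)
```

For the toy data this reproduces the one-way quantities, $\mathrm{SSD}_F = \mathrm{ESS}$ and $\mathrm{SSD}_I = \mathrm{RSS}$ with $d_F = k - 1 = 2$ and $d_I = n - k = 4$, and the output is exactly the content of the Table of Variances.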