A set of observations can be split up according to certain #~{factors}. A factor is usually a quality associated with each observation, such as eye colour or nationality if the measurements are taken on people, or some experimental condition, such as the field or plot used for planting when measuring yield in agricultural trials, or drug/placebo in medical trials.
A factor is defined mathematically as a mapping from the set of observations to another set. The elements of the image set are usually determined before the observations are taken, and are called the #~{levels} of the factor. Alternatively, and equivalently, we sometimes refer to the reverse images of these elements as the levels, i.e. the subset of observations that maps to a particular element.
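As a minimal sketch (the data and helper names here are made up for illustration), a factor can be represented in Python as a plain function from observations to levels, with the levels and their reverse images recovered by grouping:

```python
def levels(factor, observations):
    """The image set of the factor: the levels that actually occur."""
    return {factor(y) for y in observations}

def preimages(factor, observations):
    """Map each level to the subset of observations sent to it."""
    blocks = {}
    for y in observations:
        blocks.setdefault(factor(y), set()).add(y)
    return blocks

# Hypothetical data: observations are people, the factor is eye colour.
people = ["ann", "bob", "carl", "dee"]
eye_colour = {"ann": "blue", "bob": "brown", "carl": "blue", "dee": "green"}.get

print(levels(eye_colour, people))       # {'blue', 'brown', 'green'}
print(preimages(eye_colour, people))    # {'blue': {'ann', 'carl'}, 'brown': {'bob'}, 'green': {'dee'}}
```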
Two factors are #~{equivalent} if the partitions induced by the factors, i.e. the reverse images of the levels, are the same. For example, suppose an experiment is set up to measure income in a country's cities. The cities are divided into northern and southern cities, and into those with more than 100,000 inhabitants ('large') and those with fewer than 100,000 inhabitants ('small'). In the survey, it turns out that all the small cities are southern cities, and all the large cities are northern. So ~{in this case}, the two factors "size" and "location" are equivalent. This illustrates two important points about factors:
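To make the equivalence concrete, here is a small Python sketch (the helper names and the toy city data are illustrative, not the survey itself): two factors are equivalent precisely when the partitions they induce, i.e. the sets of level preimages, coincide.

```python
def induced_partition(factor, observations):
    """The set of blocks (level preimages) that the factor induces."""
    blocks = {}
    for y in observations:
        blocks.setdefault(factor(y), set()).add(y)
    return {frozenset(block) for block in blocks.values()}

def equivalent(f, g, observations):
    return induced_partition(f, observations) == induced_partition(g, observations)

# In this toy data every small city happens to be southern and every large
# city northern, so the two factors are equivalent on these observations.
cities   = ["avon", "brigg", "calder", "derwent"]
size     = {"avon": "large", "brigg": "large", "calder": "small", "derwent": "small"}.get
location = {"avon": "north", "brigg": "north", "calder": "south", "derwent": "south"}.get

print(equivalent(size, location, cities))   # True
```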
Firstly, properties of factors, such as equivalence and the others introduced subsequently, depend on the actual observations we have. We are interested in the mapping from the observations to the levels more than in the set of levels itself. There is nothing to say, in the example, that all southern cities are small while no northern city is; it is just that in the data we have, the two qualities coincide. Though of course, in that particular country this may be the case.
The second point is that in analysis of variance we are looking at factors as a way of grouping observations, not at the distribution of the factor levels themselves. In the above example the survey was set up to look at income, not to see whether northern cities are larger than southern ones (though that is a perfectly valid field of research, of course). The researchers in the example would probably revisit their choice of cities, to ensure better representation of large southern and small northern ones. This is, of course, a question of ~{experimental design}.
If there are several factors on a set of observations, we can compare two factors by considering their #~{granularity}: one factor is said to be #~{finer} than another if each level of the first is completely nested within a level of the second. For example, city of birth is finer than country of birth, sub-plots are finer than plots, and so on. Conversely, we say that the second factor is #~{coarser} than the first. Note that two factors need not be finer or coarser than each other, but the relationship is transitive, i.e. if A is finer than B, and B is finer than C, then A is finer than C. So granularity imposes a partial ordering on the factors on a set of observations.
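A small sketch of the granularity test (again with illustrative names and data): F is finer than G exactly when any two observations with the same F-level also share a G-level, so that each level of F sits inside a single level of G. Note that this test also accepts equivalent factors.

```python
from itertools import combinations

def is_finer(f, g, observations):
    """F is finer than G when sharing an F-level implies sharing a G-level."""
    return all(g(y1) == g(y2)
               for y1, y2 in combinations(observations, 2)
               if f(y1) == f(y2))

people  = ["ann", "bob", "carl", "dee"]
city    = {"ann": "lyon", "bob": "paris", "carl": "milan", "dee": "lyon"}.get
country = {"ann": "france", "bob": "france", "carl": "italy", "dee": "france"}.get

print(is_finer(city, country, people))    # True: city of birth is finer
print(is_finer(country, city, people))    # False: country is not finer than city
```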
The finest factor on any set of observations is the #~{identity factor}, I, which maps each observation to itself, and the coarsest factor is the #~{null factor}, O, which maps all observations to the same level.
If there are two factors, A and B, on the same set of observations, their #~{cross-product} A # B is the factor whose levels are the combinations of the levels of the two factors: each observation is mapped to the pair consisting of its A-level and its B-level.
The cross-product is finer than both its constituent factors, and in fact, if any factor is finer than both of two factors, then it is either finer than their cross-product or equivalent to it. So the cross-product is the coarsest factor that is finer than both its constituent factors, and is also called the #~{maximum} of the two factors.
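As a sketch (with hypothetical plot data), the cross-product simply maps each observation to the pair of its A-level and B-level; knowing that pair determines the level under either factor, which is why A # B is finer than both.

```python
def cross(a, b):
    """A # B: each observation is mapped to the pair of its A- and B-levels."""
    return lambda y: (a(y), b(y))

plots     = ["p1", "p2", "p3", "p4"]
field     = {"p1": "east", "p2": "east", "p3": "west", "p4": "west"}.get
treatment = {"p1": "drug", "p2": "placebo", "p3": "drug", "p4": "placebo"}.get

both = cross(field, treatment)
print({y: both(y) for y in plots})
# {'p1': ('east', 'drug'), 'p2': ('east', 'placebo'),
#  'p3': ('west', 'drug'), 'p4': ('west', 'placebo')}
# The pair determines the level under either factor, so A # B is finer than
# both A and B; here it is strictly finer, since no two plots share a pair.
```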
Conversely, the finest factor that is coarser than both of two factors is called the #~{minimum} of the two factors. The minimum is unique, at least up to equivalence.
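The minimum can also be computed concretely. One way, sketched below on made-up data with a small union-find, is to link any two observations that share a level of either factor and then take connected components: the resulting partition is the finest one that is coarser than both factors.

```python
def minimum_partition(f, g, observations):
    """Finest partition that is coarser than both factors (their minimum)."""
    parent = {y: y for y in observations}

    def find(y):
        while parent[y] != y:
            parent[y] = parent[parent[y]]        # path halving
            y = parent[y]
        return y

    def union(a, b):
        parent[find(a)] = find(b)

    # Link observations that share a level of either factor.
    for factor in (f, g):
        first_with_level = {}
        for y in observations:
            level = factor(y)
            if level in first_with_level:
                union(y, first_with_level[level])
            else:
                first_with_level[level] = y

    blocks = {}
    for y in observations:
        blocks.setdefault(find(y), set()).add(y)
    return list(blocks.values())

cities   = ["a", "b", "c", "d", "e"]
size     = {"a": "large", "b": "large", "c": "small", "d": "small", "e": "small"}.get
location = {"a": "north", "b": "north", "c": "north", "d": "south", "e": "south"}.get

print(minimum_partition(size, location, cities))
# [{'a', 'b', 'c', 'd', 'e'}] : shared size- or location-levels chain all five
# cities together, so on this data the minimum is the null factor O.
```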
We now define a correspondence between a factor and a linear subspace. This is pretty abstract at this stage, but will turn out to be useful when we need to work out components of the variance, to test hypotheses about factors.
First we index the set of observations, \{ ~y_1, ... , ~y_n \}. The particular indexing doesn't matter. For a factor F, which has ~k levels, \{ ~f_1, ... , ~f_k \} , we define the #~{design matrix} of F, which is the ~n # ~k matrix _ X_F _ = _ #( ~x_{~i , ~j} #) , _ where _ ~x_{~i , ~j} = 1 _ if _ F( ~y_~i ) = ~f_~j , _ ~x_{~i , ~j} = 0 _ otherwise.
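A short numpy sketch of the construction (the observations and the factor are made up, and the ordering of the levels is arbitrary but fixed):

```python
import numpy as np

def design_matrix(factor, observations, level_order):
    """x[i, j] = 1 if F(y_i) = f_j, and 0 otherwise."""
    column = {level: j for j, level in enumerate(level_order)}
    X = np.zeros((len(observations), len(level_order)), dtype=int)
    for i, y in enumerate(observations):
        X[i, column[factor(y)]] = 1
    return X

people = ["ann", "bob", "carl", "dee"]                            # y_1, ..., y_4
eyes   = {"ann": "blue", "bob": "brown", "carl": "blue", "dee": "green"}.get
print(design_matrix(eyes, people, ["blue", "brown", "green"]))    # n # k = 4 # 3
# [[1 0 0]
#  [0 1 0]
#  [1 0 0]
#  [0 0 1]]
```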
There will be a linear map _ &phi._F #: &reals.^~k -> &reals.^~n corresponding to X_F, and we define the linear subspace L_F &subseteq. &reals.^~n, _ L_F _ = _ im &phi._F. _ ( If you're willing to accept the slight abuse of notation, we can just write _ L_F _ = _ im X_F. )
Note that a '1' occurs only once in each row of X_F, so its column vectors #~v_1 , ... , #~v_~k , ( #~v_~j &in. &reals.^~n ) have disjoint supports and are therefore orthogonal; provided every level actually occurs among the observations, they are also non-zero and hence linearly independent. So _ #~v_1 , ... , #~v_~k _ forms a basis for L_F. Alternatively we could just have defined L_F as the subspace generated by #~v_1 , ... , #~v_~k. This shows that it doesn't really matter in what order we take the levels of F: the corresponding linear space is the same, although the design matrices differ slightly (their columns are permuted).
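A quick numerical check of both claims, on a hypothetical design matrix with ~n = 4 observations and ~k = 3 levels: X^T X is diagonal, so the columns are orthogonal, and permuting the levels (i.e. the columns) leaves the column space, and hence the orthogonal projector onto L_F, unchanged.

```python
import numpy as np

# Two design matrices for the same factor, differing only in the order in
# which the three levels are taken: the columns are permuted.
X1 = np.array([[1, 0, 0],
               [0, 1, 0],
               [1, 0, 0],
               [0, 0, 1]])
X2 = X1[:, [2, 0, 1]]

print(X1.T @ X1)                 # diagonal matrix of level counts: columns orthogonal

P1 = X1 @ np.linalg.pinv(X1)     # orthogonal projector onto L_F = im(X1)
P2 = X2 @ np.linalg.pinv(X2)
print(np.allclose(P1, P2))       # True: both orderings give the same subspace L_F
```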