Suppose we have ~n observations, ~y_1, ... ~y_~n, from ~n independent normally distributed variables:
~Y_~i ~ N(&mu._~i, &sigma.^2) , _ _ ~i = 1 ... ~n
all of which have the same variance. In vector notation we write this as:
_ #~y ~ N(#{&mu.}, &sigma.^2#I)
where #{&mu.} = (&mu._1, ... &mu._~n), and #I is the ~n &times. ~n identity matrix.
All the models under consideration will be of the form #{&mu.} &in. L, where L is a subspace of &reals.^~n; examples of the hypotheses we will be testing arise in the scenarios considered later. Such a model is called a #~{Linear Normal Model}:
_ #~y = (~y_1, ... ~y_~n) ~ N(#{&mu.}, &sigma.^2#I)
i.e.
~y_~i ~ N(&mu._~i, &sigma.^2), _ _ _ _ ~y_1, ... ~y_~n independent
and
#{&mu.} &in. L, where L is a subspace of &reals.^~n.
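As a concrete illustration of this setup, here is a minimal Python (numpy) sketch of drawing one observation vector from a linear normal model. The choice of L, its spanning vectors and all numerical values are assumptions made purely for the example, not part of the text.

```python
import numpy as np

rng = np.random.default_rng(0)

n, d = 10, 2
# Columns of V span an illustrative d-dimensional subspace L of R^n.
V = np.column_stack([np.ones(n), np.arange(n, dtype=float)])
beta = np.array([1.0, 0.5])        # arbitrary coefficients, so mu = V beta lies in L
mu = V @ beta
sigma = 2.0

# One observation vector y ~ N(mu, sigma^2 I): independent coordinates, equal variance.
y = mu + rng.normal(scale=sigma, size=n)
```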
Suppose we have an observation #~y = (~y_1, ... ~y_~n). The likelihood function is the product of the probability density functions, i.e.
L(#~y) _ _ = _ _ _ _ prod{ fract{1,&sqrt.2&pi.&sigma.^2} exp rndb{fract{&minus.(~{y_i} - &mu._~i)^2,2&sigma.^2}} , ~i = 1, ~n}
_ _ _ _ = _ _ _ _ (2&pi.&sigma.^2)^{-~n/2} exp rndb{fract{&minus. sum{(~{y_i} - &mu._~i)^2, ~i = 1, ~n},2&sigma.^2}}
_ _ _ _ = _ _ _ _ (2&pi.&sigma.^2)^{-~n/2} exp rndb{fract{&minus. || #~y - #{&mu.} || ^2,2&sigma.^2}}
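As a quick numerical check of this closed form (a sketch only; the mean vector, variance and data below are arbitrary illustrative values), the coordinate-wise sum of log-densities agrees with the expression involving the squared norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sigma2 = 10, 4.0
mu = rng.normal(size=n)                              # an arbitrary mean vector
y = mu + rng.normal(scale=np.sqrt(sigma2), size=n)

# Log-likelihood written coordinate by coordinate ...
ll_pointwise = np.sum(-0.5 * np.log(2 * np.pi * sigma2)
                      - (y - mu) ** 2 / (2 * sigma2))

# ... and via the closed form with the squared Euclidean norm.
ll_norm = -(n / 2) * np.log(2 * np.pi * sigma2) \
          - np.linalg.norm(y - mu) ** 2 / (2 * sigma2)

assert np.isclose(ll_pointwise, ll_norm)
```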
The likelihood function is maximised (for fixed &sigma.) when || #~y - #{&mu.} || ^2 is minimised.
The value of #{&mu.} &in. L for which || #~y - #{&mu.} || ^2 is minimised is est{#{&mu.}} = p(#~y), where p(#~y) is the orthogonal projection of #~y onto L.
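This minimising property of the orthogonal projection can be illustrated numerically. In the sketch below the spanning vectors of L are an arbitrary assumption, p(#~y) is computed by least squares, and random competing points of L are checked to give a residual at least as large:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 2
V = np.column_stack([np.ones(n), np.arange(n, dtype=float)])   # columns span L
y = rng.normal(size=n)

# Orthogonal projection of y onto L = column space of V (via least squares).
beta_hat, *_ = np.linalg.lstsq(V, y, rcond=None)
p_y = V @ beta_hat

# Any other point of L gives a residual norm at least as large.
for _ in range(1000):
    other = V @ rng.normal(size=d)
    assert np.linalg.norm(y - p_y) <= np.linalg.norm(y - other) + 1e-12
```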
To maximise with respect to &sigma., take logs and differentiate with respect to &sigma.^2:
l(#~y) _ _ = _ _ ln(L(#~y)) _ _ = _ _ &minus.fract{~n,2} ln(2&pi.) &minus.fract{~n,2} ln(&sigma.^2) - fract{ || #~y - p(#~y) || ^2,2&sigma.^2}
fract{&partial. l(#~y),&partial.&sigma.^2} _ _ = _ _ - fract{~n,2&sigma.^2} &plus. fract{ || #~y - p(#~y) || ^2,2&sigma.^4}
fract{&partial. l(#~y),&partial.&sigma.^2} _ array{ > / = / < } _ 0 _ _ _ _ <=> _ _ _ _ &sigma.^2 _ array{ < / = / > } _ fract{ || #~y - p(#~y) || ^2,~n}
The maximum likelihood estimators are
est{#{&mu.}} _ = _ p(#~y)
est{&sigma.}^2 _ = _ fract{ || #~y - p(#~y) || ^2,~n}
The quantity || #~y - p(#~y) || ^2 is called the "residual sum of squares", denoted RSS, so we often write:
est{&sigma.}^2 _ = _ fract{RSS,~n}
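A minimal computational sketch of the two maximum likelihood estimators, assuming (as an illustration only) that L is given as the column space of a matrix V and using least squares for the projection:

```python
import numpy as np

def mle(y, V):
    """Maximum likelihood estimates for the linear normal model with L = col(V)."""
    beta_hat, *_ = np.linalg.lstsq(V, y, rcond=None)
    mu_hat = V @ beta_hat                    # p(y), the orthogonal projection onto L
    rss = np.sum((y - mu_hat) ** 2)          # residual sum of squares, RSS
    return mu_hat, rss / len(y)              # sigma_hat^2 = RSS / n
```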
In order to determine the distribution of the estimators we will require the following lemma, whose proof will not be given here:
Lemma
Suppose ~Z_~i ~ N(0, &sigma.^2), _ ~i = 1 ... ~n, are ~n independent normally distributed random variables with mean zero. Then #~Z = (~Z_1, ... , ~{Z_n}) is an ~n-dimensional random vector.
Now let #~u_1, ... #~u_~n be any orthonormal basis for &reals.^~n, _ and let ~W_1, ... ~W_~n be the co-ordinates of #~Z with respect to this basis.
i.e. #~Z = ~W_1#~u_1 + ... + ~W_~n#~u_~n, _ _ [putting _ ~W_~i = #~Z &dot. #~u_~i = &sum._{~j} ~Z_~j ~u_{~i , ~j }, _ ~i = 1 ... ~n, _ where ~u_{~i , ~j } is the ~j-th co-ordinate of #~u_~i ]
Then _ ~W_1, ... ~W_~n are independent random variables, and ~W_~i ~ N(0, &sigma.^2), _ _ ~i = 1 ... ~n.
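The lemma can be illustrated (though of course not proved) by simulation: expressing an i.i.d. normal vector in an arbitrary orthonormal basis leaves its coordinates looking like independent N(0, &sigma.^2) variables. A Python sketch, with all numerical choices purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)
n, sigma = 5, 1.5

# A random orthonormal basis of R^n: the columns of Q from a QR decomposition.
Q, _ = np.linalg.qr(rng.normal(size=(n, n)))
U = Q.T                                   # rows U[i] are the basis vectors u_i

# Many replications of Z ~ N(0, sigma^2 I) and their coordinates W_i = Z . u_i.
Z = rng.normal(scale=sigma, size=(100_000, n))
W = Z @ U.T                               # W[k, i] = Z[k] . u_i

# Empirically: each W_i has mean ~0, variance ~sigma^2, and the W_i are uncorrelated.
print(W.mean(axis=0))                     # all close to 0
print(np.cov(W, rowvar=False))            # close to sigma^2 * identity
```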
Suppose L has dimension ~d, where 1 <= ~d < ~n. Now for any #~y &in. &reals.^~n, p(#~y) is the orthogonal projection of #~y onto L, and (1 - p)(#~y) = #~y - p(#~y) is the orthogonal projection of #~y onto the space D, the orthogonal complement of L, i.e. D &oplus. L = &reals.^~n.
So any #~y &in. &reals.^~n can be split into two parts:
#~y _ = _ p(#~y) + (1 - p)(#~y) _ _ _ _ _ _ where _ _ p(#~y) &in. L _ _ and _ _ (1 - p)(#~y) &in. D.
From linear algebra we know that we can choose an orthonormal basis #~u_1, ... #~u_~n for &reals.^~n such that #~u_1, ... #~u_{~d} is a basis for L and #~u_{~d+1}, ... #~u_~n is a basis for D.
Let #~Z = #~y - #{&mu.}, _ i.e. ~Z_~i = ~{Y_i} - &mu._~i ~ N(0, &sigma.^2), _ and let ~W_1, ... ~W_~n be the co-ordinates of #~Z with respect to the above basis, i.e. #~Z = ~W_1#~u_1 + ... + ~W_~n#~u_~n. _ Then, by the lemma above, ~W_1, ... ~W_~n are independent and ~W_~i ~ N(0, &sigma.^2).
Now #~Z - p(#~Z) = (#~y - #{&mu.}) - p(#~y - #{&mu.}) = #~y - p(#~y), _ [since p is linear and #{&mu.} &in. L, so p(#{&mu.}) = #{&mu.}]
So #~y - p(#~y) is also the orthogonal projection of #~Z onto D. In terms of the above basis and co-ordinates:
_ #~y - p(#~y) _ _ _ _ _ = _ _ _ _ ~W_{~d+1}#~u_{~d+1} + ... + ~W_~n#~u_~n
|| #~y - p(#~y) || ^2 _ _ _ _ = _ _ _ _ ~W_{~d+1}^2 + ... + ~W_~n^2 _ _ _ _ ~ _ _ _ _ &sigma.^2&chi.^2(~n - ~d)
E( || #~y - p(#~y) || ^2) _ _ = _ _ _ _ (~n - ~d)&sigma.^2
i.e.
E( est{&sigma.}^2 ) _ = _ fract{(~n - ~d),~n}&sigma.^2
This is a biased estimator, i.e. E( est{&sigma.}^2 ) _ != _ &sigma.^2. An alternative estimator for &sigma.^2 is:
~s^2 _ = _ fract{ || #~y - p(#~y) || ^2,~n - ~d} _ = _ fract{RSS,~n - ~d}
which has expectation &sigma.^2 and is therefore an unbiased estimator. For this, and other reasons that will become apparent, it is often used in preference to the Maximum Likelihood Estimator.
The quantity ~n - ~d is called the ~{degrees of freedom} of the sum of squares.
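These facts can be checked by simulation. The sketch below (with an arbitrary illustrative choice of L, generated in numpy) confirms that the average of RSS is close to (~n - ~d)&sigma.^2, so that ~s^2 is approximately unbiased while the maximum likelihood estimator systematically underestimates &sigma.^2:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, sigma = 12, 3, 2.0
V = rng.normal(size=(n, d))                    # columns span an illustrative L
mu = V @ rng.normal(size=d)                    # a fixed true mean lying in L

# Projection matrix onto L (V has full column rank here, so V^T V is invertible).
P = V @ np.linalg.solve(V.T @ V, V.T)

reps = 20_000
Y = mu + rng.normal(scale=sigma, size=(reps, n))    # many replications of y
RSS = np.sum((Y - Y @ P.T) ** 2, axis=1)            # ||y - p(y)||^2 per replication

print(RSS.mean(), (n - d) * sigma**2)          # E(RSS) ~= (n - d) sigma^2
print((RSS / n).mean(), sigma**2)              # MLE sigma_hat^2: biased downwards
print((RSS / (n - d)).mean(), sigma**2)        # s^2: approximately unbiased
```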
We will proceed to investigate a number of scenarios which fit into the model developed so far, i.e. where we have ~n observations ~y_1, ... ~{y_n}, where ~{y_i} is an observation of the random variable ~{Y_i} ~ N(&mu._~i, &sigma.^2), ~i = 1, ... ~n, with ~Y_1, ... ~{Y_n} independent, and #{&mu.} = (&mu._1, ... &mu._~n) &in. L, where L is a ~d-dimensional linear subspace of &reals.^~n. Further, suppose that L is spanned by #~v_1, ... ,#~v_~d, _ ~{not necessarily orthogonal}.
We want to work out the maximum likelihood estimate (MLE) of #{&mu.}; we saw above that est{#{&mu.}} = p(#~y), the orthogonal projection of #~y onto L.
Now #~y - p(#~y) &in. D, the orthogonal complement of L, so (#~y - p(#~y)) &dot. #~v_~i = 0 for each ~i, since #~v_~i &in. L _ (where " &dot. " represents the scalar or dot product), _ that is:
_ #~y &dot. #~v_~i _ = _ p(#~y) &dot. #~v_~i _ _ _ _ _ _ &forall. ~i.
or:
_ #~y &dot. #~v_~i _ = _ est{#{&mu.}} &dot. #~v_~i _ _ _ _ _ _ &forall. ~i.
So we can solve these ~d equations to find est{#{&mu.}}.
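Concretely, writing est{#{&mu.}} as a combination of the spanning vectors #~v_~i reduces these equations to a ~d &times. ~d linear system (the normal equations of least squares). A sketch, with illustrative names, assuming the #~v_~i are the columns of a matrix V and are linearly independent:

```python
import numpy as np

def mu_hat_from_spanning_vectors(y, V):
    """Estimate mu by solving y . v_i = mu_hat . v_i, i = 1 ... d (columns of V are the v_i)."""
    # Writing mu_hat = V @ beta turns the d equations into (V^T V) beta = V^T y,
    # which is solvable because the v_i are assumed linearly independent.
    beta = np.linalg.solve(V.T @ V, V.T @ y)
    return V @ beta
```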
We can then calculate the ~{residual sum of squares}:
RSS _ = _ || #~y - p(#~y) || ^2
and then the estimate of the variance &sigma.^2 is:
~s^2 _ = _ fract{RSS,~n - ~d}.
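As the simplest concrete instance of this recipe, take L to be spanned by the single vector (1, ..., 1); then est{#{&mu.}} has every entry equal to the sample mean and ~s^2 is the usual sample variance with ~n - 1 degrees of freedom. A sketch with made-up data:

```python
import numpy as np

y = np.array([4.1, 5.0, 3.7, 4.4, 5.2, 4.6])     # made-up observations

V = np.ones((len(y), 1))                         # L spanned by (1, ..., 1), so d = 1
beta_hat, *_ = np.linalg.lstsq(V, y, rcond=None)
mu_hat = V @ beta_hat                            # every entry equals the sample mean

rss = np.sum((y - mu_hat) ** 2)                  # residual sum of squares
s2 = rss / (len(y) - 1)                          # agrees with np.var(y, ddof=1)
```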