Consider ( as before ) the model #{~Y} = ( ~Y_1, ... ~Y_{~n} ) &tilde. N ( #{&mu.}, &sigma.^2#I ) , _ ~Y_1, ... ~Y_{~n} independent, _ #{&mu.} &in. L_1, where L_1 is a ~d_1-dimensional subspace of &reals.^{~n}.
For an observation #{~y}, the maximum likelihood estimators are:
est{#{&mu.}} _ = _ p_1 ( #{~y} ) , _ _ _ est{&sigma._1}^2 _ = _ fract{ || #{~y} - p_1 ( #{~y} ) || ^2,~n}
and the unbiased estimator of the variance is:
~s_1^2 _ = _ fract{ || #{~y} - p_1 ( #{~y} ) || ^2,~n - ~d_1}
where p_1 is the projection onto L_1.
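As a concrete, purely illustrative sketch, assume L_1 is the column span of a hypothetical design matrix X1 (the names X1, y and the data below are my own, not from the text); the projection and the two variance estimates can then be computed by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: L_1 is the column span of an n x d_1 design matrix X1.
n, d1 = 20, 3
X1 = rng.normal(size=(n, d1))
y = rng.normal(size=n)                       # an arbitrary observation vector

# p_1(y): the orthogonal projection of y onto L_1, obtained by least squares.
p1y = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

rss1 = np.sum((y - p1y) ** 2)                # || y - p_1(y) ||^2
mu_hat = p1y                                 # maximum likelihood estimate of mu
sigma2_hat1 = rss1 / n                       # ML estimate of sigma^2
s1_sq = rss1 / (n - d1)                      # unbiased estimate of sigma^2
```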
The likelihood function at its maximum is ( the simplification on the right uses _ || #{~y} - p_1 ( #{~y} ) || ^2 = ~n est{&sigma._1}^2 ) :
( 2&pi. est{&sigma._1}^2 ) ^{-~n/2} exp rndb{fract{ - || #{~y} - p_1 ( #{~y} ) || ^2,2 est{&sigma._1}^2}} _ _ _ = _ _ _ ( 2&pi. est{&sigma._1}^2 ) ^{-~n/2} e^{ - ~n/2}
Suppose we have another model, _ #{&mu.} &in. L_2, where L_2 &subset. L_1 is a ~d_2-dimensional subspace of &reals.^{~n} ( ~d_2 < ~d_1 ). Then the maximum likelihood estimators are:
est{#{&mu.}} _ = _ p_2 ( #{~y} ) , _ _ _ _ est{&sigma._2}^2 _ = _ fract{ || #{~y} - p_2 ( #{~y} ) || ^2,~n}
and the unbiased estimator of the variance is:
~s_2^2 _ = _ fract{ || #{~y} - p_2 ( #{~y} ) || ^2,~n - ~d_2}
where p_2 is the projection onto L_2.
Similarly, the value of the likelihood function at its maximum is:
( 2&pi. est{&sigma._2}^2 ) ^{-~n/2} e^{ - ~n/2}
Testing the hypothesis H: #{&mu.} &in. L_2, given that #{&mu.} &in. L_1, is known as testing for model reduction. To do this we use the likelihood ratio test, based on the ratio:
Q ( #{~y} ) _ = _ _ fract{sup \{ L ( #{~y} ) | #{&mu.} &in. L_2 \}, sup \{ L ( #{~y} ) | #{&mu.} &in. L_1 \}}
_ _ _ _ _ = _ _ script{rndb{ fract{2&pi. est{&sigma._2}^2 ,2&pi. est{&sigma._1}^2} },,, - ~n/2,} _ = _ _ script{rndb{ fract{est{&sigma._2}^2 ,est{&sigma._1}^2} },,, - ~n/2,}
_ _ _ _ _ = _ _ script{rndb{ fract{ || #{~y} - p_2 ( #{~y} ) || ^2 , || #{~y} - p_1 ( #{~y} ) || ^2} },,, - ~n/2,} _ = _ _ script{rndb{ fract{RSS_2 ,RSS_1} },,, - ~n/2,}
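As a numerical sanity check (a sketch only, continuing the hypothetical example above with a nested design matrix X2 whose columns span L_2), the difference of maximised log-likelihoods reduces to exactly this RSS ratio:

```python
# Continuing the sketch: L_2 is spanned by the first d_2 columns of X1,
# so L_2 is a subspace of L_1.
d2 = 1
X2 = X1[:, :d2]
p2y = X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]
rss2 = np.sum((y - p2y) ** 2)                # || y - p_2(y) ||^2

def max_loglik(rss):
    """Maximised log-likelihood: -n/2 * log(2*pi*rss/n) - n/2."""
    return -0.5 * n * np.log(2 * np.pi * rss / n) - 0.5 * n

log_Q = max_loglik(rss2) - max_loglik(rss1)  # log of the likelihood ratio Q(y)
assert np.isclose(log_Q, -0.5 * n * np.log(rss2 / rss1))
```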
But note that _ || #{~y} - p_2 ( #{~y} ) || ^2 _ = _ || #{~y} - p_1 ( #{~y} ) || ^2 + || p_1 ( #{~y} ) - p_2 ( #{~y} ) || ^2 , _ by Pythagoras, _ since #{~y} - p_1 ( #{~y} ) is the orthogonal projection of #{~y} onto V_1, where V_1 &oplus. L_1 = &reals.^{~n}, and p_1 ( #{~y} ) - p_2 ( #{~y} ) is the projection of #{~y} onto V_2, where V_2 &oplus. L_2 = L_1. Clearly V_1 and V_2 are orthogonal, so we can write _ &reals.^{~n} = V_1 &oplus. V_2 &oplus. L_2.
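This decomposition can be checked numerically in the same hypothetical sketch:

```python
# Pythagoras for nested projections: y - p_1(y) is orthogonal to p_1(y) - p_2(y).
tss = np.sum((y - p2y) ** 2)                 # || y - p_2(y) ||^2
ess = np.sum((p1y - p2y) ** 2)               # || p_1(y) - p_2(y) ||^2
assert np.isclose(tss, rss1 + ess)
```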
In this context we write:
RSS _ = _ || #{~y} - p_1 ( #{~y} ) || ^2 _ _ _ ( the residual sum of squares ) ,
ESS _ = _ || p_1 ( #{~y} ) - p_2 ( #{~y} ) || ^2 _ _ _ ( the explained sum of squares ) ,
TSS _ = _ || #{~y} - p_2 ( #{~y} ) || ^2 _ _ _ ( the total sum of squares ) .
Note that, by the decomposition above, we have:
TSS _ = _ ESS &plus. RSS
So we can write:
Q ( #{~y} ) _ = _ script{rndb{ fract{TSS ,RSS} },,, - ~n/2,} _ = _ script{rndb{ fract{RSS + ESS ,RSS} },,, - ~n/2,}
_ _ _ _ _ _ _ = _ script{rndb{ 1 + fract{ESS ,RSS} },,, - ~n/2,}
Now dividing the sums of squares by their respective degrees of freedom, we put:
#{~r} ( #{~y} ) _ = _ fract{ESS / ( ~d_1 - ~d_2 ) ,RSS / ( ~n - ~d_1 ) }
So
Q ( #{~y} ) _ = _ script{rndb{ 1 + fract{~d_1 - ~d_2, ~n - ~d_1 }#{~r} ( #{~y} ) },,, - ~n/2,}
Q increases as #{~r} decreases, and vice-versa, so
P ( Q ( #{~Y} ) < Q ( #{~y} ) ) _ = _ P ( #{~r} ( #{~Y} ) > #{~r} ( #{~y} ) ) _ = _ 1 - F_{#{~r}} ( #{~r} ( #{~y} ) )
where F_{#{~r}} is the distribution function of the random variable #{~r} ( #{~Y} ) .
Now note that
#{~r} ( #{~y} ) _ = _ fract{ESS / ( ~d_1 - ~d_2 ) ,RSS / ( ~n - ~d_1 ) }
Now ESS and RSS are independent ( they are the squared lengths of the projections of #{~Y} onto the orthogonal subspaces V_2 and V_1 above ) , and, under the hypothesis H: #{&mu.} &in. L_2,
ESS _ _ ~ _ _ &sigma.^2&chi.^2 ( ~d_1 - ~d_2 )
RSS _ _ ~ _ _ &sigma.^2&chi.^2 ( ~n - ~d_1 )
then
#{~r} ( #{~Y} ) _ _ ~ _ _ F ( ~d_1 - ~d_2, ~n - ~d_1 ) _ _ - the F-distribution
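This distributional claim can be checked by simulation. A brief sketch, continuing the hypothetical design matrices above and simulating Y under H (taking mu = 0, which lies in L_2):

```python
from scipy import stats

reps = 5000
r_sim = np.empty(reps)
for i in range(reps):
    Y = rng.normal(size=n)                                  # mu = 0, sigma = 1
    P1Y = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]
    P2Y = X2 @ np.linalg.lstsq(X2, Y, rcond=None)[0]
    ESS = np.sum((P1Y - P2Y) ** 2)
    RSS = np.sum((Y - P1Y) ** 2)
    r_sim[i] = (ESS / (d1 - d2)) / (RSS / (n - d1))

# Compare the simulated values with the F(d_1 - d_2, n - d_1) distribution,
# e.g. with a Kolmogorov-Smirnov test (a large p-value means no evidence of mismatch).
print(stats.kstest(r_sim, stats.f(d1 - d2, n - d1).cdf))
```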
We now define the significance probability of the observed result, under the hypothesis H: #{&mu.} &in. L_2 ( given #{&mu.} &in. L_1 ) , as
S.P. _ = _ F_Q ( Q ( #{~y} ) ) _ = _ 1 - F ( #{~r} ( #{~y} ) )
where F is the F ( ~d_1 - ~d_2, ~n - ~d_1 ) distribution function
We usually say that there is significant evidence for rejecting the hypothesis if the S.P. is less than a pre-specified level &alpha. ( e.g. 5% ) , which is equivalent to _ F ( #{~r} ( #{~y} ) ) _ being greater than 1 - &alpha. ( e.g. 95% ) .
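In code, the two equivalent forms of the decision rule look like this (a sketch with hypothetical numbers; scipy's F distribution plays the role of F):

```python
from scipy import stats

alpha = 0.05
r_obs, df_e, df_r = 4.2, 2, 17               # hypothetical observed r and its degrees of freedom
sp = stats.f.sf(r_obs, df_e, df_r)           # S.P. = 1 - F(r_obs)
reject = sp < alpha                          # equivalently: stats.f.cdf(r_obs, df_e, df_r) > 1 - alpha
```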
The results are tabulated as follows:
Source | Sum of Squares | Degrees of freedom | Mean Square | ~r |
Explained | ESS = || p_1 ( #y ) - p_2 ( #y ) || ^2 | df_E = ~d_1 - ~d_2 | MS_E = ESS / df_E | MS_E / MS_R |
Residual | RSS = || #y - p_1 ( #y ) || ^2 | df_R = ~n - ~d_1 | MS_R = RSS / df_R | |
Total | TSS = || #y - p_2 ( #y ) || ^2 | df_T = ~n - ~d_2 | ~s_2^2 = TSS / df_T | |
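A minimal helper (a sketch; the function name and return format are my own, not from the text) that computes every entry of this table, plus the S.P., from RSS_1, RSS_2 and the dimensions:

```python
from scipy import stats

def anova_reduction(rss1, rss2, n, d1, d2):
    """ANOVA quantities for testing the reduction from L_1 (dim d1) to L_2 (dim d2)."""
    ess = rss2 - rss1                        # explained sum of squares: TSS - RSS
    df_e, df_r, df_t = d1 - d2, n - d1, n - d2
    ms_e, ms_r = ess / df_e, rss1 / df_r
    r = ms_e / ms_r
    sp = stats.f.sf(r, df_e, df_r)           # significance probability 1 - F(r)
    return {"ESS": ess, "RSS": rss1, "TSS": rss2,
            "df_E": df_e, "df_R": df_r, "df_T": df_t,
            "MS_E": ms_e, "MS_R": ms_r, "r": r, "S.P.": sp}
```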
The significance probability ( S.P. ) for the hypothesis H: #{&mu.} &in. L_2 is 1 - F ( #{~r} ) , where F is the distribution function of the F ( ~d_1 - ~d_2, ~n - ~d_1 ) distribution.
So, for example, if &alpha. = 5%, then reject the hypothesis if F ( #{~r} ) > 95% [ and do not reject it if F ( #{~r} ) &le. 95% ].
Note: the lower the significance probability, the stronger the evidence for rejecting the model reduction, i.e. for sticking with the original model.
This is not evidence for accepting the original model, but for NOT accepting the reduced model.
Quite often it is fairly straightforward ( if tedious ) to work out the residual sum of squares for a given model. In that case, calculate the RSS for the original model ( RSS_1 ) and for the reduced model ( RSS_2 ) .
Once these two quantities have been calculated, all the quantities in the ANOVA table follow, as summarized in the following diagram ( dimensions in parentheses, degrees of freedom in square brackets ) ; a worked sketch in code follows the diagram:
&reals.^{~n} _ ( dimension ~n )
 _ _ | _ _ RSS = RSS_1 _ [ df_R = ~n - ~d_1 ]
L_1 _ ( dimension ~d_1 )
 _ _ | _ _ ESS = RSS_2 - RSS_1 _ [ df_E = ~d_1 - ~d_2 ]
L_2 _ ( dimension ~d_2 )
The two steps combined give _ TSS = RSS_2 _ [ df_T = ~n - ~d_2 ] , _ spanning from &reals.^{~n} down to L_2.
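Putting it together, here is an end-to-end usage example of the anova_reduction helper above (everything here, data and design matrices included, is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2 = 30, 4, 2
X1 = rng.normal(size=(n, d1))                # columns span L_1 (the original model)
X2 = X1[:, :d2]                              # columns span L_2 (the reduced model)
y = X2 @ np.array([1.0, -0.5]) + rng.normal(scale=0.8, size=n)

def rss(X):
    """Residual sum of squares after projecting y onto the column span of X."""
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - fitted) ** 2)

# RSS_1 and RSS_2 are all that is needed; the rest of the table follows.
print(anova_reduction(rss(X1), rss(X2), n, d1, d2))
```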