Consider ( as before ) the model #{~Y} = ( ~Y_1, ... ~Y_{~n} ) &tilde. N ( #{&mu.}, &sigma.^2#I ) , _ ~Y_1, ... ~Y_{~n} independent, _ #{&mu.} &in. L_1, where L_1 is a ~d_1-dimensional subspace of &reals.^{~n}.
For an observation #{~y}, the maximum likelihood estimators are:
est{#{&mu.}} _ = _ p_1 ( #{~y} ) , _ _ _ est{&sigma._1}^2 _ = _ fract{ || #{~y} - p_1 ( #{~y} ) || ^2,~n}
and the unbiased estimator of the variance is:
~s_1^2 _ = _ fract{ || #{~y} - p_1 ( #{~y} ) || ^2,~n - ~d_1}
where p_1 is the projection onto L_1.
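As a concrete, purely illustrative sketch, assume L_1 is the column span of a hypothetical design matrix X1 (the names X1, y and the data below are my own, not from the text); the projection and the two variance estimates can then be computed by least squares:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical example: L_1 is the column span of an n x d_1 design matrix X1.
n, d1 = 20, 3
X1 = rng.normal(size=(n, d1))
y = rng.normal(size=n)                       # an arbitrary observation vector

# p_1(y): the orthogonal projection of y onto L_1, obtained by least squares.
p1y = X1 @ np.linalg.lstsq(X1, y, rcond=None)[0]

rss1 = np.sum((y - p1y) ** 2)                # || y - p_1(y) ||^2
mu_hat = p1y                                 # maximum likelihood estimate of mu
sigma2_hat1 = rss1 / n                       # ML estimate of sigma^2
s1_sq = rss1 / (n - d1)                      # unbiased estimate of sigma^2
```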
The likelihood function at its maximum is ( the simplification on the right uses _ || #{~y} - p_1 ( #{~y} ) || ^2 = ~n est{&sigma._1}^2 ) :
( 2&pi. est{&sigma._1}^2 ) ^{-~n/2} exp rndb{fract{ - || #{~y} - p_1 ( #{~y} ) || ^2,2 est{&sigma._1}^2}} _ _ _ = _ _ _ ( 2&pi. est{&sigma._1}^2 ) ^{-~n/2} e^{ - ~n/2}
Suppose we have another model, _ #{&mu.} &in. L_2, where L_2 &subset. L_1 is a ~d_2-dimensional subspace of &reals.^{~n} ( ~d_2 < ~d_1 ). Then the maximum likelihood estimators are:
est{#{&mu.}} _ = _ p_2 ( #{~y} ) , _ _ _ _ est{&sigma._2}^2 _ = _ fract{ || #{~y} - p_2 ( #{~y} ) || ^2,~n}
and the unbiased estimator of the variance is:
~s_2^2 _ = _ fract{ || #{~y} - p_2 ( #{~y} ) || ^2,~n - ~d_2}
where p_2 is the projection onto L_2.
Similarly, the value of the likelihood function at its maximum is:
( 2&pi. est{&sigma._2}^2 ) ^{-~n/2} e^{ - ~n/2}
Testing the hypothesis H: #{&mu.} &in. L_2, given that #{&mu.} &in. L_1, is known as testing for model reduction. To do this we use the likelihood ratio test, based on the ratio:
Q ( #{~y} ) _ = _ _ fract{sup \{ L ( #{~y} ) | #{&mu.} &in. L_2 \}, sup \{ L ( #{~y} ) | #{&mu.} &in. L_1 \}}
_ _ _ _ _ = _ _ script{rndb{ fract{2&pi. est{&sigma._2}^2 ,2&pi. est{&sigma._1}^2} },,, - ~n/2,} _ = _ _ script{rndb{ fract{est{&sigma._2}^2 ,est{&sigma._1}^2} },,, - ~n/2,}
_ _ _ _ _ = _ _ script{rndb{ fract{ || #{~y} - p_2 ( #{~y} ) || ^2 , || #{~y} - p_1 ( #{~y} ) || ^2} },,, - ~n/2,} _ = _ _ script{rndb{ fract{RSS_2 ,RSS_1} },,, - ~n/2,}
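As a numerical sanity check (a sketch only, continuing the hypothetical example above with a nested design matrix X2 whose columns span L_2), the difference of maximised log-likelihoods reduces to exactly this RSS ratio:

```python
# Continuing the sketch: L_2 is spanned by the first d_2 columns of X1,
# so L_2 is a subspace of L_1.
d2 = 1
X2 = X1[:, :d2]
p2y = X2 @ np.linalg.lstsq(X2, y, rcond=None)[0]
rss2 = np.sum((y - p2y) ** 2)                # || y - p_2(y) ||^2

def max_loglik(rss):
    """Maximised log-likelihood: -n/2 * log(2*pi*rss/n) - n/2."""
    return -0.5 * n * np.log(2 * np.pi * rss / n) - 0.5 * n

log_Q = max_loglik(rss2) - max_loglik(rss1)  # log of the likelihood ratio Q(y)
assert np.isclose(log_Q, -0.5 * n * np.log(rss2 / rss1))
```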
But note that _ || #{~y} - p_2 ( #{~y} ) || ^2 _ = _ || #{~y} - p_1 ( #{~y} ) || ^2 + || p_1 ( #{~y} ) - p_2 ( #{~y} ) || ^2 , _ by Pythagoras, _ since #{~y} - p_1 ( #{~y} ) is the orthogonal projection of #{~y} onto V_1, where V_1 &oplus. L_1 = &reals.^{~n}, and p_1 ( #{~y} ) - p_2 ( #{~y} ) is the projection of #{~y} onto V_2, where V_2 &oplus. L_2 = L_1. Clearly V_1 and V_2 are orthogonal, so we can write _ &reals.^{~n} = V_1 &oplus. V_2 &oplus. L_2.
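This decomposition can be checked numerically in the same hypothetical sketch:

```python
# Pythagoras for nested projections: y - p_1(y) is orthogonal to p_1(y) - p_2(y).
tss = np.sum((y - p2y) ** 2)                 # || y - p_2(y) ||^2
ess = np.sum((p1y - p2y) ** 2)               # || p_1(y) - p_2(y) ||^2
assert np.isclose(tss, rss1 + ess)
```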
In this context we write:
RSS _ = _ || #{~y} - p_1 ( #{~y} ) || ^2 _ _ _ ( the residual sum of squares ) ,
ESS _ = _ || p_1 ( #{~y} ) - p_2 ( #{~y} ) || ^2 _ _ _ ( the explained sum of squares ) ,
TSS _ = _ || #{~y} - p_2 ( #{~y} ) || ^2 _ _ _ ( the total sum of squares ) .
Note that, by the decomposition above, we have:
TSS _ = _ ESS &plus. RSS
So we can write:
Q ( #{~y} ) _ = _ script{rndb{ fract{TSS ,RSS} },,, - ~n/2,} _ = _ script{rndb{ fract{RSS + ESS ,RSS} },,, - ~n/2,}
_ _ _ _ _ _ _ = _ script{rndb{ 1 + fract{ESS ,RSS} },,, - ~n/2,}
Now dividing the sums of squares by their respective degrees of freedom, we put:
#{~r} ( #{~y} ) _ = _ fract{ESS / ( ~d_1 - ~d_2 ) ,RSS / ( ~n - ~d_1 ) }
So
Q ( #{~y} ) _ = _ script{rndb{ 1 + fract{~d_1 - ~d_2, ~n - ~d_1 }#{~r} ( #{~y} ) },,, - ~n/2,}
Q increases as #{~r} decreases, and vice-versa, so
P ( Q ( #{~Y} ) < Q ( #{~y} ) ) _ = _ P ( #{~r} ( #{~Y} ) > #{~r} ( #{~y} ) ) _ = _ 1 - F_{#{~r}} ( #{~r} ( #{~y} ) )
where F_{#{~r}} is the distribution function of the random variable #{~r} ( #{~Y} ) .
Now note that
#{~r} ( #{~y} ) _ = _ fract{ESS / ( ~d_1 - ~d_2 ) ,RSS / ( ~n - ~d_1 ) }
Now ESS and RSS are independent ( they are the squared lengths of the projections of #{~Y} onto the orthogonal subspaces V_2 and V_1 above ) , and, under the hypothesis H: #{&mu.} &in. L_2,
ESS _ _ ~ _ _ &sigma.^2&chi.^2 ( ~d_1 - ~d_2 )
RSS _ _ ~ _ _ &sigma.^2&chi.^2 ( ~n - ~d_1 )
then
#{~r} ( #{~Y} ) _ _ ~ _ _ F ( ~d_1 - ~d_2, ~n - ~d_1 ) _ _ - the F-distribution
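This distributional claim can be checked by simulation. A brief sketch, continuing the hypothetical design matrices above and simulating Y under H (taking mu = 0, which lies in L_2):

```python
from scipy import stats

reps = 5000
r_sim = np.empty(reps)
for i in range(reps):
    Y = rng.normal(size=n)                                  # mu = 0, sigma = 1
    P1Y = X1 @ np.linalg.lstsq(X1, Y, rcond=None)[0]
    P2Y = X2 @ np.linalg.lstsq(X2, Y, rcond=None)[0]
    ESS = np.sum((P1Y - P2Y) ** 2)
    RSS = np.sum((Y - P1Y) ** 2)
    r_sim[i] = (ESS / (d1 - d2)) / (RSS / (n - d1))

# Compare the simulated values with the F(d_1 - d_2, n - d_1) distribution,
# e.g. with a Kolmogorov-Smirnov test (a large p-value means no evidence of mismatch).
print(stats.kstest(r_sim, stats.f(d1 - d2, n - d1).cdf))
```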
We now define the significance probability of the observed result, under the hypothesis H: #{&mu.} &in. L_2 ( given #{&mu.} &in. L_1 ) , as
S.P. _ = _ F_Q ( Q ( #{~y} ) ) _ = _ 1 - F ( #{~r} ( #{~y} ) )
where F is the F ( ~d_1 - ~d_2, ~n - ~d_1 ) distribution function
We usually say that there is significant evidence for rejecting the hypothesis if the S.P. is less than a pre-specified level &alpha. ( e.g. 5% ) , which is equivalent to _ F ( #{~r} ( #{~y} ) ) _ being greater than 1 - &alpha. ( e.g. 95% ) .
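In code, the two equivalent forms of the decision rule look like this (a sketch with hypothetical numbers; scipy's F distribution plays the role of F):

```python
from scipy import stats

alpha = 0.05
r_obs, df_e, df_r = 4.2, 2, 17               # hypothetical observed r and its degrees of freedom
sp = stats.f.sf(r_obs, df_e, df_r)           # S.P. = 1 - F(r_obs)
reject = sp < alpha                          # equivalently: stats.f.cdf(r_obs, df_e, df_r) > 1 - alpha
```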
The results are tabulated as follows:
Source | Sum of Squares | Degrees of freedom | Mean Square | ~r |
Explained | ESS = || p_1 ( #y ) - p_2 ( #y ) || ^2 | df_E = ~d_1 - ~d_2 | MS_E = ESS / df_E | MS_E / MS_R |
Residual | RSS = || #y - p_1 ( #y ) || ^2 | df_R = ~n - ~d_1 | MS_R = RSS / df_R | |
Total | TSS = || #y - p_2 ( #y ) || ^2 | df_T = ~n - ~d_2 | ~s_2^2 = TSS / df_T | |
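A minimal helper (a sketch; the function name and return format are my own, not from the text) that computes every entry of this table, plus the S.P., from RSS_1, RSS_2 and the dimensions:

```python
from scipy import stats

def anova_reduction(rss1, rss2, n, d1, d2):
    """ANOVA quantities for testing the reduction from L_1 (dim d1) to L_2 (dim d2)."""
    ess = rss2 - rss1                        # explained sum of squares: TSS - RSS
    df_e, df_r, df_t = d1 - d2, n - d1, n - d2
    ms_e, ms_r = ess / df_e, rss1 / df_r
    r = ms_e / ms_r
    sp = stats.f.sf(r, df_e, df_r)           # significance probability 1 - F(r)
    return {"ESS": ess, "RSS": rss1, "TSS": rss2,
            "df_E": df_e, "df_R": df_r, "df_T": df_t,
            "MS_E": ms_e, "MS_R": ms_r, "r": r, "S.P.": sp}
```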
The significance probability ( S.P. ) for the hypothesis H: #{&mu.} &in. L_2 is 1 - F ( #{~r} ) , where F is the distribution function of the F ( ~d_1 - ~d_2, ~n - ~d_1 ) distribution.
So, for example, if &alpha. = 5%, then reject the hypothesis if F ( #{~r} ) > 95% [ and do not reject it if F ( #{~r} ) &le. 95% ].
Note: the lower the significance probability, the stronger the evidence for rejecting the model reduction, i.e. for sticking with the original model.
This is not evidence for accepting the original model, but for NOT accepting the reduced model.
Quite often it is fairly straightforward ( if tedious ) to work out the residual sum of squares for a given model. In that case, calculate the RSS for the original model ( RSS_1 ) and for the reduced model ( RSS_2 ) .
Once these two quantities have been calculated, all the quantities in the ANOVA table follow, as summarized in the following diagram ( dimensions in parentheses, degrees of freedom in square brackets ) ; a worked sketch in code follows the diagram:
&reals.^{~n} _ ( dimension ~n )
 _ _ | _ _ RSS = RSS_1 _ [ df_R = ~n - ~d_1 ]
L_1 _ ( dimension ~d_1 )
 _ _ | _ _ ESS = RSS_2 - RSS_1 _ [ df_E = ~d_1 - ~d_2 ]
L_2 _ ( dimension ~d_2 )
The two steps combined give _ TSS = RSS_2 _ [ df_T = ~n - ~d_2 ] , _ spanning from &reals.^{~n} down to L_2.
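Putting it together, here is an end-to-end usage example of the anova_reduction helper above (everything here, data and design matrices included, is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)
n, d1, d2 = 30, 4, 2
X1 = rng.normal(size=(n, d1))                # columns span L_1 (the original model)
X2 = X1[:, :d2]                              # columns span L_2 (the reduced model)
y = X2 @ np.array([1.0, -0.5]) + rng.normal(scale=0.8, size=n)

def rss(X):
    """Residual sum of squares after projecting y onto the column span of X."""
    fitted = X @ np.linalg.lstsq(X, y, rcond=None)[0]
    return np.sum((y - fitted) ** 2)

# RSS_1 and RSS_2 are all that is needed; the rest of the table follows.
print(anova_reduction(rss(X1), rss(X2), n, d1, d2))
```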