Estimating Parameters

Page Contents

Linear Normal Models

Suppose we have ~n observations, ~y_1, ... ~y_~n, from ~n independent normally distributed variables:

~Y_~i ~ N(&mu._~i, &sigma.^2) , _ _ ~i = 1 ... ~n

all of which have the same variance. In vector notation we write this as:

_ #~y ~ N(#{&mu.}, &sigma.^2#I)

where #{&mu.} = (&mu._1 ... &mu._~n), and #I is the unit ~n # ~n matrix.

Some examples of the type of hypothesis we will be testing are:

All the &mu._~i have the same (unknown) value.
The variables are split up into k groups, and the value of &mu._~i is the same for all the variables in a group.
The value of &mu._~iis linearly dependent on some value associates with the observation (linear regression).

All the models under consideration will be of the form: #{&mu.} &in. L, where L is a subspace of &reals.^~n. _ Such a model is called #~{Linear Normal Model}

Maximum Likelihood Estimators

_ #~y = (~y_1, ... ~y_~n) ~ N(#{&mu.}, &sigma.^2#I)

i.e.

~y_~i ~ N(&mu._~i, &sigma.^2), _ _ _ _ ~y_1, ... ~y_~n independent

and

#{&mu.} &in. L, where L is a subspace of &reals.^~n.

Suppose we have an observation #~y = (~y_1, ... ~y_~n). The likelihood function is the product of the probability density functions, i.e.

L(#~y) _ _ = _ _ _ _ prod{ fract{1,&sqrt.2&pi.&sigma.^2} exp rndb{fract{&minus.(~{y_i} - &mu._~i)^2,2&sigma.^2}} , ~i = 1, ~n}

_ _ _ _ = _ _ _ _ (2&pi.&sigma.^2)^{-~n/2} exp rndb{fract{&minus. sum{(~{y_i} - &mu._~i)^2, ~i = 1, ~n},2&sigma.^2}}

_ _ _ _ = _ _ _ _ (2&pi.&sigma.^2)^{-~n/2} exp rndb{fract{&minus. || #~y - #{&mu.} || ^2,2&sigma.^2}}

The likelihood function is maximised (for fixed &sigma.) when || #~y - #{&mu.} || ^2 is minimised.
The value of #{&mu.} &in. L for which || #~y - #{&mu.} || ^2 is minimised is est{#{&mu.}} = p(#~y), where p(#~y) is the orthogonal projection of #~y onto L.

To maximise with respect to &sigma take logs and differentiate:

l(#~y) _ _ = _ _ ln(L(#~y)) _ _ = _ _ &minus.fract{~n,2} ln(2&pi.) &minus.fract{~n,2} ln(&sigma.^2) - fract{ || #~y - p(#~y) || ^2,2&sigma.^2}

fract{&partial. l(#~y),&partial.&sigma.^2} _ _ = _ _ - fract{~n,2&sigma.^2} &plus. fract{ || #~y - p(#~y) || ^2,2&sigma.^4}

fract{&partial. l(#~y),&partial.&sigma.^2} _ array{ > / = / < } _ 0 _ _ _ _ <=> _ _ _ _ &sigma.^2 _ array{ < / = / > } _ fract{ || #~y - p(#~y) || ^2,~n}

The maximum likelihood estimators are

est{#{&mu.}} _ = _ p(#~y)

est{&sigma.}^2 _ = _ fract{ || #~y - p(#~y) || ^2,~n}

Residual Sum of Squares

The quantity || #~y - p(#~y) || ^2 is called the "residual sum of squares", denoted RSS, so we often write:

est{&sigma.}^2 _ = _ fract{RSS,~n}

Change of Basis Lemma

In order to determine the distribution of the estimators we will require the following lemma, whose proof will not be given here:

Lemma

Suppose ~Z_~i ~ N(0, &sigma.^2), _ ~i = 1 ... ~n, are ~n indedpendent normally distributed random variables with mean zero. Then #~Z = (~Z_1, ... , ~{Z_n}) is an ~n-dimensional random vector.
Now let #~u_1, ... ~{#u_n} be any orthonormal basis for &reals.^n, _ and let ~W_1, ... ~W_~n be the co-ordinates of #~Z with respect to the above basis.
i.e. #~Z = ~W_1#~u_1+ ... + ~{W_n}#~u_~n, _ _ [putting _ ~{W_i} = #~Z . ~{#u_i} = &sum._{~j} ~Z_~i ~v_{~i , ~j }, _ ~i = 1 ... ~n ]

Then _ ~W_1, ... ~W_~n are independent random variables, and ~W_~i ~ N(0, &sigma.^2), _ _ ~i = 1 ... ~n.

Distribution of Estimators

Suppose L has dimension ~d, where 1 =< ~d < ~n. Now for any #~y &in. &reals.^n, p(#~y) is the projection of #~y onto L, and (1 - p)(#~y) = #~y - p(#~y) is the orthogonal projection of #~y onto the space D, the othogonal complement of L, i.e. D &oplus. L = &reals.^~n.

So any #~y &in. &reals.^n, can be split into two parts:

#~y _ = _ p(#~y) + (1 - p)(#~y) _ _ _ _ _ _ where _ _ p(#~y) &in. L _ _ and _ _ (1 - p)(#~y) &in. D.

From linear algebra we know that there exists a basis #~u_1, ... #~u_~n for &reals.^n, (which can be made orthonormal) where #~u_1, ... #~u_{~d}, (~d < ~n) forms a basis for L and #~u_{~d+1}, ... #~u_~n is a basis for D.

Let #~Z = #~y - #{&mu.}, _ i.e. ~Z_~i = ~{Y_i} - &mu._~i ~ N(0, &sigma.^2), _ and let ~W_1, ... ~W_~n be the co-ordinates of #~Z with respect to the above basis. i.e. #Z = ~W_1#~u_1+ ... + ~W_~n#~u_~n. _ Then ~W_1, ... ~W_~n are independent and ~W_~i ~ N(0, &sigma.^2) _ by the change of basis lemma

Now #~Z - p(#~Z) = (#~y - &mu.) - p(#~y - &mu.) = #~y - p(#~y), [since &mu. &in. L, and so p(&mu.) = &mu.]

So #~y - p(#~y) is also the projection of #Z onto D. In terms of the above basis and co-ordinates:

_ #~y - p(#~y) _ _ _ _ _ = _ _ _ _ ~W_{~d+1}#~u_{~d+1} + ... + ~W_~n#~u_~n

|| #~y - p(#~y) || ^2 _ _ _ _ = _ _ _ _ ~W_{~d+1}^2 + ... + ~W_~n^2 _ _ _ _ ~ _ _ _ _ &sigma.^2&chi.^2(~n - ~d)

E( || #~y - p(#~y) || ^2) _ _ = _ _ _ _ (~n - ~d)&sigma.^2

i.e.

E( est{&sigma.}^2 ) _ = _ fract{(~n - ~d),~n}&sigma.^2

This is a biased estimator, i.e. E( est{&sigma.}^2 ) _ != _ &sigma.^2. ") ; An alternative, estimator for &sigma.^2 is:

~s^2 _ = _ fract{ || #~y - p(#~y) || ^2,~n - ~d} _ = _ fract{RSS,~n - ~d}

which has expectation &sigma.^2 and is therefore an unbiased estimator. For this, and other reasons that will become apparent, it is often used in preference to the Maximum Likelihood Estimator.

Degrees of Freedom

The quantity ~n - ~d is called the ~{degrees of freedom} of the sum of squares.

Estimating Mean and Standard Deviation

We will proceed to investigate a number of scenarios which fit into the model developed so far, i.e. where we have ~n observations ~y_1, ... ~{y_n} where ~{y_i} is an observation of the random variable ~{Y_i} ~ N(&mu., &sigma.^2), ~i = 1, ... ~n, ~y_1, ... ~{Y_n} independent, and #{&mu.} = (&mu._1, ... &mu._~n) &in. L, where L is a ~d dimensional linear subspace of &reals.^~n. Further, suppose that L is spanned by #~v_1, ... ,#~v_~d, _ ~{not necessarily orthogonal}.

We want to work out the maximum likelihood estimate (MLE) of #{&mu.}, we saw that est{#{&mu.}} = p(#~y).
Now #~y - p(#~y) &in. &reals.^~n - L, the orthogonal complement of L, so (#~y - p(#~y)) &dot. #~v_~i = 0 _ (where " &dot. " represents the scalar or dot product ), _ that is:

_ #~y &dot. #~v_~i _ = _ p(#~y) &dot. #~v_~i _ _ _ _ _ _ &forall. ~i.

or:

_ #~y &dot. #~v_~i _ = _ est{#{&mu.}} &dot. #~v_~i _ _ _ _ _ _ &forall. ~i.

So we can solve the ~n equations to find est{#{&mu.}}.

We can then calculate the ~{residual sum of squares}:

RSS _ = _ || #~y - p(#~y) || ^2

and then the estimate of the variance &sigma.^2 is:

~s^2 _ = _ RSS / ~n - ~d.