# Estimating Parameters


## Linear Normal Models

Suppose we have ~n observations, ~y_1, ... ~y_~n, from ~n independent normally distributed variables:

~Y_~i ~ N(&mu._~i, &sigma.^2) , _ _ ~i = 1 ... ~n

all of which have the same variance. In vector notation we write this as:

_ #~y ~ N(#{&mu.}, &sigma.^2#I)

where #{&mu.} = (&mu._1 ... &mu._~n), and #I is the ~n &times. ~n identity matrix.

Some examples of the type of hypothesis we will be testing are:

• All the &mu._~i have the same (unknown) value.
• The variables are split into ~k groups, and the value of &mu._~i is the same for all the variables in a group.
• The value of &mu._~i is linearly dependent on some value associated with the observation (linear regression).

All the models under consideration will be of the form: #{&mu.} &in. L, where L is a subspace of &reals.^~n. _ Such a model is called a #~{Linear Normal Model}.
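As an illustrative sketch of such a model (Python with numpy; the group sizes, means, and variance below are invented for the example), here is a two-group mean model in which #{&mu.} lies in the two-dimensional subspace spanned by the group indicator vectors:

```python
import numpy as np

# Sketch of a linear normal model (all values here are illustrative).
# n = 6 observations in k = 2 groups; mu_i is constant within each group,
# so mu lies in the 2-dimensional subspace L spanned by the group indicators.
rng = np.random.default_rng(0)

n, sigma = 6, 0.5
v1 = np.array([1, 1, 1, 0, 0, 0], dtype=float)  # indicator of group 1
v2 = np.array([0, 0, 0, 1, 1, 1], dtype=float)  # indicator of group 2
mu = 2.0 * v1 + 5.0 * v2                        # mu in L = span{v1, v2}

# y ~ N(mu, sigma^2 I): independent coordinates, common variance
y = mu + sigma * rng.standard_normal(n)
print(y)
```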

## Maximum Likelihood Estimators

_ #~y = (~y_1, ... ~y_~n) ~ N(#{&mu.}, &sigma.^2#I)

i.e.

~y_~i ~ N(&mu._~i, &sigma.^2), _ _ _ _ ~y_1, ... ~y_~n independent

and

#{&mu.} &in. L, where L is a subspace of &reals.^~n.

Suppose we have an observation #~y = (~y_1, ... ~y_~n). The likelihood function is the product of the probability density functions, i.e.

L(#~y) _ _ = _ _ _ _ prod{ fract{1,&sqrt.2&pi.&sigma.^2} exp rndb{fract{&minus.(~{y_i} - &mu._~i)^2,2&sigma.^2}} , ~i = 1, ~n}

_ _ _ _ = _ _ _ _ (2&pi.&sigma.^2)^{-~n/2} exp rndb{fract{&minus. sum{(~{y_i} - &mu._~i)^2, ~i = 1, ~n},2&sigma.^2}}

_ _ _ _ = _ _ _ _ (2&pi.&sigma.^2)^{-~n/2} exp rndb{fract{&minus. || #~y - #{&mu.} || ^2,2&sigma.^2}}

The likelihood function is maximised (for fixed &sigma.) when || #~y - #{&mu.} || ^2 is minimised.
The value of #{&mu.} &in. L for which || #~y - #{&mu.} || ^2 is minimised is est{#{&mu.}} = p(#~y), where p(#~y) is the orthogonal projection of #~y onto L.

To maximise with respect to &sigma.^2, take logs and differentiate:

l(#~y) _ _ = _ _ ln(L(#~y)) _ _ = _ _ &minus.fract{~n,2} ln(2&pi.) &minus.fract{~n,2} ln(&sigma.^2) - fract{ || #~y - p(#~y) || ^2,2&sigma.^2}

fract{&partial. l(#~y),&partial.&sigma.^2} _ _ = _ _ - fract{~n,2&sigma.^2} &plus. fract{ || #~y - p(#~y) || ^2,2&sigma.^4}

fract{&partial. l(#~y),&partial.&sigma.^2} _ array{ > / = / < } _ 0 _ _ _ _ <=> _ _ _ _ &sigma.^2 _ array{ < / = / > } _ fract{ || #~y - p(#~y) || ^2,~n}

The maximum likelihood estimators are

est{#{&mu.}} _ = _ p(#~y)

est{&sigma.}^2 _ = _ fract{ || #~y - p(#~y) || ^2,~n}
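As a numerical sketch of these estimators (Python with numpy; the subspace and data below are invented for the example), take L to be the one-dimensional subspace where all the means are equal, project #~y onto it, and form est{&sigma.}^2:

```python
import numpy as np

# Illustrative sketch: MLEs for mu in L, with L spanned by the columns of V.
# Here L = span{(1,...,1)}, i.e. the "all means equal" model.
V = np.column_stack([np.ones(5)])
y = np.array([2.1, 1.9, 2.4, 2.0, 1.6])

# Projection matrix P = V (V^T V)^{-1} V^T; p(y) = P y is the
# orthogonal projection of y onto L (here, the vector of the sample mean).
P = V @ np.linalg.solve(V.T @ V, V.T)
mu_hat = P @ y                                   # MLE of mu
sigma2_hat = np.sum((y - mu_hat) ** 2) / len(y)  # MLE of sigma^2 = RSS / n

print(mu_hat)       # every entry equals ybar = 2.0
print(sigma2_hat)
```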

### Residual Sum of Squares

The quantity || #~y - p(#~y) || ^2 is called the ~{residual sum of squares}, denoted RSS, so we often write:

est{&sigma.}^2 _ = _ fract{RSS,~n}

## Change of Basis Lemma

In order to determine the distribution of the estimators we will require the following lemma, whose proof will not be given here:

Lemma

Suppose ~Z_~i ~ N(0, &sigma.^2), _ ~i = 1 ... ~n, are ~n independent normally distributed random variables with mean zero. Then #~Z = (~Z_1, ... , ~{Z_n}) is an ~n-dimensional random vector.
Now let #~u_1, ... #~u_~n be any orthonormal basis for &reals.^~n, _ and let ~W_1, ... ~W_~n be the co-ordinates of #~Z with respect to this basis,
i.e. #~Z = ~W_1#~u_1+ ... + ~{W_n}#~u_~n, _ _ [putting _ ~{W_i} = #~Z &dot. #~u_~i = &sum._{~j} ~Z_~j ~u_{~i , ~j }, _ ~i = 1 ... ~n, _ where ~u_{~i , ~j } is the ~j-th component of #~u_~i ]

Then _ ~W_1, ... ~W_~n are independent random variables, and ~W_~i ~ N(0, &sigma.^2), _ _ ~i = 1 ... ~n.
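The lemma is easy to check empirically. In this sketch (Python with numpy; the dimension, variance, and sample size are invented for the example), we generate many draws of #~Z, express each in a random orthonormal basis, and confirm that each co-ordinate ~W_~i again has variance &sigma.^2:

```python
import numpy as np

# Empirical check of the change-of-basis lemma (illustrative values).
# Z has i.i.d. N(0, sigma^2) coordinates; W holds its coordinates in an
# orthonormal basis, so each W_i should again be N(0, sigma^2).
rng = np.random.default_rng(1)
n, sigma = 4, 2.0

# A random orthonormal basis for R^n via QR decomposition
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))  # columns u_1, ..., u_n

Z = sigma * rng.standard_normal((100_000, n))     # many draws of Z
W = Z @ Q                                         # W_i = Z . u_i for each draw

# Sample variance of each W_i should be close to sigma^2 = 4
print(W.var(axis=0))
```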

## Distribution of Estimators

Suppose L has dimension ~d, where 1 &le. ~d < ~n. Now for any #~y &in. &reals.^~n, p(#~y) is the projection of #~y onto L, and (1 - p)(#~y) = #~y - p(#~y) is the orthogonal projection of #~y onto the space D, the orthogonal complement of L, i.e. D &oplus. L = &reals.^~n.

So any #~y &in. &reals.^~n can be split into two parts:

#~y _ = _ p(#~y) + (1 - p)(#~y) _ _ _ _ _ _ where _ _ p(#~y) &in. L _ _ and _ _ (1 - p)(#~y) &in. D.

From linear algebra we know that there exists a basis #~u_1, ... #~u_~n for &reals.^~n, (which can be made orthonormal) where #~u_1, ... #~u_{~d}, (~d < ~n) forms a basis for L and #~u_{~d+1}, ... #~u_~n is a basis for D.

Let #~Z = #~y - #{&mu.}, _ i.e. ~Z_~i = ~{Y_i} - &mu._~i ~ N(0, &sigma.^2), _ and let ~W_1, ... ~W_~n be the co-ordinates of #~Z with respect to the above basis, i.e. #~Z = ~W_1#~u_1+ ... + ~W_~n#~u_~n. _ Then ~W_1, ... ~W_~n are independent and ~W_~i ~ N(0, &sigma.^2), _ by the change of basis lemma.

Now #~Z - p(#~Z) = (#~y - #{&mu.}) - p(#~y - #{&mu.}) = #~y - p(#~y), _ [since #{&mu.} &in. L, and so p(#{&mu.}) = #{&mu.}]

So #~y - p(#~y) is also the projection of #~Z onto D. In terms of the above basis and co-ordinates:

_ #~y - p(#~y) _ _ _ _ _ = _ _ _ _ ~W_{~d+1}#~u_{~d+1} + ... + ~W_~n#~u_~n

|| #~y - p(#~y) || ^2 _ _ _ _ = _ _ _ _ ~W_{~d+1}^2 + ... + ~W_~n^2 _ _ _ _ ~ _ _ _ _ &sigma.^2&chi.^2(~n - ~d)

E( || #~y - p(#~y) || ^2) _ _ = _ _ _ _ (~n - ~d)&sigma.^2

i.e.

E( est{&sigma.}^2 ) _ = _ fract{(~n - ~d),~n}&sigma.^2

This is a biased estimator, i.e. E( est{&sigma.}^2 ) != &sigma.^2. An alternative estimator for &sigma.^2 is:

~s^2 _ = _ fract{ || #~y - p(#~y) || ^2,~n - ~d} _ = _ fract{RSS,~n - ~d}

which has expectation &sigma.^2 and is therefore an unbiased estimator. For this, and other reasons that will become apparent, it is often used in preference to the Maximum Likelihood Estimator.
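The bias can be seen by simulation. In this sketch (Python with numpy; the model dimensions and number of repetitions are invented for the example), we use the "all means equal" model (~d = 1) with &sigma.^2 = 1 and compare the averages of the two estimators over many simulated samples:

```python
import numpy as np

# Sketch comparing the biased MLE of sigma^2 with the unbiased s^2,
# for the "all means equal" model (d = 1). Data are simulated, so the
# values here are illustrative only.
rng = np.random.default_rng(2)
n, d = 5, 1

reps = 200_000
y = rng.standard_normal((reps, n))     # mu = 0, sigma^2 = 1
p_y = y.mean(axis=1, keepdims=True)    # projection onto span{(1,...,1)}
rss = np.sum((y - p_y) ** 2, axis=1)   # residual sum of squares per sample

print(np.mean(rss / n))        # biased:   expectation (n - d)/n = 0.8
print(np.mean(rss / (n - d)))  # unbiased: expectation sigma^2   = 1.0
```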

### Degrees of Freedom

The quantity ~n - ~d is called the ~{degrees of freedom} of the sum of squares.

## Estimating Mean and Standard Deviation

We will proceed to investigate a number of scenarios which fit into the model developed so far, i.e. where we have ~n observations ~y_1, ... ~{y_n}, where ~{y_i} is an observation of the random variable ~{Y_i} ~ N(&mu._~i, &sigma.^2), ~i = 1, ... ~n, _ ~Y_1, ... ~{Y_n} independent, and #{&mu.} = (&mu._1, ... &mu._~n) &in. L, where L is a ~d-dimensional linear subspace of &reals.^~n. Further, suppose that L is spanned by #~v_1, ... ,#~v_~d, _ ~{not necessarily orthogonal}.

We want to work out the maximum likelihood estimate (MLE) of #{&mu.}; we saw above that est{#{&mu.}} = p(#~y).
Now #~y - p(#~y) &in. D, the orthogonal complement of L, so (#~y - p(#~y)) &dot. #~v_~i = 0 _ (where " &dot. " represents the scalar or dot product), _ that is:

_ #~y &dot. #~v_~i _ = _ p(#~y) &dot. #~v_~i _ _ _ _ _ _ &forall. ~i.

or:

_ #~y &dot. #~v_~i _ = _ est{#{&mu.}} &dot. #~v_~i _ _ _ _ _ _ &forall. ~i.

So we can solve these ~d equations to find est{#{&mu.}}.

We can then calculate the ~{residual sum of squares}:

RSS _ = _ || #~y - p(#~y) || ^2

and then the estimate of the variance &sigma.^2 is:

~s^2 _ = _ fract{RSS,~n - ~d}
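The whole procedure can be sketched numerically (Python with numpy; the spanning vectors and data below are invented for the example). Writing est{#{&mu.}} = ~b_1#~v_1 + ... + ~b_~d#~v_~d, the ~d equations #~y &dot. #~v_~i = est{#{&mu.}} &dot. #~v_~i are the normal equations, which we solve for the coefficients ~b before forming RSS and ~s^2:

```python
import numpy as np

# Sketch of the procedure above: solve y . v_i = mu_hat . v_i for
# mu_hat = V b, then form RSS and s^2. The spanning vectors v_i need
# not be orthogonal; this example is a simple linear regression.
V = np.column_stack([
    [1, 1, 1, 1],      # v_1: constant term
    [1, 2, 3, 4],      # v_2: covariate values (not orthogonal to v_1)
]).astype(float)
y = np.array([1.1, 1.9, 3.2, 3.8])
n, d = V.shape

# Normal equations: (V^T V) b = V^T y  <=>  y . v_i = (V b) . v_i for all i
b = np.linalg.solve(V.T @ V, V.T @ y)
mu_hat = V @ b                       # p(y), the MLE of mu

rss = np.sum((y - mu_hat) ** 2)      # residual sum of squares
s2 = rss / (n - d)                   # unbiased estimate, n - d d.o.f.
print(b, mu_hat, s2)
```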