Consider the situation where there are two quantities or variables, ~x and ~y say, associated with each observation, and where the value of ~y depends on the value of ~x, in the sense that ~y is the value of a random variable ~Y, where ~Y &tilde. N ( &mu., &sigma.^2 ) , _ &mu. = &alpha. + &beta. ~x, _ &alpha. &in. &reals., &beta. &in. &reals..
So if we knew the values of &alpha. and &beta., then once we knew the value of ~x, we would know the distribution of ~Y. The variable ~X is known as the ~{independent} or ~{explanatory} variable, while ~Y is called the ~{dependent} or ~{response} variable. When the dependency is of the form _ &mu. = &alpha. + &beta. ~x , _ we say that ~Y is ~{linearly dependent} on ~X.
So we will be looking at models of the form: ~n observations, each with two values ~x_~i and ~y_~i, where ~y_1, ... , ~y_{~n} are values of the independent random variables ~Y_1, ... , ~Y_{~n} , _ and _ ~Y_~i &tilde. N ( &alpha. + &beta. ~x_~i, &sigma.^2 )
This is just saying that the conditional distribution of ~Y given ~x is: _ _ ( ~Y | ~x ) &tilde. N ( &alpha. + &beta. ~x, &sigma.^2 ).
This is called a ~{linear regression} model.
Another, equivalent way of expressing the model is
_ _ _ _ _ _ _ ~Y_~i = &alpha. + &beta. ~x_~i + &epsilon._~i, _ _ where the &epsilon._~i are independent and _ &epsilon._~i &tilde. N( 0, &sigma.^2 ) _ _ &forall. ~i
So #{&mu.} &in. L, where L is spanned by _ #~e = ( 1, ... , 1 ) , _ and _ #~x = ( ~x_1, ... , ~x_{~n} ) , _ i.e. #{&mu.} = &alpha. #~e + &beta. #~x, _ &alpha., &beta. &in. &reals. , _ dim L = 2.
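As a purely illustrative aside, the projection p_1( #~y ) onto L can be computed numerically. The following sketch assumes Python with numpy is available and uses made-up data; it forms the design matrix with columns #~e and #~x and checks that the residual #~y - p_1( #~y ) is orthogonal to L.

import numpy as np

# Illustrative data (hypothetical values, for demonstration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with columns e = (1,...,1) and x, so that mu = alpha*e + beta*x
X = np.column_stack([np.ones_like(x), x])

# Least squares gives the coefficients (alpha_hat, beta_hat) of the
# orthogonal projection of y onto L = span{e, x}
coef = np.linalg.lstsq(X, y, rcond=None)[0]
alpha_hat, beta_hat = coef
p1_y = X @ coef          # the projection p_1(y) = alpha_hat*e + beta_hat*x

# The residual y - p_1(y) is orthogonal to both spanning vectors
print(np.allclose((y - p1_y) @ np.ones_like(x), 0.0))
print(np.allclose((y - p1_y) @ x, 0.0))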
We have:
p_1( #~y ) _ = _ est{&alpha.} #~e + est{&beta.} #~x
and
_ #~y #. #~e _ = _ p_1( #~y ) #. #~e _ = _ est{&alpha.} #~e #. #~e + est{&beta.} #~x #. #~e
i.e.
&sum._~i ~y_~i _ = _ est{&alpha.} ~n + est{&beta.} &sum._~i ~x_~i
giving
est{&alpha.} _ = _ _ $~y - est{&beta.} $~x
where _ _ $~y _ = _ ( &sum._~i ~y_~i ) ./ ~n , _ and _ $~x _ = _ ( &sum._~i ~x_~i ) ./ ~n
Also
_ #~y #. #~x _ = _ p_1 ( #~y ) #. #~x _ = _ est{&alpha.} #~e #. #~x + est{&beta.} #~x #. #~x
_
_ _ _ _ _ = _ ( $~y - est{&beta.} $~x ) #~e #. #~x + est{&beta.} #~x #. #~x
_
_ _ _ _ _ = _ _ $~y #~e #. #~x + est{&beta.} ( #~x #. #~x - $~x #~e #. #~x)
_
est{&beta.} _ = _ fract{ #~y #. #~x - $~y #~e #. #~x , #~x #. #~x - $~x #~e #. #~x} _ = _ fract{( #~y - $~y #~e ) #. #~x, ( #~x - $~x #~e ) #. #~x }
_
_ _ _ _ _ = _ fract{&sum._~i ( ~y_~i - $~y ) ~x_~i,&sum._~i ( ~x_~i - $~x ) ~x_~i}
Note that &sum._~i ( ~x_~i - $~x ) ~x_~i = &sum._~i ( ~x_~i - $~x ) ( ~x_~i - $~x ) _ and _ &sum._~i ( ~y_~i - $~y ) ~x_~i = &sum._~i ( ~y_~i - $~y ) ( ~x_~i - $~x )
[ Since _ &sum._~i ( ~x_~i - $~x ) = 0 _ and _ &sum._~i ( ~y_~i - $~y ) = 0, _ the terms _ $~x &sum._~i ( ~x_~i - $~x ) _ and _ $~x &sum._~i ( ~y_~i - $~y ) _ vanish, so replacing ~x_~i by ( ~x_~i - $~x ) in the second factor has no effect ]
Put S_{~x~x} _ = _ &sum._~i ( ~x_~i - $~x )^2 , _ and _ S_{~y~x} _ = _ &sum._~i ( ~y_~i - $~y ) ( ~x_~i - $~x ) , _ and _ S_{~y~y} _ = _ &sum._~i ( ~y_~i - $~y )^2. _ Then we have:
est{&beta.} _ = _ ~S_{~x~y} ./ ~S_{~x~x} , _ _ _ _ est{&alpha.} _ = _ $~y - est{&beta.} $~x |
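As a concrete check of these formulae, here is a minimal Python sketch (numpy assumed, data values invented for illustration) that computes est{&alpha.} and est{&beta.} from ~S_{~x~x} and ~S_{~x~y} and compares them with a standard least squares fit.

import numpy as np

# Hypothetical data, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((y - y_bar) * (x - x_bar))

beta_hat = S_xy / S_xx                  # slope estimate
alpha_hat = y_bar - beta_hat * x_bar    # intercept estimate

# Cross-check against numpy's least squares fit of a degree-1 polynomial
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([alpha_hat, beta_hat], [intercept, slope]))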
The quantities
_ _ _ _ _ _ _ ~S_{~x~x} _ #:= _ &sum. ( ~x_~i - $~x ) ^2 _ = _ &sum. ~x_~i ^2 - ( &sum. ~x_~i ) ^2 ./ ~n
_ _ _ _ _ _ _ ~S_{~y~y} _ #:= _ &sum. ( ~y_~i - $~y ) ^2 _ = _ &sum. ~y_~i ^2 - ( &sum. ~y_~i ) ^2 ./ ~n
_ _ _ _ _ _ _ ~S_{~x~y} _ #:= _ &sum. ( ~x_~i - $~x ) ( ~y_~i - $~y ) _ = _ &sum. ~x_~i ~y_~i - ( &sum. ~x_~i ) ( &sum. ~y_~i ) ./ ~n
are loosely known as "~#{sums of squares}", although they are sometimes, more appropriately, referred to as "#~{sums of squares of deviations}" or "#~{sums of squares of differences}", and, in the case of the third quantity, as the "#~{sum of products of deviations}".
The term "~{sum of squares}" is not well defined in statistics, and is used for a whole range of expressions of the form "~{the sum, over a certain range, of squares of the difference between an observation and an estimated value}".
In linear normal models, we also use the term to designate the residual sum of squares of a model, and the associated total and explained sum of squares when talking about model reduction.
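The equivalence of the two forms of each quantity is easy to confirm numerically; a small sketch, again assuming numpy and toy data:

import numpy as np

# Toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Definitions in terms of deviations from the means
S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Equivalent "computational" forms
print(np.isclose(S_xx, np.sum(x**2) - np.sum(x)**2 / n))
print(np.isclose(S_yy, np.sum(y**2) - np.sum(y)**2 / n))
print(np.isclose(S_xy, np.sum(x*y) - np.sum(x) * np.sum(y) / n))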
Now that we have calculated est{&alpha.} and est{&beta.}, we define the #~{residual} of an observation as
_ _ _ _ _ _ _ &rho._~i _ = _ ~y_~i - est{&alpha.} - est{&beta.} ~x_~i
Note that a residual is an ~{observable quantity}: it is the vertical distance from the observation to the estimated regression line ( est{&alpha.} + est{&beta.} ~x ), and not the distance to the hypothetical (unknown) line ( &alpha. + &beta. ~x ).
The #~{residual sum of squares} is defined as
_ _ _ _ _ _ _ RSS _ = _ || #~y - p_1 ( #~y ) ||^2 _ = _ || #~y - ( $~y - est{&beta.} $~x ) #~e - est{&beta.} #~x ||^2
_ _ _ _ _ _ _ _ _ _ = _ || ( #~y - $~y #~e ) - est{&beta.} ( #~x - $~x #~e ) ||^2
_ _ _ _ _ _ _ _ _ _ = _ &sum._~i ( ~y_~i - $~y )^2 + est{&beta.}^2 &sum._~i ( ~x_~i - $~x )^2 &minus. 2 est{&beta.}&sum._~i(~y_~i - $~y)(~x_~i - $~x)
_ _ _ _ _ _ _ _ _ _ = _ S_{~y~y} + est{&beta.}^2 S_{~x~x} - 2 est{&beta.} S_{~y~x}
_ _ _ _ _ _ _ _ _ _ = _ S_{~y~y} + est{&beta.} ( S_{~y~x} ./ S_{~x~x} ) S_{~x~x} - 2 est{&beta.} S_{~y~x}
this has ~n - 2 degrees of freedom, so
RSS _ = _ S_{~y~y} - est{&beta.} S_{~y~x} , _ _ _ _ ~s_1^2 _ = _ RSS ./ ( ~n - 2 ) |
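The identity RSS = ~S_{~y~y} - est{&beta.} ~S_{~y~x} and the estimate ~s_1^2 can be checked by summing the squared residuals directly; a sketch under the same assumptions as before:

import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))   # equals S_yx

beta_hat = S_xy / S_xx
alpha_hat = y.mean() - beta_hat * x.mean()

# Residuals and residual sum of squares computed from their definitions
resid = y - alpha_hat - beta_hat * x
RSS_direct = np.sum(resid ** 2)

# The shortcut formula and the variance estimate s_1^2 = RSS / (n - 2)
RSS_formula = S_yy - beta_hat * S_xy
s1_sq = RSS_formula / (n - 2)

print(np.isclose(RSS_direct, RSS_formula), s1_sq)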
We have
est{&beta.} _ = _ fract{S_{~y~x},S_{~x~x}} _ = _ fract{&sum._~i ( ~y_~i - $~y ) ( ~x_~i - $~x ),S_{~x~x}}
Put
~a_~i _ = _ ( ~x_~i - $~x ) ./ S_{~x~x} , _ _ _ _ [ note: _ &sum._~i ~a_~i = 0 ]
then
est{&beta.} _ = _ &sum._~i ~a_~i ( ~y_~i - $~y ) _ = _ &sum._~i ~a_~i ~y_~i
So est{&beta.} is a linear combination of ~n independent normally distributed variables, and is therefore itself normally distributed.
E ( est{&beta.} ) _ = _ &sum._~i ~a_~i E ( ~y_~i ) _ = _ &sum._~i ~a_~i ( &alpha. + &beta. ~x_~i ) _ = _ &beta. &sum._~i ~a_~i ~x_~i _ _ _ _ [ since _ &sum._~i ~a_~i = 0 ]
but
&sum._~i ~a_~i ~x_~i _ = _ fract{&sum._~i ( ~x_~i - $~x ) ~x_~i,S_{~x~x}} _ = _ fract{&sum._~i ( ~x_~i - $~x ) ( ~x_~i - $~x ),S_{~x~x}} _ = _ fract{S_{~x~x},S_{~x~x}} _ = _ 1
so
E ( est{&beta.} ) _ = _ &beta. .
V ( est{&beta.} ) _ = _ &sum._~i ~a_~i^2 V ( ~y_~i ) _ = _ &sigma.^2 &sum._~i ~a_~i^2 _ = _ &sigma.^2 &sum._~i ( ~x_~i - $~x )^2 ./ S_{~x~x}^2 _ = _ &sigma.^2 ./ S_{~x~x}
est{&beta.} _ &tilde. _ N ( &beta. , &sigma.^2 ./ ~S_{~x~x} ) |
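This distributional result can be illustrated by simulation. The sketch below (numpy assumed; the values of &alpha., &beta. and &sigma. are arbitrary) repeatedly generates data from the model and compares the empirical mean and variance of est{&beta.} with &beta. and &sigma.^2 ./ ~S_{~x~x}.

import numpy as np

rng = np.random.default_rng(0)

# Fixed design and arbitrarily chosen true parameters (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
alpha, beta, sigma = 1.0, 0.5, 2.0
S_xx = np.sum((x - x.mean()) ** 2)

# Simulate many data sets from Y_i = alpha + beta*x_i + eps_i and estimate beta each time
beta_hats = []
for _ in range(20000):
    y = alpha + beta * x + rng.normal(0.0, sigma, size=len(x))
    S_xy = np.sum((y - y.mean()) * (x - x.mean()))
    beta_hats.append(S_xy / S_xx)
beta_hats = np.array(beta_hats)

# Empirical mean and variance should be close to beta and sigma^2 / S_xx
print(beta_hats.mean(), beta)
print(beta_hats.var(), sigma**2 / S_xx)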
Suppose that there is no relationship between ~Y and ~X, i.e. H: &beta. = 0, in other words #~Y &tilde. N ( &alpha. #~e, &sigma.^2 ), which is just the model discussed in "uniform mean". In that case the residual sum of squares is &sum._~i ( ~y_~i - $~y )^2 = S_{~y~y}, with ~n - 1 degrees of freedom. This is the total sum of squares (TSS) for the ANOVA table.
So we have
_ _ _ _ _ _ _ TSS _ = _ ~S_{~y~y}
_ _ _ _ _ _ _ RSS _ = _ ~S_{~y~y} - est{&beta.} ~S_{~x~y} , _ _ where _ est{&beta.} _ = _ ~S_{~x~y} ./ ~S_{~x~x}
_ _ _ _ _ _ _ ESS _ = _ TSS - RSS _ = _ ~S_{~x~y}^2 ./ ~S_{~x~x}
So we can complete the ANOVA table using the "sums of squares":
Source | Sum of Squares | d.f. | Mean Square | ~r |
Explained | ~S_{~x~y}^2 ./ ~S_{~x~x} | 1 | ESS ./ 1 | MS_E ./ MS_R |
Residual | ~S_{~y~y} - ~S_{~x~y}^2 ./ ~S_{~x~x} | ~n - 2 | RSS ./ ( ~n - 2 ) | |
Total | ~S_{~y~y} | ~n - 1 | | |
The significance probability for the hypothesis _ H: &beta. = 0 _ is 1 - F ( ~r ), where ~r is the mean square ratio MS_E ./ MS_R, and F is the cumulative distribution function of the F ( 1 , ~n - 2 ) distribution.
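A sketch of the corresponding calculation, assuming scipy is available for the F cumulative distribution function (the data values are again hypothetical):

import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 4.8, 6.2, 6.9, 8.4])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

# ANOVA decomposition for the regression
ESS = S_xy**2 / S_xx           # explained sum of squares, 1 d.f.
TSS = S_yy                     # total sum of squares, n - 1 d.f.
RSS = TSS - ESS                # residual sum of squares, n - 2 d.f.

r = (ESS / 1) / (RSS / (n - 2))          # mean square ratio MS_E / MS_R
p = 1 - stats.f.cdf(r, 1, n - 2)         # significance probability 1 - F(r)
print(r, p)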
_ _ _ _ _ _ _ est{&beta.} &tilde. N ( &beta. , &sigma.^2 ./ ~S_{~x~x} ) _ => _ ( est{&beta.} - &beta. ) &sqrt.$~S_{~x~x} &tilde. N ( 0 , &sigma.^2 )
_ _ _ _ _ _ _ RSS ./ &sigma.^2 &tilde. &chi.^2 ( ~n - 2 ) _ => _ RSS ./ ( ~n - 2 ) &tilde. &sigma.^2 &chi.^2 ( ~n - 2 ) ./ ( ~n - 2 )
The quantities _ ( est{&beta.} - &beta. ) &sqrt.$~S_{~x~x} _ and _ RSS ./ ( ~n - 2 ) _ are independent, so
fract{( est{&beta.} - &beta. ) &sqrt.$~S_{~x~x},&sqrt.${ RSS ./ ( ~n - 2 ) }} _ &tilde. _ t ( ~n - 2 )
[ Note: _ &sqrt.${ RSS ./ ( ( ~n - 2 ) ~S_{~x~x} ) } _ is the estimated standard deviation of est{&beta.} ]
So to test the hypothesis _ H: &beta. = 0 , _ we can test the quantity
fract{ est{&beta.}, &sqrt.${ RSS ./ ( ( ~n - 2 ) ~S_{~x~x} ) }}
in a t ( ~n - 2 ) distribution. Note that the square of this quantity is exactly the mean square ratio ~r = MS_E ./ MS_R, so this t test is equivalent to the F ( 1 , ~n - 2 ) test from the ANOVA table.
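As a final illustration, the t statistic and its equivalence with the ANOVA F test can be verified numerically; a sketch assuming scipy and made-up data:

import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 4.8, 6.2, 6.9, 8.4])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

beta_hat = S_xy / S_xx
RSS = S_yy - beta_hat * S_xy
se_beta = np.sqrt(RSS / ((n - 2) * S_xx))   # estimated standard deviation of beta_hat

t = beta_hat / se_beta                      # test statistic for H: beta = 0
p_t = 2 * stats.t.sf(abs(t), n - 2)         # two-sided p-value from t(n - 2)

# Equivalent F test: t^2 is the mean square ratio r, tested in F(1, n - 2)
r = t**2
p_f = stats.f.sf(r, 1, n - 2)
print(np.isclose(p_t, p_f), t, r)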