Consider the situation where there are two quantities or variables, ~x and ~y say, associated with each observation, and where the value of ~y depends on the value of ~x, in the sense that ~y is the value of a random variable ~Y, where ~Y &tilde. N ( &mu., &sigma.^2 ) , _ &mu. = &alpha. + &beta. ~x, _ &alpha. &in. &reals., &beta. &in. &reals..
So if we knew the values of &alpha. and &beta., then once we knew the value of ~x, we would know the distribution of ~Y. The variable ~X is known as the ~{independent} or ~{explanatory} variable, while ~Y is called the ~{dependent} or ~{response} variable. When the dependency is of the form _ &mu. = &alpha. + &beta. ~x , _ we say that ~Y is ~{linearly dependent} on ~X.
So we will be looking at models of the form: ~n observations, each with two values ~x_~i and ~y_~i, where ~y_1, ... , ~y_{~n} are values of the independent random variables ~Y_1, ... , ~Y_{~n} , _ and _ ~Y_~i &tilde. N ( &alpha. + &beta. ~x_~i, &sigma.^2 )
This is just saying that the conditional distribution of ~Y given ~x is: _ _ ( ~Y | ~x ) &tilde. N ( &alpha. + &beta. ~x, &sigma.^2 ).
This is called a ~{linear regression} model.
Another, equivalent way of expressing the model is
_ _ _ _ _ _ _ ~Y_~i = &alpha. + &beta. ~x_~i + &epsilon._~i, _ _ where the &epsilon._~i are independent and _ &epsilon._~i &tilde. N( 0, &sigma.^2 ) _ _ &forall. ~i
So #{&mu.} &in. L, where L is spanned by _ #~e = ( 1, ... , 1 ) , _ and _ #~x = ( ~x_1, ... , ~x_{~n} ) , _ i.e. #{&mu.} = &alpha. #~e + &beta. #~x, _ &alpha., &beta. &in. &reals. , _ dim L = 2.
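As a purely illustrative aside, the projection p_1( #~y ) onto L can be computed numerically. The following sketch assumes Python with numpy is available and uses made-up data; it forms the design matrix with columns #~e and #~x and checks that the residual #~y - p_1( #~y ) is orthogonal to L.

import numpy as np

# Illustrative data (hypothetical values, for demonstration only)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Design matrix with columns e = (1,...,1) and x, so that mu = alpha*e + beta*x
X = np.column_stack([np.ones_like(x), x])

# Least squares gives the coefficients (alpha_hat, beta_hat) of the
# orthogonal projection of y onto L = span{e, x}
coef = np.linalg.lstsq(X, y, rcond=None)[0]
alpha_hat, beta_hat = coef
p1_y = X @ coef          # the projection p_1(y) = alpha_hat*e + beta_hat*x

# The residual y - p_1(y) is orthogonal to both spanning vectors
print(np.allclose((y - p1_y) @ np.ones_like(x), 0.0))
print(np.allclose((y - p1_y) @ x, 0.0))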
We have:
p_1( #~y ) _ = _ est{&alpha.} #~e + est{&beta.} #~x
and
_ #~y #. #~e _ = _ p_1( #~y ) #. #~e _ = _ est{&alpha.} #~e #. #~e + est{&beta.} #~x #. #~e
i.e.
&sum._~i ~y_~i _ = _ est{&alpha.} ~n + est{&beta.} &sum._~i ~x_~i
giving
est{&alpha.} _ = _ _ $~y - est{&beta.} $~x
where _ _ $~y _ = _ ( &sum._~i ~y_~i ) ./ ~n , _ and _ $~x _ = _ ( &sum._~i ~x_~i ) ./ ~n
Also
_ #~y #. #~x _ = _ p_1 ( #~y ) #. #~x _ = _ est{&alpha.} #~e #. #~x + est{&beta.} #~x #. #~x
_
_ _ _ _ _ = _ ( $~y - est{&beta.} $~x ) #~e #. #~x + est{&beta.} #~x #. #~x
_
_ _ _ _ _ = _ _ $~y #~e #. #~x + est{&beta.} ( #~x #. #~x - $~x #~e #. #~x)
_
est{&beta.} _ = _ fract{ #~y #. #~x - $~y #~e #. #~x , #~x #. #~x - $~x #~e #. #~x} _ = _ fract{( #~y - $~y #~e ) #. #~x, ( #~x - $~x #~e ) #. #~x }
_
_ _ _ _ _ = _ fract{&sum._~i ( ~y_~i - $~y ) ~x_~i,&sum._~i ( ~x_~i - $~x ) ~x_~i}
Note that &sum._~i ( ~x_~i - $~x ) ~x_~i = &sum._~i ( ~x_~i - $~x ) ( ~x_~i - $~x ) _ and _ &sum._~i ( ~y_~i - $~y ) ~x_~i = &sum._~i ( ~y_~i - $~y ) ( ~x_~i - $~x )
[ Since _ &sum._~i ( ~x_~i - $~x ) = 0 _ and _ &sum._~i ( ~y_~i - $~y ) = 0, _ the terms _ $~x &sum._~i ( ~x_~i - $~x ) _ and _ $~x &sum._~i ( ~y_~i - $~y ) _ vanish, so replacing ~x_~i by ( ~x_~i - $~x ) in the second factor has no effect ]
Put S_{~x~x} _ = _ &sum._~i ( ~x_~i - $~x )^2 , _ and _ S_{~y~x} _ = _ &sum._~i ( ~y_~i - $~y ) ( ~x_~i - $~x ) , _ and _ S_{~y~y} _ = _ &sum._~i ( ~y_~i - $~y )^2. _ Then we have:
est{&beta.} _ = _ ~S_{~x~y} ./ ~S_{~x~x} , _ _ _ _ est{&alpha.} _ = _ $~y - est{&beta.} $~x |
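As a concrete check of these formulae, here is a minimal Python sketch (numpy assumed, data values invented for illustration) that computes est{&alpha.} and est{&beta.} from ~S_{~x~x} and ~S_{~x~y} and compares them with a standard least squares fit.

import numpy as np

# Hypothetical data, purely for illustration
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

x_bar, y_bar = x.mean(), y.mean()
S_xx = np.sum((x - x_bar) ** 2)
S_xy = np.sum((y - y_bar) * (x - x_bar))

beta_hat = S_xy / S_xx                  # slope estimate
alpha_hat = y_bar - beta_hat * x_bar    # intercept estimate

# Cross-check against numpy's least squares fit of a degree-1 polynomial
slope, intercept = np.polyfit(x, y, 1)
print(np.allclose([alpha_hat, beta_hat], [intercept, slope]))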
The quantities
_ _ _ _ _ _ _ ~S_{~x~x} _ #:= _ &sum. ( ~x_~i - $~x ) ^2 _ = _ &sum. ~x_~i ^2 - ( &sum. ~x_~i ) ^2 ./ ~n
_ _ _ _ _ _ _ ~S_{~y~y} _ #:= _ &sum. ( ~y_~i - $~y ) ^2 _ = _ &sum. ~y_~i ^2 - ( &sum. ~y_~i ) ^2 ./ ~n
_ _ _ _ _ _ _ ~S_{~x~y} _ #:= _ &sum. ( ~x_~i - $~x ) ( ~y_~i - $~y ) _ = _ &sum. ~x_~i ~y_~i - ( &sum. ~x_~i ) ( &sum. ~y_~i ) ./ ~n
are loosely known as "~#{sums of squares}", although they are sometimes, more appropriately, referred to as "#~{sums of squares of deviations}" or "#~{sums of squares of differences}", and, in the case of the third quantity, as the "#~{sum of products of deviations}".
The term "~{sum of squares}" is not well defined in statistics, and is used for a whole range of expressions of the form "~{the sum, over a certain range, of squares of the difference between an observation and an estimated value}".
In linear normal models, we also use the term to designate the residual sum of squares of a model, and the associated total and explained sum of squares when talking about model reduction.
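The equivalence of the two forms of each quantity is easy to confirm numerically; a small sketch, again assuming numpy and toy data:

import numpy as np

# Toy data, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

# Definitions in terms of deviations from the means
S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

# Equivalent "computational" forms
print(np.isclose(S_xx, np.sum(x**2) - np.sum(x)**2 / n))
print(np.isclose(S_yy, np.sum(y**2) - np.sum(y)**2 / n))
print(np.isclose(S_xy, np.sum(x*y) - np.sum(x) * np.sum(y) / n))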
Now that we have calculated est{&alpha.} and est{&beta.}, we define the #~{residual} of an observation as
_ _ _ _ _ _ _ &rho._~i _ = _ ~y_~i - est{&alpha.} - est{&beta.} ~x_~i
Note that a residual is an ~{observable quantity}: it is the vertical distance from the observation to the estimated regression line ( est{&alpha.} + est{&beta.} ~x ), and not the distance to the hypothetical (unknown) line ( &alpha. + &beta. ~x ).
The #~{residual sum of squares} is defined as
_ _ _ _ _ _ _ RSS _ = _ || #~y - p_1 ( #~y ) ||^2 _ = _ || #~y - ( $~y - est{&beta.} $~x ) #~e - est{&beta.} #~x ||^2
_ _ _ _ _ _ _ _ _ _ = _ || ( #~y - $~y #~e ) - est{&beta.} ( #~x - $~x #~e ) ||^2
_ _ _ _ _ _ _ _ _ _ = _ &sum._~i ( ~y_~i - $~y )^2 + est{&beta.}^2 &sum._~i ( ~x_~i - $~x )^2 &minus. 2 est{&beta.}&sum._~i(~y_~i - $~y)(~x_~i - $~x)
_ _ _ _ _ _ _ _ _ _ = _ S_{~y~y} + est{&beta.}^2 S_{~x~x} - 2 est{&beta.} S_{~y~x}
_ _ _ _ _ _ _ _ _ _ = _ S_{~y~y} + est{&beta.} ( S_{~y~x} ./ S_{~x~x} ) S_{~x~x} - 2 est{&beta.} S_{~y~x}
this has ~n - 2 degrees of freedom, so
RSS _ = _ S_{~y~y} - est{&beta.} S_{~y~x} , _ _ _ _ ~s_1^2 _ = _ RSS ./ ( ~n - 2 ) |
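The identity RSS = ~S_{~y~y} - est{&beta.} ~S_{~y~x} and the estimate ~s_1^2 can be checked by summing the squared residuals directly; a sketch under the same assumptions as before:

import numpy as np

# Illustrative data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))   # equals S_yx

beta_hat = S_xy / S_xx
alpha_hat = y.mean() - beta_hat * x.mean()

# Residuals and residual sum of squares computed from their definitions
resid = y - alpha_hat - beta_hat * x
RSS_direct = np.sum(resid ** 2)

# The shortcut formula and the variance estimate s_1^2 = RSS / (n - 2)
RSS_formula = S_yy - beta_hat * S_xy
s1_sq = RSS_formula / (n - 2)

print(np.isclose(RSS_direct, RSS_formula), s1_sq)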
We have
est{&beta.} _ = _ fract{S_{~y~x},S_{~x~x}} _ = _ fract{&sum._~i ( ~y_~i - $~y ) ( ~x_~i - $~x ),S_{~x~x}}
Put
~a_~i _ = _ ( ~x_~i - $~x ) ./ S_{~x~x} , _ _ _ _ [ note: _ &sum._~i ~a_~i = 0 ]
then
est{&beta.} _ = _ &sum._~i ~a_~i ( ~y_~i - $~y ) _ = _ &sum._~i ~a_~i ~y_~i
So est{&beta.} is a linear combination of ~n independent normally distributed variables, and is therefore itself normally distributed.
E ( est{&beta.} ) _ = _ &sum._~i ~a_~i E ( ~y_~i ) _ = _ &sum._~i ~a_~i ( &alpha. + &beta. ~x_~i ) _ = _ &beta. &sum._~i ~a_~i ~x_~i _ _ _ _ [ since _ &sum._~i ~a_~i = 0 ]
but
&sum._~i ~a_~i ~x_~i _ = _ fract{&sum._~i ( ~x_~i - $~x ) ~x_~i,S_{~x~x}} _ = _ fract{&sum._~i ( ~x_~i - $~x ) ( ~x_~i - $~x ),S_{~x~x}} _ = _ fract{S_{~x~x},S_{~x~x}} _ = _ 1
so
E ( est{&beta.} ) _ = _ &beta. .
V ( est{&beta.} ) _ = _ &sum._~i ~a_~i^2 V ( ~y_~i ) _ = _ &sigma.^2 &sum._~i ~a_~i^2 _ = _ &sigma.^2 &sum._~i ( ~x_~i - $~x )^2 ./ S_{~x~x}^2 _ = _ &sigma.^2 ./ S_{~x~x}
est{&beta.} _ &tilde. _ N ( &beta. , &sigma.^2 ./ ~S_{~x~x} ) |
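This distributional result can be illustrated by simulation. The sketch below (numpy assumed; the values of &alpha., &beta. and &sigma. are arbitrary) repeatedly generates data from the model and compares the empirical mean and variance of est{&beta.} with &beta. and &sigma.^2 ./ ~S_{~x~x}.

import numpy as np

rng = np.random.default_rng(0)

# Fixed design and arbitrarily chosen true parameters (illustrative values)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
alpha, beta, sigma = 1.0, 0.5, 2.0
S_xx = np.sum((x - x.mean()) ** 2)

# Simulate many data sets from Y_i = alpha + beta*x_i + eps_i and estimate beta each time
beta_hats = []
for _ in range(20000):
    y = alpha + beta * x + rng.normal(0.0, sigma, size=len(x))
    S_xy = np.sum((y - y.mean()) * (x - x.mean()))
    beta_hats.append(S_xy / S_xx)
beta_hats = np.array(beta_hats)

# Empirical mean and variance should be close to beta and sigma^2 / S_xx
print(beta_hats.mean(), beta)
print(beta_hats.var(), sigma**2 / S_xx)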
Suppose that there is no relationship between ~Y and ~X, i.e. H: &beta. = 0, in other words #~Y &tilde. N ( &alpha. #~e, &sigma.^2 ), which is just the model discussed in "uniform mean". In that case the residual sum of squares is &sum._~i ( ~y_~i - $~y )^2 = S_{~y~y}, with ~n - 1 degrees of freedom. This is the total sum of squares (TSS) for the ANOVA table.
So we have
_ _ _ _ _ _ _ TSS _ = _ ~S_{~y~y}
_ _ _ _ _ _ _ RSS _ = _ ~S_{~y~y} - est{&beta.} ~S_{~x~y} , _ _ where _ est{&beta.} _ = _ ~S_{~x~y} ./ ~S_{~x~x}
_ _ _ _ _ _ _ ESS _ = _ TSS - RSS _ = _ ~S_{~x~y}^2 ./ ~S_{~x~x}
So we can complete the ANOVA table using the "sums of squares":
Source | Sum of Squares | d.f. | Mean Square | ~r |
Explained | ~S_{~x~y}^2 ./ ~S_{~x~x} | 1 | ESS ./ 1 | MS_E ./ MS_R |
Residual | ~S_{~y~y} - ~S_{~x~y}^2 ./ ~S_{~x~x} | ~n - 2 | RSS ./ ( ~n - 2 ) | |
Total | ~S_{~y~y} | ~n - 1 | | |
The significance probability for the hypothesis _ H: &beta. = 0 _ is 1 - F ( ~r ), where ~r is the mean square ratio MS_E ./ MS_R, and F is the cumulative distribution function of the F ( 1 , ~n - 2 ) distribution.
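A sketch of the corresponding calculation, assuming scipy is available for the F cumulative distribution function (the data values are again hypothetical):

import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 4.8, 6.2, 6.9, 8.4])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

# ANOVA decomposition for the regression
ESS = S_xy**2 / S_xx           # explained sum of squares, 1 d.f.
TSS = S_yy                     # total sum of squares, n - 1 d.f.
RSS = TSS - ESS                # residual sum of squares, n - 2 d.f.

r = (ESS / 1) / (RSS / (n - 2))          # mean square ratio MS_E / MS_R
p = 1 - stats.f.cdf(r, 1, n - 2)         # significance probability 1 - F(r)
print(r, p)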
_ _ _ _ _ _ _ est{&beta.} &tilde. N ( &beta. , &sigma.^2 ./ ~S_{~x~x} ) _ => _ ( est{&beta.} - &beta. ) &sqrt.$~S_{~x~x} &tilde. N ( 0 , &sigma.^2 )
_ _ _ _ _ _ _ RSS ./ &sigma.^2 &tilde. &chi.^2 ( ~n - 2 ) _ => _ RSS ./ ( ~n - 2 ) &tilde. &sigma.^2 &chi.^2 ( ~n - 2 ) ./ ( ~n - 2 )
The quantities _ ( est{&beta.} - &beta. ) &sqrt.$~S_{~x~x} _ and _ RSS ./ ( ~n - 2 ) _ are independent, so
fract{( est{&beta.} - &beta. ) &sqrt.$~S_{~x~x},&sqrt.${ RSS ./ ( ~n - 2 ) }} _ &tilde. _ t ( ~n - 2 )
[ Note: _ &sqrt.${ RSS ./ ( ( ~n - 2 ) ~S_{~x~x} ) } _ is the estimated standard deviation of est{&beta.} ]
So to test the hypothesis _ H: &beta. = 0 , _ we can test the quantity
fract{ est{&beta.}, &sqrt.${ RSS ./ ( ( ~n - 2 ) ~S_{~x~x} ) }}
in a t ( ~n - 2 ) distribution. Note that the square of this quantity is exactly the mean square ratio ~r = MS_E ./ MS_R, so this t test is equivalent to the F ( 1 , ~n - 2 ) test from the ANOVA table.
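As a final illustration, the t statistic and its equivalence with the ANOVA F test can be verified numerically; a sketch assuming scipy and made-up data:

import numpy as np
from scipy import stats

# Hypothetical data
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.3, 3.1, 4.8, 6.2, 6.9, 8.4])
n = len(x)

S_xx = np.sum((x - x.mean()) ** 2)
S_yy = np.sum((y - y.mean()) ** 2)
S_xy = np.sum((x - x.mean()) * (y - y.mean()))

beta_hat = S_xy / S_xx
RSS = S_yy - beta_hat * S_xy
se_beta = np.sqrt(RSS / ((n - 2) * S_xx))   # estimated standard deviation of beta_hat

t = beta_hat / se_beta                      # test statistic for H: beta = 0
p_t = 2 * stats.t.sf(abs(t), n - 2)         # two-sided p-value from t(n - 2)

# Equivalent F test: t^2 is the mean square ratio r, tested in F(1, n - 2)
r = t**2
p_f = stats.f.sf(r, 1, n - 2)
print(np.isclose(p_t, p_f), t, r)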