As an extended example of the use of matrices in statistical operations, we will consider the problem of checking for Hardy-Weinberg equilibrium.
Suppose that in a species there are two alleles (types of gene) at one particular locus (position) of the DNA. These are denoted A and a. As each individual has a pair of genes at each locus, an individual can be one of the three genotypes: AA, Aa, or aa.
A population is said to be in Hardy-Weinberg equilibrium if the genotypes are distributed in the ratio:
AA #: Aa #: aa _ _ = _ _ ~p^2 #: 2 ~p (1 - ~p) #: (1 - ~p)^2 _ _ _ some 0 < ~p < 1
Suppose we measure ~n individuals from the population, so that there are ~x_1 individuals of type AA, ~x_2 of type Aa, and ~x_3 of type aa.
We wish to test for equilibrium. First, without going into the theory, we find the estimate for ~p
est{~p} _ _ = _ _ (2~x_1 + ~x_2) / 2~n
From this we can calcuate the expected number for each genotype, if the hypothesis of Hardy-Weinberg equilibrium is true, and hence do the chi-squared test in the usual way.
Let's look at a numerical example. A number of nematode worms have been tested, and the genotypes AA, Aa, and aa found in numbers 27, 30, and 12 respectively. First note that if
est{~p} _ _ = _ _ (2~x_1 + ~x_2) / 2~n _ _ then _ _ 1 - est{~p} _ _ = _ _ (2~x_3 + ~x_2) / 2~n
~nest{~p} _ _ = _ _ ~x_1 + ~x_2/2 ,_ _ _ and _ _ ~n - ~nest{~p} _ _ = _ _ ~x_3 + ~x_2/2
First we define the three column vectors:
< script > var wObs = new mathyma.Matrix ("27/30/12"); ...
Then we can calculate the estimates, and the sum:
< script > var wA = new mathyma.Matrix ("1/0.5/0"); var wB = new mathyma.Matrix ("0/0.5/1"); var nEstP = wObs.Trans().Mult(wA).Det(); var nEstQ = wObs.Trans().Mult(wB).Det(); var ObsTot = nEstP + nEstQ; document.write("~nest{~p} _ _ = _ _ " + mathyma.formatDP(nEstP,1) + "< br / > "); document.write("~n (1 - est{~p}) _ _ = _ _ " + mathyma.formatDP(nEstQ,1) + "< br / > "); document.write("~n _ _ = _ _ " + mathyma.formatDP(ObsTot,1) + "< br / > "); < /script >
Now we can form the column vector of expected values:
< script > var wExpArr = new Array(); wExpArr[0] = new Array(); wExpArr[0][0] = nEstP * nEstP / ObsTot; wExpArr[1] = new Array(); wExpArr[1][0] = 2 * nEstP * nEstQ / ObsTot; wExpArr[2] = new Array(); wExpArr[2][0] = nEstQ * nEstQ / ObsTot; wExp = new mathyma.Matrix(wExpArr); document.write("< p > Exp _ _ = _ _ " + wExp.McPrint() + "< /p > "); < /script >
The chi-square value for a multinomial distribution is defined as:
X^2 _ _ = _ _ sum{fract{(~{obs} - ~{exp})^2,~{exp}},,}
In (column-) vector terms, if #x is the vector of observations, and #e is the vector of expected values, then
X^2 _ _ = _ _ ((#x - #e) [&div.] #e) #. (#x - #e)
where [&div.] is the division by element of matrices.
So we can now calculate the chi-square value, which has a chi-square distribution with one degree of freedom:
< script > var wDiff = wObs.Minus(wExp); var ChiSqVal = 1 - wDiff.Trans().Mult(wDiff.ElementDiv(wExp)).Det(); document.write("ChiSqVal _ _ = _ _ " + mathyma.formatDP(ChiSqVal,4)); var wSP = ChiSqDist(ChiSqVal,1); document.write(" Sig. Prob. _ _ = _ _ " + mathyma.formatDP(wSP,4) ); < /script >
The significance probability of just over 46% is quite high, tending to suggest that we can accept the hypothesis of Hardy-Weinberg equilibrium. [Or in proper Stats speak: the significance probability is too high to reject the hypothesis at a 5% level.]