
Contingency tables are used to display data which contains counts of observations divided by category. See Data with no variables to see how to define this data.

The MathymaStat tool for drawing contingency tables is the mathyma.stats.DataBlock method "Tabulate".

Let's look at a (constructed) example of 456 British people categorised by blood type (A/B/O) and nation (England/Scotland/Wales).

<script type="text/javascript" src="../Lib/MathymaStat.js"></script>
. . .
<script>
var myData = new mathyma.stats.DataBlock("StatsData/blood.xml");
myData.ViewXML();
</script>

[**Note**: The data used here has been 'aggregated' to count the number in each category. We could just as well have used 'atomic' data, i.e. an observation for each individual, for example:

. .
<OBS> <Name>Jane Jones</Name> <Blood>A</Blood> <Nation>Wales</Nation> </OBS>
<OBS> <Name>Sam Smith</Name> <Blood>O</Blood> <Nation>England</Nation> </OBS>
<OBS> <Name>Mary McGregor</Name> <Blood>O</Blood> <Nation>Scotland</Nation> </OBS>
. .

MathymaStat will deal with either case. In terms of coding, the only difference is that for atomic data there is no 'count' variable to define.]
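To make the difference between atomic and aggregated data concrete, here is a minimal sketch in plain JavaScript of how atomic observations collapse into counts. It does not use or reproduce MathymaStat; the function name `aggregate` and the fourth observation ("Ann Other") are invented for illustration.

```javascript
// Atomic observations mirroring the <OBS> records above.
// The last entry is hypothetical, added so that one cell has a count of 2.
const observations = [
  { Name: "Jane Jones", Blood: "A", Nation: "Wales" },
  { Name: "Sam Smith", Blood: "O", Nation: "England" },
  { Name: "Mary McGregor", Blood: "O", Nation: "Scotland" },
  { Name: "Ann Other", Blood: "O", Nation: "England" },
];

// Count the observations falling in each combination of the given factors.
// Keys are the factor levels joined with "/", e.g. "O/England".
function aggregate(obs, factors) {
  const counts = new Map();
  for (const o of obs) {
    const key = factors.map((f) => o[f]).join("/");
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

const counts = aggregate(observations, ["Blood", "Nation"]);
for (const [cell, n] of counts) {
  console.log(cell, n);
}
```

Each entry of the resulting map corresponds to one record of aggregated data, with the map value playing the role of the 'count' variable.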

If you have a look at the XML data you'll see that there are three tags: <Blood>, <Nation> and <Count>. Normally you'd have to define Blood and Nation as factors, but the Tabulate method defines all the factors it needs, based on the parameters you give it. What you do have to identify is the count or frequency variable, in this case "Count".

So suppose we want to make a table for the above data of blood groups by nation. Here are the lines needed to do that, and the resulting table:

<script>
myData.DefineCount("Count");
myData.Tabulate("Blood/Nation");
</script>

If you just want to see the distribution by blood group or by nation then just stipulate the one factor:

<script>
myData.Tabulate("Blood");
myData.Tabulate("Nation");
</script>
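A one-factor table of aggregated data amounts to summing the count variable over the levels of the other factor. The sketch below shows that idea in plain JavaScript; it is not MathymaStat's implementation, the helper `oneWayTotals` is a made-up name, and the counts are invented rather than taken from blood.xml.

```javascript
// Hypothetical aggregated records; the counts are invented for illustration
// and are NOT the values in blood.xml.
const records = [
  { Blood: "A", Nation: "England", Count: 10 },
  { Blood: "O", Nation: "England", Count: 20 },
  { Blood: "A", Nation: "Wales", Count: 30 },
  { Blood: "O", Nation: "Wales", Count: 40 },
];

// Sum the count variable over every record sharing a level of one factor.
function oneWayTotals(recs, factor, countVar) {
  const totals = {};
  for (const r of recs) {
    totals[r[factor]] = (totals[r[factor]] || 0) + r[countVar];
  }
  return totals;
}

const byBlood = oneWayTotals(records, "Blood", "Count");
const byNation = oneWayTotals(records, "Nation", "Count");
console.log(byBlood);  // { A: 40, O: 60 }
console.log(byNation); // { England: 30, Wales: 70 }
```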

One of the commonest tests on count data of this sort is to see whether the rows and columns are independent. This can be used to test hypotheses about the independence of two factors (categories) or to compare several multinomial distributions.

For example in the above case, we could say that we've chosen 456 British people at random then asked for their nationality and tested for their blood group. The model here is of a multinomial distribution with nine outcomes. We can then test to see if blood group and nationality are independent of each other.

Alternatively we might have found 240 English people, 23 Welsh, and 193 Scottish, and in each case tested for their blood group. In this case what we have is three multinomial distributions each with (the same) three outcomes (blood group). We can test to see if the three distributions are the same.

Not surprisingly, it turns out that the test to be used is mathematically identical in the two cases, and is based on the chi-squared value. First consider what we'd expect the counts in each cell of the table to be if we only knew the marginal distributions (i.e. the row and column totals). For any cell this would be:

$$\text{Expected value } (\mathit{exp}) = \frac{\text{row total} \times \text{column total}}{\text{grand total}}$$
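As a rough illustration of the expected-value formula, the following plain JavaScript sketch computes the expected count for every cell of a small table. The 2 x 2 table is invented for the example, and `expectedCounts` is a hypothetical helper, not part of MathymaStat.

```javascript
// Hypothetical 2 x 2 table of observed counts (invented for illustration).
const observed = [
  [10, 20],
  [30, 40],
];

// expected[i][j] = (row total i) * (column total j) / (grand total)
function expectedCounts(table) {
  const rowTotals = table.map((row) => row.reduce((a, b) => a + b, 0));
  const colTotals = table[0].map((_, j) =>
    table.reduce((sum, row) => sum + row[j], 0)
  );
  const grandTotal = rowTotals.reduce((a, b) => a + b, 0);
  return table.map((_, i) =>
    colTotals.map((colTotal) => (rowTotals[i] * colTotal) / grandTotal)
  );
}

const expected = expectedCounts(observed);
console.log(expected); // [ [ 12, 18 ], [ 28, 42 ] ]
```

Here the row totals are 30 and 70, the column totals are 40 and 60, and the grand total is 100, so the top-left cell, for instance, has expected value 30 × 40 / 100 = 12.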

The chi-squared value is then defined as

$$X^2 = \sum \frac{(\mathit{obs} - \mathit{exp})^2}{\mathit{exp}}$$

where the sum runs over all cells of the table, *obs* is the observed value and *exp* is the expected value.

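If the rows and columns are independent, this has (approximately) a chi-squared distribution with $(r - 1)(c - 1)$ degrees of freedom, where $r$ is the number of rows and $c$ is the number of columns. The 3 × 3 blood group by nation table therefore has 4 degrees of freedom.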
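The statistic and its degrees of freedom can be sketched in a few lines of plain JavaScript. This is only an illustration under stated assumptions: the 2 x 2 table is invented, `chiSquared` is a hypothetical helper rather than a MathymaStat function, and no p-value is computed because that needs the chi-squared distribution function.

```javascript
// Hypothetical 2 x 2 table of observed counts (invented for illustration).
const observedCounts = [
  [10, 20],
  [30, 40],
];

// Returns the chi-squared statistic and degrees of freedom for an
// r x c table of observed counts, testing independence of rows and columns.
function chiSquared(table) {
  const rowTotals = table.map((row) => row.reduce((a, b) => a + b, 0));
  const colTotals = table[0].map((_, j) =>
    table.reduce((sum, row) => sum + row[j], 0)
  );
  const grandTotal = rowTotals.reduce((a, b) => a + b, 0);

  let statistic = 0;
  for (let i = 0; i < table.length; i++) {
    for (let j = 0; j < table[i].length; j++) {
      const exp = (rowTotals[i] * colTotals[j]) / grandTotal;
      statistic += (table[i][j] - exp) ** 2 / exp;
    }
  }
  const df = (table.length - 1) * (table[0].length - 1);
  return { statistic, df };
}

const result = chiSquared(observedCounts);
console.log(result.statistic.toFixed(4), result.df); // 0.7937 1
```

For this table each cell differs from its expected value by 2, so the statistic is 4 × (1/12 + 1/18 + 1/28 + 1/42) = 200/252, with (2 − 1)(2 − 1) = 1 degree of freedom.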

The Tabulate method will do a combined tabulation and test for independence if you set the second parameter to true. [Note that this needs MathymaMatrix and MathymaDistr]:

<script type="text/javascript" src="../Lib/MathymaStat.js"></script>
<script type="text/javascript" src="../Lib/MathymaMatrix.js"></script>
<script type="text/javascript" src="../Lib/MathymaDistr.js"></script>
. . .
<script>
myData.Tabulate("Blood/Nation",true);
</script>