Chi-Squared Tests

Chi-Squared for Goodness of Fit

The Chi-Squared Test for Goodness of Fit compares a set of observed frequencies to the frequencies expected from theoretical probability or prior research. The frequencies are separated into mutually exclusive categories of one categorical variable. The Greek letter chi (χ, pronounced "kye") looks similar to a script "X." When written with an exponent of two (χ²), this symbol represents the chi-squared statistic, which measures the degree to which the observed frequencies in a data set differ from the expected frequencies.

Here is a contrived example of an experiment appropriate for a Chi-Squared Goodness of Fit Test.

A botanist was working with plants that were either red, pink, or white. Her hypothesis was that if a plant expressed a red color, it had two alleles that were the same (RR). She also hypothesized that if a plant was white, it had two alleles that were the same (rr). However, if a plant was pink, she hypothesized that it had two alleles that were different (Rr). She decided to cross two pink plants to see if the observed results would be similar to her expected results of 25% red (RR), 50% pink (Rr), and 25% white (rr). She set her alpha at 0.05. The expected and observed results from her cross, which resulted in exactly 100 plants, are given below.

            Red    Pink    White
Observed     35      45       20
Expected     25      50       25

Note: the situation described and data above are contrived.

The null hypothesis is that there is no difference between the observed frequencies and the expected values. The alternative hypothesis is that there is a difference.

H0: Given 100 plants, 25 will be red, 50 will be pink, and 25 will be white.
HA: At least one of the observed values will be different from what was expected.

To complete the test, the chi-squared statistic value and the degrees of freedom must be calculated.

The chi-squared value is a single number. For each category in the table, take the difference between the observed and expected frequencies, square it, and divide by the expected frequency. The chi-squared value is the sum of these ratios over all categories.
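
Written as a formula, the calculation looks like the following. The symbols O_i (observed count in category i), E_i (expected count in category i), and k (number of categories) are notation introduced here for illustration; the numbers are the red, pink, and white counts from the table above.

```latex
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
       = \frac{(35 - 25)^2}{25} + \frac{(45 - 50)^2}{50} + \frac{(20 - 25)^2}{25}
       = 4 + 0.5 + 1
       = 5.5
```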

The degrees of freedom is the number of categories minus one.
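
For readers who want to do the calculation themselves, here is a minimal Python sketch of both steps. The function name chi_squared_statistic and the dictionary layout are assumptions made for this illustration, not part of the DIG Stats calculator.

```python
# Observed counts from the botanist's cross (contrived data from the table above).
observed = {"red": 35, "pink": 45, "white": 20}

# Expected proportions under the null hypothesis (25% RR, 50% Rr, 25% rr).
expected_proportions = {"red": 0.25, "pink": 0.50, "white": 0.25}


def chi_squared_statistic(observed, expected_proportions):
    """Return (chi-squared statistic, degrees of freedom) for a goodness-of-fit test.

    Hypothetical helper written for this example.
    """
    total = sum(observed.values())
    statistic = 0.0
    for category, count in observed.items():
        # Expected count = total sample size times the hypothesized proportion.
        expected = total * expected_proportions[category]
        statistic += (count - expected) ** 2 / expected
    # Degrees of freedom: number of categories minus one.
    degrees_of_freedom = len(observed) - 1
    return statistic, degrees_of_freedom


statistic, df = chi_squared_statistic(observed, expected_proportions)
print(f"chi-squared = {statistic:.2f}, df = {df}")
```

Because the expected counts are computed from the proportions and the total, the same function can be reused for a cross with a different number of plants.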

In order to ease this fairly formidable process, DIG Stats provides a Chi-Squared Goodness of Fit Calculator, which can be run in most recent web browsers.

With our chi-squared value and degrees of freedom (df), we can test the null hypothesis by referring to a Chi-Squared Table. There is a different chi-squared distribution for each value of the degrees of freedom, and the alpha level set determines where on the distribution the critical value lies, so use both to find the critical chi-squared value. For our example, df = 2 (one less than the number of groups) and alpha = 0.05 gives us a critical chi-squared value of 5.991. The DIG Stats calculator calculated a chi-squared value of 5.5. Since our statistic does not exceed the critical value, it is not in the rejection area of the distribution, and we therefore retain the null hypothesis.
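
The table lookup can also be checked with a short calculation. The sketch below relies on a shortcut that holds only when df = 2: in that case the area to the right of a value x under the chi-squared distribution is exp(-x / 2), so the critical value is -2 ln(alpha). The function name critical_value_df2 is an assumption for this example. For other degrees of freedom this shortcut does not apply, and a table or a statistics library (for instance scipy.stats.chi2.ppf, if SciPy is available) would be needed instead.

```python
import math


def critical_value_df2(alpha):
    """Critical chi-squared value for df = 2 ONLY.

    Hypothetical helper: solves exp(-x / 2) = alpha for x.
    """
    return -2.0 * math.log(alpha)


alpha = 0.05
statistic = 5.5  # chi-squared value reported in the text

critical = critical_value_df2(alpha)
# Reject the null hypothesis only if the statistic exceeds the critical value.
reject_null = statistic > critical

print(f"critical value = {critical:.3f}")
print(f"reject null hypothesis: {reject_null}")
```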

Another, related way to test the null hypothesis is to use the CHISQ.DIST.RT function in Excel. Entering the chi-squared value of 5.5 and df = 2 returns a p-value of about 0.064. This value is larger than the alpha (set at 0.05), and thus the null hypothesis is retained; the data are not sufficient to support the alternative hypothesis.
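
The same p-value can be reproduced outside of Excel. This is a minimal sketch that again assumes df = 2, where the right-tail probability has the simple form exp(-x / 2); the function name p_value_df2 is hypothetical. With other degrees of freedom, a library routine such as scipy.stats.chi2.sf would be the usual choice.

```python
import math


def p_value_df2(statistic):
    """Right-tail p-value of a chi-squared statistic for df = 2 ONLY."""
    return math.exp(-statistic / 2.0)


alpha = 0.05
p_value = p_value_df2(5.5)

print(f"p-value = {p_value:.3f}")
# The null hypothesis is retained when the p-value is larger than alpha.
print("retain null hypothesis" if p_value > alpha else "reject null hypothesis")
```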

The Cellular Phone and Dice activities in the inferential activities menu use Chi-Squared Goodness of Fit tests.

Original work on this document was done by Central Virginia Governor's School students Richard Barnes, Kim Tibbs, and Ryan Nash (Class of '00). This document was updated by Central Virginia Governor's School students Matthew James and Kyle Nenninger (Class of '03).

Copyright © 1999 Central Virginia Governor's School, Lynchburg, VA