The Chi-Squared Test of Independence is used to test the relationship between two categorical variables. We assume that each member of the population falls into one and only one category of each of the variables. A contingency table is formed by putting one variable down the rows and the other across the columns, then filling each cell with the number of sample members who fall into both that row's category and that column's category. Performing the Chi-Squared Test of Independence on this data determines whether there is a significant difference between the observed counts and the counts that would be expected if the two variables were independent.
Example: A survey was conducted to determine if color preference was related to, or independent of, gender. There were 120 males and 100 females surveyed, and the observed results are given in the cross tabulation below.
The expected frequency for a given category, such as males who prefer red, can be computed by multiplying the total number of males (120) by the proportion of people who prefer red. That proportion is the total of the red preference column divided by the total number of people surveyed; in this case 91/220. An equivalent method is to multiply the corresponding row and column totals and divide by the grand total. With either method, the expected value for males who prefer red is about 49.64. You will not need to compute all these values by hand, however. The expected values, and the chi-squared value itself, will be found using an applet whose link is below.
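The two ways of computing an expected frequency described above can be checked with a few lines of Python. This is only a minimal sketch: the variable names are illustrative, and the three totals are the ones quoted in the text.

```python
# Totals taken from the survey example in the text.
males_total = 120   # row total: males surveyed
red_total = 91      # column total: people who prefer red
grand_total = 220   # everyone surveyed (120 males + 100 females)

# Method 1: row total times the proportion of the sample preferring red.
expected_method_1 = males_total * (red_total / grand_total)

# Method 2: (row total * column total) / grand total.
expected_method_2 = (males_total * red_total) / grand_total

print(round(expected_method_1, 2), round(expected_method_2, 2))
```

Both methods give the same value, roughly 49.6, because they are the same product written in a different order.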
The general hypotheses for a Chi-Squared Test of Independence are these: The null hypothesis is that the two variables are independent of one another. The alternate hypothesis is that the two variables are related to each other. Thus, for this example, our hypotheses are as follows:
H0: A person's gender and favorite color are unrelated.
HA: A person's gender and favorite color are related.
The Chi-Squared Test of Independence is performed in three main steps.
First, form the null and alternate hypotheses and select an alpha level. The alpha level is the probability of a type I error (rejecting a null hypothesis that is actually true) that the researcher is willing to accept. The value used at CVGS is typically alpha = 0.05.
Note that the Chi-Squared Test of Independence calculator gives you the chi-squared value, as well as a quantity called the degrees of freedom, or df. The df is given as df = (r-1)(c-1), where r is the number of rows in your contingency table, and c is the number of columns. The df of the table above is df = (3-1)(2-1) = (2)(1) = 2.
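The calculator's internals are not described in this document, so the following is only a sketch of the standard computation it presumably performs: each expected count is (row total)(column total)/(grand total), the statistic is the sum of (observed - expected)^2 / expected over all cells, and df = (r-1)(c-1). The function name chi_squared_independence and the small table of counts are hypothetical; the counts are made up for illustration and are not the survey data.

```python
def chi_squared_independence(observed):
    """Return (statistic, df) for a table given as a list of rows of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(observed):
        for j, count in enumerate(row):
            # Expected count if the row and column variables were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (count - expected) ** 2 / expected

    # Degrees of freedom: (rows - 1) * (columns - 1).
    df = (len(observed) - 1) * (len(col_totals) - 1)
    return statistic, df


# Hypothetical counts for illustration only; NOT the survey data from the text.
toy_table = [[10, 20],
             [30, 40]]
statistic, df = chi_squared_independence(toy_table)
print(round(statistic, 4), df)
```

Any table with 3 rows and 2 columns (or 2 rows and 3 columns) passed to this function yields df = 2, matching the value quoted above.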
Finally, with your df and your alpha value, you can consult a table of critical values for the chi-squared statistic to test the null hypothesis (Chi-square table of critical values). In the table, find the chi-squared critical value that corresponds to the distribution for df = 2 and the alpha value of 0.05. The critical value is 5.991. Compare this to your calculated chi-squared statistic, which in this example is about 6.88. Since the statistic exceeds the critical value, it falls in the rejection region of the distribution, and we therefore reject the null hypothesis.
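The table lookup can be reproduced for this particular example without a table. The sketch below relies on a closed form that holds only when df = 2, where the probability of exceeding x is exp(-x/2); for other df values a printed table or a statistics package would be needed. The variable names are illustrative.

```python
import math

alpha = 0.05
chi_squared_statistic = 6.88   # value reported in the text
df = 2

# Closed form valid ONLY for df = 2: P(X > x) = exp(-x / 2),
# so the critical value solves exp(-x / 2) = alpha.
critical_value = -2.0 * math.log(alpha)

# Reject the null hypothesis when the statistic exceeds the critical value.
reject_null = chi_squared_statistic > critical_value
print(round(critical_value, 3), reject_null)
```

The computed critical value agrees with the tabulated 5.991, and the comparison leads to the same decision to reject the null hypothesis.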
Another, related way to test the null hypothesis is to use the CHISQ.DIST.RT function in Excel. Entering the chi-squared value of 6.88 and df = 2, a p-value of about 0.032 is returned. Recall that the p-value is the probability, assuming the null hypothesis is true, of obtaining a chi-squared statistic at least as large as the one observed. It is not the probability that the null hypothesis is true. Since this value is smaller than the alpha (set at 0.05), the null hypothesis is rejected.
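For readers without Excel, the right-tail probability can be sketched in plain Python. The helper chi2_right_tail_even_df below is a hypothetical stand-in for CHISQ.DIST.RT, not Excel's implementation, and it assumes an even df, for which the tail probability has the closed form exp(-x/2) times the sum of (x/2)^k / k! for k from 0 to df/2 - 1.

```python
import math


def chi2_right_tail_even_df(x, df):
    """Right-tail probability P(X > x) for a chi-squared variable with EVEN df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2.0
    # Partial sum of the exponential series, terms k = 0 .. df/2 - 1.
    series = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * series


alpha = 0.05
p_value = chi2_right_tail_even_df(6.88, 2)   # statistic and df from the text
print(round(p_value, 3), p_value < alpha)
```

With the statistic 6.88 and df = 2 this gives a p-value of about 0.032, in line with the Excel result quoted above.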
The Health, Cops, and Education exercises in the inferential statistics activities menu use a Chi-Squared Test of Independence to investigate the relationship between two variables.
Original work on this document was done by Central Virginia Governor's School students Richard Barnes, Kim Tibbs, and Ryan Nash (Class of '00). This document was updated by Central Virginia Governor's School students Matthew James and Kyle Nenninger (Class of '03).
Copyright © 1999 Central Virginia Governor's School, Lynchburg, VA