Two Sample t-test

A t-test is used to determine if there is a "statistical" difference between the means of two data sets. Why is this necessary? Can't you just subtract the two means to get the difference? You can certainly do that, but the question is whether that difference is due to normal random variation or to some intervening factor. Take for example the following case.

Testing at the River
A researcher goes to a site on a river and wants to determine the dissolved oxygen (DO) content of the water. Using a DO meter, she takes 12 sample readings from one spot. She then takes another set of 12 readings from the same spot. She calculates the mean of the first and the second set of readings. Are the means the same? Although the means represent samples taken at the same spot on the river during the same time period, it is highly unlikely that the means will be exactly the same! The difference in the means is due to normal random variations of the DO in the water.
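The same-spot scenario above can be imitated with a short simulation. This is only a sketch: the "true" DO level of 8.0 mg/L, the spread of 0.3 mg/L, the normal distribution of the readings and the function name take_readings are all assumptions made for illustration, not values from the study described here.

```python
import random
from statistics import mean


def take_readings(rng, n=12, true_mean=8.0, spread=0.3):
    """Simulate n DO meter readings (mg/L) drawn from ONE population.

    true_mean and spread are hypothetical values chosen for illustration.
    """
    return [rng.gauss(true_mean, spread) for _ in range(n)]


rng = random.Random(42)          # fixed seed so the run is repeatable
first_set = take_readings(rng)   # 12 readings at the spot
second_set = take_readings(rng)  # 12 more readings at the same spot

print("mean of first set :", round(mean(first_set), 3))
print("mean of second set:", round(mean(second_set), 3))
print("difference        :", round(mean(first_set) - mean(second_set), 3))
```

Both sets come from the same simulated population, yet the two means are not identical. Changing the seed changes the size of the difference, which is the normal random variation the text describes.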

Now, let's assume the researcher wants to test another site down river to see if an industrial complex is polluting the river and reducing its DO content. She collects 12 sample readings from the original site and calculates the mean of the data set. She then collects 12 sample readings at a site down river from the industrial complex and calculates the mean for that set. Is the difference between these two means due to normal random variations of the DO in the water, or is it due to pollution from the industrial complex?

Figure 1 - T-Test: Group I (Up-river Site), 12 data points (DO), compared with Group II (Down-river Site), 12 data points (DO).

A t-test is a statistical analysis that looks at the size of the difference between the two means relative to the variation within the samples. If the river is not polluted, the sites above and below the industrial complex have the same water quality. Therefore, the difference in the sample means is due to normal random variations, just like the difference in the sample means taken from the same spot. The samples are taken from the same "population" of DO values, and the range of mean differences expected due to normal random variations in the "population" can be calculated statistically. On the other hand, if the river is being polluted, the samples taken down river from the industrial complex will represent a different "population" of DO values. The difference between the means of the data sets taken above and below the industrial complex represents the influence of an outside factor that altered the DO in the water. Hence, the mean difference between the two sites will tend to be larger than the differences expected due to normal random variations in the original population.
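As a rough illustration of how the size of the mean difference is weighed against random variation, the sketch below computes a two-sample t statistic. It assumes the pooled (equal-variance) form of the test, which this page does not specify, and the DO readings and the function name pooled_t_statistic are invented for the example.

```python
from math import sqrt
from statistics import mean, variance


def pooled_t_statistic(group_a, group_b):
    """Two-sample t statistic, assuming both groups share one variance."""
    n_a, n_b = len(group_a), len(group_b)
    # Combine the two sample variances, weighted by their degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    # Standard error of the difference between the two means.
    std_error = sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean(group_a) - mean(group_b)) / std_error


# Hypothetical DO readings in mg/L (not real data).
up_river = [8.1, 7.9, 8.3, 8.0, 8.2, 7.8, 8.4, 8.1, 7.9, 8.0, 8.2, 8.1]
down_river = [7.2, 7.5, 7.0, 7.4, 7.1, 7.3, 7.6, 7.2, 7.0, 7.4, 7.3, 7.1]

t_value = pooled_t_statistic(up_river, down_river)
print("t statistic:", round(t_value, 2))
```

A t statistic near zero means the difference between the means is small compared with the scatter of the readings. A t statistic far from zero means the difference is large compared with that scatter.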

Bottom Line
A t-test is used to determine if there is a "significant difference" between two group means. One rule of thumb is to have at least 12 data points in each group. You may certainly have fewer, but the test will have less power. It is not necessary to have the same number of data points in the two groups. If there is a significant difference, the difference between the group means is unlikely to be due to normal random variations in a single population. The difference is large enough to indicate that the groups come from different populations. In research, this generally means that the "original population" was changed due to some intervening factor.
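One way to picture the "significant difference" decision is sketched below, with groups of unequal size. The assumptions are a pooled (equal-variance) t-test, a two-tailed test at the 0.05 level, and a handful of critical values copied from a standard t table. The readings and the names T_CRITICAL_05 and is_significant are hypothetical.

```python
from math import sqrt
from statistics import mean, variance

# Two-tailed critical t values at the 0.05 level, keyed by degrees of freedom.
# Only a few table entries are included in this sketch.
T_CRITICAL_05 = {10: 2.228, 15: 2.131, 20: 2.086, 22: 2.074}


def is_significant(group_a, group_b):
    """Return (significant, t, df) for a pooled two-sample t-test.

    Raises KeyError if the degrees of freedom are not in the small table.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    t = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return abs(t) > T_CRITICAL_05[df], t, df


# Hypothetical DO readings in mg/L: 12 up-river, only 10 down-river.
up_river = [8.1, 7.9, 8.3, 8.0, 8.2, 7.8, 8.4, 8.1, 7.9, 8.0, 8.2, 8.1]
down_river = [7.2, 7.5, 7.0, 7.4, 7.1, 7.3, 7.6, 7.2, 7.0, 7.4]

significant, t, df = is_significant(up_river, down_river)
print("df =", df, " t =", round(t, 2), " significant:", significant)
```

The group sizes differ (12 and 10), and the calculation still works because each group's variance is weighted by its own number of data points.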

Steps to perform a t-test


In the previous menu, the Coliform and Football activities use t-tests to analyze data sets. The activities assume that you have access to Excel, a TI-83 calculator or another software package capable of performing inferential tests.


Copyright © 1997 Central Virginia Governor's School for Science and Technology Lynchburg, VA