Before you draw important conclusions about a dataset, it is often necessary to examine how the data is distributed. Reviewing the distribution can reveal errors in measurement or outlying data points that show some sort of unique characteristic. One tool for this kind of analysis is the boxplot, which shows how the data is spread out.
To construct a boxplot, you must first determine the minimum, maximum, and median values. You must also obtain the values of the two hinges. After arranging the data in increasing order, find the left hinge by taking the median of the bottom half of the data. The right hinge is the median of the top half of the data. The minimum and the maximum points are the ends of the "whiskers," the lines that extend outward from the box.
On the boxplot shown above, the left whisker is at point A, the left hinge is at point B, the median of the data is at point C, the right hinge is at point D, and the right whisker is at point E.
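The five values labeled A through E can also be computed directly. The following is a minimal Python sketch, not part of the original activity. The function name `five_number_summary` and the sample data are made up for illustration. It assumes that when the number of values is odd, the middle value is left out of both halves before the hinges are found; some textbooks include it in both halves instead, which can give slightly different hinges.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, left hinge, median, right hinge, maximum)."""
    data = sorted(values)          # arrange the data in increasing order
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]            # bottom half of the data
    upper = data[n - half:]        # top half (middle value skipped when n is odd)
    return (data[0], median(lower), median(data), median(upper), data[-1])


# Hypothetical sample data, chosen only to illustrate the steps.
sample = [7, 1, 3, 12, 5, 9, 10, 4]
print(five_number_summary(sample))  # (1, 3.5, 6.0, 9.5, 12)
```

Here the sorted data is 1, 3, 4, 5, 7, 9, 10, 12, so the bottom half is 1, 3, 4, 5 (left hinge 3.5) and the top half is 7, 9, 10, 12 (right hinge 9.5).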
Because of the way the hinges are defined, approximately one-fourth of the values fall between the minimum and the left hinge, one-half of the values lie in the "box" between the two hinges, and one-fourth of the values fall between the right hinge and the maximum. If the data is evenly distributed, indicating that there are no "errors" or outliers in the data, the two whiskers are also about the same length and the median sits near the middle of the box. Below is an example of a "perfect" boxplot which shows that the data is evenly distributed.
A boxplot of unevenly spread data looks very different, as seen below. A boxplot like this would suggest that one of the data points, most likely the maximum, is an outlier.
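The text above does not give a numeric rule for deciding when a point counts as an outlier. One common convention, used here only as an assumption, flags any value that lies more than 1.5 times the distance between the hinges beyond either hinge. The Python sketch below applies that convention to two made-up datasets. The names `hinges` and `flag_outliers` are hypothetical, and the hinges are found with the same "leave out the middle value" assumption for odd-sized data.

```python
from statistics import median


def hinges(values):
    """Return (left hinge, right hinge) as medians of the bottom and top halves."""
    data = sorted(values)
    half = len(data) // 2
    return median(data[:half]), median(data[len(data) - half:])


def flag_outliers(values, k=1.5):
    """List values more than k box-widths beyond either hinge (assumed rule)."""
    left, right = hinges(values)
    spread = right - left              # width of the box
    low_fence = left - k * spread
    high_fence = right + k * spread
    return [v for v in values if v < low_fence or v > high_fence]


# Hypothetical data: the second set differs only in its maximum.
even = [1, 3, 4, 5, 7, 9, 10, 12]
uneven = [1, 3, 4, 5, 7, 9, 10, 40]
print(flag_outliers(even))    # []
print(flag_outliers(uneven))  # [40]
```

Both sets have hinges at 3.5 and 9.5, so the box is identical; only the long right whisker of the second set, ending at 40, crosses the upper fence of 18.5.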
Note: This DIG Stats activity uses the TI-83 graphing calculator to create boxplots.
Original work on this document was done by Central Virginia Governor's School students Jared Edgar and Marie King (Class of '99).
Copyright © 1998 Central Virginia Governor's School for Science and Technology, Lynchburg, VA