Sociology 393 Lectures

 Box Plots

Box plots are diagnostic graphics that use the median and IQR to describe distributions.  They are difficult to grasp, but once you are used to them they are the easiest way to identify outliers.  Box plots should not be used to convey information to unsophisticated audiences.

Suppose I had a sample of 15 houses and apartments and wanted to know how many rooms each domicile contained.  I record this information for each domicile and get answers of 1, 1, 1, 3, 4, 5, 5, 5, 5, 7, 8, 9, 9, 10, 28.  This information can be illustrated as a box plot.


The line in the middle of the red box is the median, which happens to be 5 in this case.  The bottom of the box is the first quartile (Q1); here Q1 = 3.  The top of the box is the third quartile (Q3); here Q3 = 9.  The box is therefore 6 units high (Q3 - Q1), so the IQR is 6.

The whiskers of the box extend to the lowest and highest cases that are not outliers.  In this case, the lowest case that is not an outlier is 1 and the highest is 10.  The 28-room place is considered an outlier because it is more than 1.5 IQRs away from the box.  Since 1.5 x 6 = 9, outliers in this case are more than 9 points below Q1 or more than 9 points above Q3.  In other words, domiciles that had fewer than -6 rooms (an impossibility) or more than 18 rooms would be outliers.

We generally like to distinguish between mild and severe outliers.  Mild outliers may cause a bit of skew and may make the mean unusable if there are enough of them.  Severe outliers, by virtue of being more extreme, are more likely to have a dramatic impact on the mean and to cause severe skew.  Mild outliers are usually defined as being between 1.5 and 3 IQRs away from the box.  Severe outliers are more than 3 IQRs away from the box.  Please note that these are somewhat arbitrary distinctions.  There are a number of different ways of defining outliers, so not everything you read will correspond to the 1.5 and 3 IQR definition.  The general idea of outliers being extreme values, however, will always be true.
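The arithmetic above can be checked with a short script.  This is only a sketch in Python, not part of the course's SPSS workflow.  The function name box_plot_summary and the dictionary keys are made up for illustration.  It assumes the quartile rule used by Python's statistics.quantiles (its default "exclusive" method), which happens to give Q1 = 3 and Q3 = 9 for these data; other programs use other quartile rules and may give slightly different values.

```python
from statistics import quantiles

# Number of rooms in each of the 15 sampled domiciles (from the example above).
rooms = [1, 1, 1, 3, 4, 5, 5, 5, 5, 7, 8, 9, 9, 10, 28]


def box_plot_summary(values):
    """Return the pieces of a box plot using the 1.5 and 3 IQR rules.

    Hypothetical helper for illustration; the quartile convention is an
    assumption (statistics.quantiles, default 'exclusive' method).
    """
    data = sorted(values)
    q1, median, q3 = quantiles(data, n=4)
    iqr = q3 - q1

    # Inner fences: 1.5 IQRs beyond the box. Outer fences: 3 IQRs beyond it.
    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    # Whiskers reach the most extreme cases that are not outliers.
    not_outliers = [x for x in data if inner_low <= x <= inner_high]

    # Mild: beyond an inner fence but not beyond the outer fence.
    mild = [x for x in data
            if outer_low <= x < inner_low or inner_high < x <= outer_high]
    # Severe: beyond an outer fence.
    severe = [x for x in data if x < outer_low or x > outer_high]

    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "inner_fences": (inner_low, inner_high),
        "outer_fences": (outer_low, outer_high),
        "whiskers": (min(not_outliers), max(not_outliers)),
        "mild_outliers": mild,
        "severe_outliers": severe,
    }


if __name__ == "__main__":
    for name, value in box_plot_summary(rooms).items():
        print(f"{name}: {value}")
```

For the 15 values above, this gives Q1 = 3, median = 5, Q3 = 9, an IQR of 6, cutoffs at -6 and 18 for outliers, and whiskers at 1 and 10.  Under the 3 IQR definition the cutoffs for severe outliers are -15 and 27, so the 28-room place comes out as a severe outlier rather than a mild one.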

Last Updated: January 19, 2005