|
Box Plots

Box plots are diagnostic graphics that use the median
and IQR to describe distributions. They are
difficult to grasp, but once you are used to them
they are the easiest way to identify outliers.
Box plots should not be used to convey information
to unsophisticated audiences.

Suppose I had a
sample of 15 houses and apartments and wanted to know
something about how many rooms were contained in
each domicile. I recorded this information for
each domicile and got answers of 1, 1, 1, 3, 4, 5,
5, 5, 5, 7, 8, 9, 9, 10, 28. This information
can be illustrated as a box plot.
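
As a minimal sketch, the summary values that a box plot is built from (the median, the quartiles, and the IQR) can be computed with Python's standard `statistics` module. Quartile conventions differ between textbooks and software packages; the assumption here is that the module's default "exclusive" method is the convention used for this sample.

```python
import statistics

# Number of rooms in each of the 15 sampled domiciles (from the text).
rooms = [1, 1, 1, 3, 4, 5, 5, 5, 5, 7, 8, 9, 9, 10, 28]

# statistics.quantiles with n=4 returns the three quartile cut points.
# Assumption: the default "exclusive" method is the intended convention;
# other methods can give slightly different quartiles for some samples.
q1, median, q3 = statistics.quantiles(rooms, n=4)

# The interquartile range is the height of the box.
iqr = q3 - q1

print(f"Q1 = {q1}, median = {median}, Q3 = {q3}, IQR = {iqr}")
```

For this sample the sketch should give a first quartile of 3, a median of 5, a third quartile of 9, and an IQR of 6.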

The line in the middle of the red box is the
median. The median happens to be 5 in this
case. The bottom of the box is the first
quartile (Q1). In this case Q1=3. The
top of the box is the third quartile (Q3). In
this case Q3=9. The box therefore is 6 units
high (Q3-Q1). The IQR is 6. The whiskers
of the box extend to the highest and lowest cases that
are not outliers. In this case, the lowest case
that is not an outlier is 1. The highest case
that is not an outlier is 10. The 28-room
place is considered an outlier. It is
considered an outlier because it is more than 1.5
IQRs away from the box. Outliers, in this
case, would be more than 9 points (1.5 × 6) below Q1
or more than 9 points above Q3. In other words,
domiciles that had fewer than -6 rooms (an
impossibility) or more than 18 rooms would be
outliers.

We generally like to distinguish
between mild and severe outliers. Mild
outliers may cause a bit of skew and may make the
mean unusable if there are enough of them.
Severe outliers, by virtue of being more extreme,
are more likely to have a dramatic impact on the
mean and to cause severe skew. Mild outliers
are usually defined as being between 1.5 and 3 IQRs
away from the box. Severe outliers are more
than 3 IQRs away from the box. Please note
that these are somewhat arbitrary
distinctions. There are a number of different
ways of defining outliers, so not everything you read
will correspond to the 1.5 and 3 IQR
definition. The general idea of outliers being
extreme values, however, will always be true.
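
The 1.5 and 3 IQR rule described above can be sketched in a few lines of Python. The function name `classify` is hypothetical, and treating a value exactly 1.5 IQRs from the box as a non-outlier (and exactly 3 IQRs as mild) is an assumption, since the text does not say how boundary values are handled.

```python
def classify(value, q1, q3):
    """Label a value 'none', 'mild', or 'severe' by its distance from the box."""
    iqr = q3 - q1
    # Distance outside the box; 0 when the value lies between Q1 and Q3.
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * iqr:
        return "severe"
    if distance > 1.5 * iqr:
        return "mild"
    return "none"


# Room counts and quartiles taken from the example in the text.
rooms = [1, 1, 1, 3, 4, 5, 5, 5, 5, 7, 8, 9, 9, 10, 28]
q1, q3 = 3, 9

# Map each outlying value to its label.
outliers = {x: classify(x, q1, q3) for x in rooms if classify(x, q1, q3) != "none"}

# The whiskers extend to the most extreme values that are not outliers.
non_outliers = [x for x in rooms if classify(x, q1, q3) == "none"]
whisker_low, whisker_high = min(non_outliers), max(non_outliers)

print(outliers, whisker_low, whisker_high)
```

For the room counts, this flags only the 28-room place. It is labelled severe because 28 lies more than 18 points (3 × 6) above Q3 = 9, and the whiskers end at 1 and 10.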
SPSS
instructions
|