Measures of Location 31
[Figure 1.16 Dotplot of the data from Example 1.14; horizontal axis: Duration]
12 recordings of Beethoven’s Symphony #9 (the “Choral,” a stunningly beautiful work), yielding the following durations (min) listed in increasing order:
62.3 62.8 63.6 65.2 65.7 66.4 67.4 68.4 68.8 70.8 75.7 79.0
Here is a dotplot of the data:
Since $n = 12$ is even, the sample median is the average of the $n/2 = 6$th and $(n/2 + 1) = 7$th values from the ordered list:

$$\tilde{x} = \frac{66.4 + 67.4}{2} = 66.90$$
Note that if the largest observation 79.0 had not been included in the sample, the resulting sample median for the $n = 11$ remaining observations would have been the single middle value 66.4 (the $[(n + 1)/2] = 6$th ordered value, i.e., the 6th value in from either end of the ordered list). The sample mean is $\bar{x} = \sum x_i / n = 816.1/12 = 68.01$, a bit more than a full minute larger than the median. The mean is pulled out a bit relative to the median because the sample “stretches out” somewhat more on the upper end than on the lower end. ■
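The median rule for even and odd $n$, together with the sample mean, can be checked numerically. The following is a minimal Python sketch using the 12 durations above; the helper name `sample_median` is our own (hypothetical) and simply spells out the textbook rule, while `statistics.mean` comes from Python's standard library.

```python
from statistics import mean

# Durations (min) of the 12 recordings, in increasing order
durations = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4, 67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

def sample_median(values):
    """Hypothetical helper: middle ordered value for odd n,
    average of the two middle ordered values for even n."""
    xs = sorted(values)
    n = len(xs)
    mid = n // 2
    if n % 2 == 1:
        # ((n + 1)/2)th ordered value, i.e. 0-based index n // 2
        return xs[mid]
    # average of the (n/2)th and (n/2 + 1)th ordered values
    return (xs[mid - 1] + xs[mid]) / 2

med_all = sample_median(durations)               # n = 12, even
med_without_max = sample_median(durations[:-1])  # drop 79.0, n = 11, odd
xbar = mean(durations)                           # 816.1 / 12

print(f"median (n = 12): {med_all:.2f}")
print(f"median without 79.0 (n = 11): {med_without_max:.1f}")
print(f"mean: {xbar:.2f}")
```

Run as written, it prints a median of 66.90, a median of 66.4 once 79.0 is dropped, and a mean of 68.01, matching the hand calculations.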
The data in Example 1.15 illustrates an important property of $\tilde{x}$ in contrast to $\bar{x}$: The sample median is very insensitive to outliers. If, for example, we increased the two largest $x_i$'s from 75.7 and 79.0 to 85.7 and 89.0, respectively, $\tilde{x}$ would be unaffected, whereas $\bar{x}$ would increase by $20/12 \approx 1.67$ min. Thus, in the treatment of outlying data values, $\bar{x}$ and $\tilde{x}$ are at opposite ends of a spectrum. Both quantities describe where the data is centered, but they will not in general be equal because they focus on different aspects of the sample.
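A short Python sketch of the thought experiment just described, using only the standard-library `statistics` module: the two largest durations are raised by 10 min each, and the resulting changes in the median and the mean are compared.

```python
from statistics import mean, median

# Durations (min) from the example, in increasing order
durations = [62.3, 62.8, 63.6, 65.2, 65.7, 66.4, 67.4, 68.4, 68.8, 70.8, 75.7, 79.0]

# The text's hypothetical change: 75.7 -> 85.7 and 79.0 -> 89.0
stretched = durations[:-2] + [85.7, 89.0]

# The two middle values (66.4 and 67.4) are untouched, so the median cannot move
median_shift = median(stretched) - median(durations)
# The total grows by 20, so the mean grows by 20/12
mean_shift = mean(stretched) - mean(durations)

print(f"change in median: {median_shift:.2f} min")
print(f"change in mean:   {mean_shift:.2f} min")
```

It reports a change of 0.00 min in the median and 1.67 min in the mean: only the mean responds to how far out the extreme values lie.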
Analogous to $\tilde{x}$ as the middle value in the sample is a middle value in the population, the population median, denoted by $\tilde{\mu}$. As with $\bar{x}$ and $\mu$, we can think of using the sample median $\tilde{x}$ to make an inference about $\tilde{\mu}$. In Example 1.15, we might use $\tilde{x} = 66.90$ as an estimate of the median time for the population of all recordings. A median is often used to describe income or salary data because it is not greatly influenced by a few large salaries. If the median salary $\tilde{x}$ for a sample of engineers were somewhat above \$60,000, we might use this as a basis for concluding that the median salary $\tilde{\mu}$ for all engineers exceeds \$60,000.
The population mean $\mu$ and median $\tilde{\mu}$ will not generally be identical. If the population distribution is positively or negatively skewed, as pictured in Figure 1.17, then $\mu \ne \tilde{\mu}$. When this is the case, in making inferences we must first decide which of the two population characteristics is of greater interest and then proceed accordingly.
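To see $\mu \ne \tilde{\mu}$ for a skewed population concretely, the sketch below uses an exponential population with rate $\lambda = 1$. This distribution is our own choice of a standard positively skewed example, not the one pictured in Figure 1.17. Its mean is $1/\lambda$ and its median is $\ln 2/\lambda$, the solution of $1 - e^{-\lambda m} = 1/2$. A large seeded simulated sample (via `random.Random.expovariate`) shows the sample mean and sample median settling near these two different population values.

```python
import math
import random
from statistics import mean, median

# Assumption for illustration: exponential population with rate lam = 1
# (positively skewed; not taken from Figure 1.17)
lam = 1.0
mu = 1 / lam                  # population mean
mu_tilde = math.log(2) / lam  # population median: solves 1 - exp(-lam * m) = 1/2

# Large simulated sample, seeded so the run is reproducible
rng = random.Random(2024)
sample = [rng.expovariate(lam) for _ in range(100_000)]

print(f"population: mean = {mu:.3f}, median = {mu_tilde:.3f}")
print(f"sample:     mean = {mean(sample):.3f}, median = {median(sample):.3f}")
```

The population line reads mean = 1.000, median = 0.693. The long upper tail pulls the mean above the median, just as the upper stretch of the Beethoven durations pulled $\bar{x}$ above $\tilde{x}$.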