A trimmed mean with a moderate trimming percentage, somewhere between 5% and 25%, will yield a measure of center that is neither as sensitive to outliers as the mean nor as insensitive as the median. If the desired trimming percentage is $100\alpha\%$ and $n\alpha$ is not an integer, the trimmed mean must be calculated by interpolation. For example, consider $\alpha = .10$ for a 10% trimming percentage and $n = 26$ as in Example 1.16, so that $n\alpha = 2.6$ lies between 2 and 3. Then $\bar{x}_{tr(10)}$ would be the appropriate weighted average of the 7.7% trimmed mean calculated there and the 11.5% trimmed mean resulting from trimming three observations from each end.
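The interpolation can be sketched in a few lines of Python. The text does not spell out the weights, so this sketch assumes linear interpolation in the number of observations trimmed from each end: with $n\alpha = 2.6$, the 2-trimmed mean gets weight .4 and the 3-trimmed mean gets weight .6. The function names and the data are hypothetical, and the data are not those of Example 1.16.

```python
import math

def trimmed_mean(data, k):
    """Mean of the sample after dropping the k smallest and k largest values."""
    xs = sorted(data)
    if k < 0 or 2 * k >= len(xs):
        raise ValueError("k must leave at least one observation")
    kept = xs[k:len(xs) - k]
    return sum(kept) / len(kept)

def interpolated_trimmed_mean(data, alpha):
    """100*alpha% trimmed mean; interpolates when n*alpha is not an integer.

    Assumption: the weights are linear in the number trimmed per end.
    """
    target = len(data) * alpha          # e.g. 26 * 0.10 = 2.6
    lower = math.floor(target)          # trim 2 from each end ...
    frac = target - lower               # ... or 3, with weight 0.6 on the latter
    if frac < 1e-9:                     # n*alpha is (numerically) an integer
        return trimmed_mean(data, lower)
    return (1 - frac) * trimmed_mean(data, lower) + frac * trimmed_mean(data, lower + 1)

# Hypothetical sample of size 26 (not the data of Example 1.16).
data = [i * i for i in range(1, 27)]
print(interpolated_trimmed_mean(data, 0.10))
```

With 26 observations, the two means being averaged are exactly the 7.7% (2/26) and 11.5% (3/26) trimmed means mentioned above.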
Categorical Data and Sample Proportions
When the data is categorical, a frequency distribution or relative frequency distribution provides an effective tabular summary of the data. The natural numerical summary quantities in this situation are the individual frequencies and the relative frequencies. For example, if a survey of individuals who own digital cameras is undertaken to study brand preference, then each individual in the sample would identify the brand of camera that he or she owned, from which we could count the number owning Canon, Sony, Kodak, and so on. Consider sampling a dichotomous population, one that consists of only two categories (such as voted or did not vote in the last election, does or does not own a digital camera, etc.). If we let $x$ denote the number in the sample falling in category 1, then the number in category 2 is $n - x$. The relative frequency or sample proportion in category 1 is $x/n$ and the sample proportion in category 2 is $1 - x/n$. Let's denote a response that falls in category 1 by a 1 and a response that falls in category 2 by a 0. A sample size of $n = 10$ might then yield the responses 1, 1, 0, 1, 1, 1, 0, 0, 1, 1. The sample mean for this numerical sample is (since the number of 1s is $x = 7$)
$$\frac{x_1 + \cdots + x_n}{n} = \frac{1 + 1 + 0 + \cdots + 1 + 1}{10} = \frac{7}{10} = \frac{x}{n} = \text{sample proportion}$$
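The arithmetic for the ten coded responses can be checked directly. The following is a minimal sketch using the 0/1 sample given above; the variable names are chosen here for illustration.

```python
# The ten coded responses from the text: 1 = category 1, 0 = category 2.
responses = [1, 1, 0, 1, 1, 1, 0, 0, 1, 1]

n = len(responses)                    # sample size
x = sum(responses)                    # number of 1s, i.e. count in category 1
sample_mean = sum(responses) / n      # ordinary mean of the 0/1 values
sample_proportion = x / n             # proportion in category 1

print(x, n, sample_mean)              # 7 10 0.7
print(sample_mean == sample_proportion)  # True
```

The count in category 2 is `n - x`, which is 3 here.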
More generally, focus attention on a particular category and code the sample results so that a 1 is recorded for an observation in the category and a 0 for an observation not in the category. Then the sample proportion of observations in the category is the sample mean of the sequence of 1s and 0s. Thus a sample mean can be used to summarize the results of a categorical sample. These remarks also apply to situations in which categories are defined by grouping values in a numerical sample or population (e.g., we might be interested in knowing whether individuals have owned their present automobile for at least 5 years, rather than studying the exact length of ownership).
Analogous to the sample proportion $x/n$ of individuals or objects falling in a particular category, let $p$ represent the proportion of those in the entire population falling in the category. As with $x/n$, $p$ is a quantity between 0 and 1, and while $x/n$ is a sample characteristic, $p$ is a characteristic of the population. The relationship between the two parallels the relationship between the sample median $\tilde{x}$ and the population median $\tilde{\mu}$, and between the sample mean $\bar{x}$ and the population mean $\mu$. In particular, we will subsequently use $x/n$ to make inferences about $p$. If, for example, a sample of 100 car owners reveals that 22 owned their car at least 5 years, then we might use $22/100 = .22$ as a point estimate of the proportion of all owners who have owned their car at least 5 years. With $k$ categories ($k > 2$), we can use the $k$ sample proportions to answer questions about the population proportions $p_1, \ldots, p_k$.
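For more than two categories, the $k$ sample proportions are just the category counts divided by $n$. The sketch below illustrates this with a small, made-up list of camera brands (hypothetical data, not survey results from the text); the helper name `sample_proportions` is likewise an assumption introduced for illustration.

```python
from collections import Counter

def sample_proportions(observations):
    """Return {category: count / n} for a categorical sample."""
    n = len(observations)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(observations)
    return {category: count / n for category, count in counts.items()}

# Hypothetical brand responses from eight camera owners.
brands = ["Canon", "Sony", "Canon", "Kodak", "Sony", "Canon", "Other", "Canon"]
props = sample_proportions(brands)
print(props["Canon"])   # 0.5

# The car-ownership example: 22 of 100 owners coded 1, the other 78 coded 0.
owned_5_years = [1] * 22 + [0] * 78
p_hat = sample_proportions(owned_5_years)[1]   # point estimate of p
print(p_hat)
```

Each sample proportion serves as a point estimate of the corresponding population proportion, and the $k$ proportions always sum to 1.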