Solutions — Central Tendency Measures

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.

← Back to Lesson DS-3

Section 5: Guided Practice Solutions


Problem 1 — Step-by-Step Calculation of a Sample Mean

Ball bearing diameters: 12.1, 11.8, 12.3, 12.0, 11.9, 12.5.

1a — Sum: \(\sum x = 12.1 + 11.8 + 12.3 + 12.0 + 11.9 + 12.5 = 72.6\)

1b — Mean: \(\bar{x} = \frac{\sum x}{n} = \frac{72.6}{6} = 12.1\)
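The same two steps can be checked with a short Python sketch. The variable names are illustrative; the code simply sums the six diameters and divides by the count.

```python
import math

# Ball bearing diameters from Problem 1
diameters = [12.1, 11.8, 12.3, 12.0, 11.9, 12.5]

total = sum(diameters)   # 1a: add every observation
n = len(diameters)       # number of observations
mean = total / n         # 1b: sample mean = sum / n

print(f"sum = {total:.1f}, n = {n}, mean = {mean:.2f}")
```

This prints `sum = 72.6, n = 6, mean = 12.10`. Floating-point sums of decimals can carry tiny rounding errors, which is why the output is formatted to a fixed number of decimal places.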

Problem 2 — Determining the Sorted Middle Value

The key for all median problems is to sort the data first.

Variant 0 — Quiz scores (n=9, odd): Sort to get 8, 9, 11, 12, 13, 14, 15, 17, 18. The median is at position \((n+1)/2 = (9+1)/2 = 5\). The 5th value is 13.
Variant 1 — Daily temperatures (n=8, even): Sort to get 16, 18, 19, 21, 22, 23, 25, 27. Average positions 4 and 5: 21.5.
Variant 2 — Café customers (n=7, odd): Sort to get 28, 30, 34, 37, 41, 44, 55. The median is at position 4. The 4th value is 37.
Variant 3 — Reaction times (n=6, even): Sort to get 0.38, 0.43, 0.45, 0.50, 0.51, 0.67. Average positions 3 and 4: 0.475.
Variant 4 — Repair times (n=5, odd): Sort to get 1.0, 2.0, 3.0, 3.5, 4.5. The median is at position 3. The 3rd value is 3.0.
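The sort-first rule used in every variant can be sketched as a small Python function. The name `median` here is our own helper, not a library call; it sorts, then either picks the middle value (odd n) or averages the two middle values (even n).

```python
def median(values):
    """Median by the sort-first rule: the middle value if n is odd,
    the average of the two middle values if n is even."""
    data = sorted(values)                 # always sort first
    n = len(data)
    if n == 0:
        raise ValueError("median of empty data is undefined")
    mid = n // 2                          # 0-based index of the upper middle value
    if n % 2 == 1:
        return data[mid]                  # 1-based position (n + 1) / 2
    return (data[mid - 1] + data[mid]) / 2  # 1-based positions n/2 and n/2 + 1


# Variants 0 and 1 from Problem 2 (the function re-sorts, so order does not matter)
quiz = [8, 9, 11, 12, 13, 14, 15, 17, 18]
temps = [16, 18, 19, 21, 22, 23, 25, 27]
print(median(quiz), median(temps))
```

This prints `13 21.5`, matching Variants 0 and 1.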

Problem 3 — Matching Central Tendency to Distribution Shapes

Apply the decision rule based on shape and variable type:

Variant 0 — Right-skewed household incomes: Median. The mean is inflated by the high earners in the right tail.
Variant 1 — Symmetric petal lengths: Mean. For symmetric quantitative data with no outliers, the mean uses all the data and is preferred.
Variant 2 — Preferred study location: Mode. This is qualitative nominal data. You cannot compute a mean or median for categories like “Library” or “Café”.
Variant 3 — Left-skewed marathon times: Median. The outlier at 2.1 hours pulls the mean downward, so the mean understates the finishing time of a typical runner.
Variant 4 — Symmetric absences: Mean. Even though the data are integers, the distribution is symmetric with no outliers, so the mean is best.
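The decision rule above can be written as a tiny function. The name `choose_center` and its string labels are hypothetical, and the sketch simplifies the rule: it does not check symmetric data for outliers and does not handle ordinal categories.

```python
def choose_center(variable_type: str, shape: str) -> str:
    """Hypothetical helper encoding the Problem 3 decision rule.

    variable_type: "nominal" or "quantitative" (assumed labels)
    shape: "symmetric", "right-skewed", or "left-skewed" (assumed labels)
    """
    if variable_type == "nominal":
        return "mode"      # categories have no order, so no mean or median
    if shape == "symmetric":
        return "mean"      # uses every value; no long tail to distort it
    if shape in ("right-skewed", "left-skewed"):
        return "median"    # resistant to the values in the long tail
    raise ValueError(f"unknown shape: {shape!r}")


print(choose_center("quantitative", "right-skewed"))  # household incomes
print(choose_center("nominal", "n/a"))                # preferred study location
```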

Section 6: Independent Practice Solutions


Problem 1 — Diagnosing Skewness via Central-Measure Discrepancies

Mean = $485,000; Median = $342,000.

When the mean is well above the median, the distribution is usually right-skewed. A small number of very expensive homes in the right tail pull the arithmetic average up, but their size has no effect on the middle value, which depends only on how the sales rank. The median is therefore a much better representation of a typical home sale in this area.
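The mean-versus-median comparison can be sketched as a rough rule of thumb in Python. The helper `skew_hint` and its 5% tolerance are assumptions for illustration, not a formal test; the comparison can mislead for discrete data with only a few distinct values.

```python
def skew_hint(mean: float, median: float, rel_tol: float = 0.05) -> str:
    """Rough skew direction from the mean-median gap (rule of thumb only).

    rel_tol is an arbitrary threshold (assumption): gaps smaller than
    5% of the median are treated as roughly symmetric.
    """
    gap = mean - median
    if abs(gap) <= rel_tol * abs(median):
        return "roughly symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"


# Home sale figures from Problem 1
print(skew_hint(485_000, 342_000))
```

Here the gap of $143,000 is far larger than 5% of the median, so the sketch reports `right-skewed`.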

Problem 2 — Practicing Median Computations Across Sample Sizes

Variant 0 — Commute times (n=7, odd): Sort to get 19, 22, 23, 27, 31, 35, 45. The 4th value is 27.
Variant 1 — App counts (n=10, even): Sort to get 35, 39, 44, 48, 50, 55, 58, 62, 67, 71. Average positions 5 and 6: 52.5.
Variant 2 — Egg mass (n=6, even): Sort to get 54.8, 57.2, 59.1, 60.5, 61.0, 63.4. Average positions 3 and 4: 59.8.
Variant 3 — Pulse rates (n=5, odd): Sort to get 67, 72, 80, 88, 95. The 3rd value is 80.
Variant 4 — Delivery times (n=8, even): Sort to get 2, 3, 3, 4, 5, 6, 7, 8. Average positions 4 and 5: 4.5.
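Python's standard-library `statistics.median` applies the same rule (it sorts internally and averages the two middle values when n is even), so it offers a quick check of all five variants:

```python
from statistics import median

# Section 6, Problem 2 datasets (already sorted above; median() sorts anyway)
variants = {
    "commute": [19, 22, 23, 27, 31, 35, 45],
    "apps": [35, 39, 44, 48, 50, 55, 58, 62, 67, 71],
    "egg_mass": [54.8, 57.2, 59.1, 60.5, 61.0, 63.4],
    "pulse": [67, 72, 80, 88, 95],
    "delivery": [2, 3, 3, 4, 5, 6, 7, 8],
}

for name, data in variants.items():
    parity = "odd" if len(data) % 2 else "even"
    print(name, len(data), parity, median(data))
```

Averaging decimals such as 59.1 and 60.5 may show a tiny floating-point tail when printed, so compare those results with a tolerance rather than exact equality.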

Problem 3 — Defending Measure Choices for Skewed Environments

Variant 0 — Knee surgery stays (right-skewed): Median. Extended stays from complications inflate the mean.
Variant 1 — Light bulb lifetimes (symmetric): Mean. Symmetric distribution without outliers.
Variant 2 — Streaming genres (nominal): Mode. You cannot average names of genres.
Variant 3 — Mercury levels (right-skewed): Median. High readings near industrial sites are extreme outliers.
Variant 4 — Gym workout frequency (symmetric): Mean. Symmetric integer counts are best summarized by the mean.

Problem 4 — Algorithmic Problem Generators (Mean, Median, Mode)

Mean: The method is always the same. Add up all the generated numbers to get the sum (\(\sum x\)), then divide by the total number of values (\(n\)).

Median: You must sort the values first. If the generator gives you an odd number of values, take the exact middle number. If it gives you an even number of values, take the two middle numbers, add them, and divide by 2.

Mode: Look through the values and tally how often each appears. The value (or values) with the highest frequency is the mode.
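A generator of this kind, together with its answer key, might look like the sketch below. The function `make_problem` and its value ranges are hypothetical; the answer key uses the standard-library `statistics.median` and `statistics.multimode` (Python 3.8+), which returns every value tied for the highest frequency.

```python
import random
from statistics import median, multimode


def make_problem(seed: int, n_min: int = 5, n_max: int = 10):
    """Hypothetical generator: draws n random integers and returns them
    with an answer key for the mean, median, and mode(s)."""
    rng = random.Random(seed)                    # seeded so a problem can be reproduced
    n = rng.randint(n_min, n_max)                # count may be odd or even
    values = [rng.randint(1, 20) for _ in range(n)]
    key = {
        "mean": sum(values) / n,                 # sum divided by count
        "median": median(values),                # sort, then middle value or middle pair
        "modes": multimode(values),              # all values tied for the top frequency
    }
    return values, key


values, key = make_problem(seed=3)
print(values)
print(key)
```

One caveat: if no value repeats, `multimode` returns every value. Whether that should be reported as "no mode" is a convention to decide before grading.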

Section 7: Mastery Check Solutions


Problem 1 — Feynman Test

A strong explanation must cover:

The mean is the “balance point” where the total is shared equally.
The median is the middle value in a sorted lineup.
The mode is simply the most frequent value.
When to use which: The mean is pulled toward extreme values in skewed data, making it misleading for things like incomes or house prices; in those cases the median is better. The mean is ideal for symmetric data, and the mode is the only option for nominal categorical data.

Problem 2 — Evaluating Mild Right Skew in Household Vehicle Data

Frequency table: 0 cars (18), 1 car (74), 2 cars (82), 3 cars (21), 4 cars (5).

Step 1 (Shape): The distribution peaks at 1–2 cars and tails off towards 3–4 cars. It is approximately symmetric or slightly right-skewed.

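Step 2 (Compute mean): Use the weighted mean formula for a frequency distribution: \(\bar{x} = \frac{\sum f x}{\sum f} = \frac{0(18) + 1(74) + 2(82) + 3(21) + 4(5)}{200} = \frac{321}{200} = 1.605\) cars per household.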
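The weighted mean can be sketched directly from the frequency table: each car count is multiplied by how many households have it, and the total is divided by the number of households.

```python
# Frequency table from Problem 2: cars per household -> number of households
freq = {0: 18, 1: 74, 2: 82, 3: 21, 4: 5}

n = sum(freq.values())                              # total households (sum of f)
weighted_sum = sum(x * f for x, f in freq.items())  # sum of f * x
mean_cars = weighted_sum / n                        # weighted mean

print(n, weighted_sum, mean_cars)
```

This prints `200 321 1.605`.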

Section 8: Boss Fight Solutions


Path A — The Analyst: Tech Startup Salary Analysis

Task A1 (Mean): Sum = 52 + 54 + 55 + 72 + 75 + 98 + 102 + 130 + 195 + 450 = 1,283 (thousands), so the mean is \(1283 / 10 = 128.3\) thousand, or $128,300.

Task A2 (Median): The data are already sorted. For \(n = 10\) (even), the middle positions are 5 and 6 (75k and 98k), so the median is \((75 + 98)/2 = 86.5\) thousand, or $86,500.

Task A3 (Interpretation): The median better represents the typical employee. The CEO’s $450,000 salary is an extreme outlier that inflates the mean, pulling it $41,800 above the median.

Task A4 (Reflection): Advertising the mean ($128,300) would be misleading: seven out of ten employees earn less than that figure. A prospective employee should look at the median ($86,500) for a realistic expectation.
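Tasks A1 to A4 can be reproduced in a few lines with the standard-library `statistics` module. Salaries are kept in thousands of dollars, as in the task.

```python
from statistics import mean, median

# Salaries in thousands of dollars (Path A), already sorted
salaries = [52, 54, 55, 72, 75, 98, 102, 130, 195, 450]

avg = mean(salaries)                          # A1: 1283 / 10
mid = median(salaries)                        # A2: (75 + 98) / 2
gap = avg - mid                               # A3: how far the outlier drags the mean
below = sum(1 for s in salaries if s < avg)   # A4: employees earning less than the mean

print(avg, mid, round(gap, 1), below)
```

Counting `below` is what settles the A4 claim: only 130, 195, and 450 exceed the mean, so seven of the ten salaries fall below it.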

Path B — The Architect: Commute Time Study Design

Task B1 (Variable type): Commute time is a quantitative continuous variable: a numerical measurement that can take any value within a range (e.g., 14.5 minutes).

Task B2 (Measure): You should report the median. Since commute times are right-skewed (a few very long commutes pull the right tail out), the median is the robust choice.

Task B3 (Sample size): Commute times vary widely, so the median of a small sample is an unreliable estimate. With commutes ranging from 10 minutes to 2 hours, a sample of 50 might by chance include too many long commuters and distort the estimate. A sample of 500 residents provides a much more stable estimate of the true population median.
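The sample-size argument can be illustrated with a small simulation. This is a sketch under stated assumptions: commute times are drawn from a log-normal distribution (chosen only because it is right-skewed), and the parameters `mu=3.2`, `sigma=0.5` are illustrative, not taken from the study. The helper `median_spread` is hypothetical; it repeats the survey many times and measures how much the sample median varies.

```python
import random
from statistics import median, stdev


def median_spread(sample_size: int, reps: int = 500, seed: int = 0,
                  mu: float = 3.2, sigma: float = 0.5) -> float:
    """Standard deviation of the sample median across repeated surveys.

    Commute times (minutes) are simulated as log-normal, an assumed
    right-skewed model; mu and sigma are illustrative values.
    """
    rng = random.Random(seed)
    medians = []
    for _ in range(reps):
        sample = [rng.lognormvariate(mu, sigma) for _ in range(sample_size)]
        medians.append(median(sample))
    return stdev(medians)


small = median_spread(50)
large = median_spread(500)
print(f"spread of median, n=50: {small:.2f} min; n=500: {large:.2f} min")
```

Under this model the spread for n = 500 comes out roughly a third of the spread for n = 50, in line with the usual \(1/\sqrt{n}\) shrinkage of sampling error.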

Task B4 (Reflection): The mean is not robust when dealing with right-skewed distributions. A handful of 90+ minute commutes will artificially inflate the mean, suggesting the typical resident travels much longer than they actually do. The median gives the most accurate picture of the typical commuter’s journey, which is exactly what transit planners need.

Section 9: Challenge Problem Solutions


Challenge 1 — Balancing Robustness and Data Retention with a Trimmed Mean

Dataset: 42, 55, 61, 68, 72, 74, 78, 81, 95.

Question 1: A 10% trim removes the lowest 10% and highest 10% of the values. \(0.10 \times 9 = 0.9\), which rounds to 1 value on each side, so we drop 42 and 95. The remaining 7 values sum to 489, giving a trimmed mean of \(489 / 7 \approx 69.86\).
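A trimmed mean can be sketched as follows. The helper `trimmed_mean` is hypothetical, and it rounds the number of trimmed values half-up to match the convention in Question 1. Some libraries round down instead, which would trim nothing from a 9-value dataset at 10%.

```python
import math


def trimmed_mean(values, proportion):
    """Mean after dropping k values from each end of the sorted data,
    where k = proportion * n rounded half-up (convention assumed here)."""
    data = sorted(values)
    n = len(data)
    k = math.floor(proportion * n + 0.5)   # 0.10 * 9 = 0.9 -> 1
    if 2 * k >= n:
        raise ValueError("trimming would remove every value")
    kept = data[k:n - k]                   # drop k lowest and k highest
    return sum(kept) / len(kept)


scores = [42, 55, 61, 68, 72, 74, 78, 81, 95]
print(trimmed_mean(scores, 0.10))   # 489 / 7
print(trimmed_mean(scores, 0.40))   # only the middle value survives
```

The second call previews Variant 0 below: trimming 4 values from each end of 9 leaves only 72, which is exactly the median.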

Question 2 (Variants):

Variant 0: As the trim percentage approaches 50%, you strip away more and more of the outer values until only the middle remains. The trimmed mean approaches the median.
Variant 1: For a single massive outlier, a 25% trimmed mean (also called the interquartile mean) is robust.
Variant 2: Dropping the highest and lowest judge scores is an example of a trimmed mean.

Challenge 2 — Proving the Least-Squares Minimizer Property

Verify with numbers: For the dataset , the mean is 4.

Using the mean (\(\bar{x} = 4\)):

Using an arbitrary other number ():

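Since the sum of squared errors is smaller at the mean than at the other number, the comparison is consistent with the mean minimizing the sum of squared errors. Calculus confirms that \(c = \bar{x}\) is the global minimum of the quadratic \(f(c) = \sum_i (x_i - c)^2\).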
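One standard way to write out the calculus step, using \(c\) for any candidate center of the data \(x_1, \dots, x_n\):

```latex
\begin{aligned}
f(c)   &= \sum_{i=1}^{n} (x_i - c)^2 \\
f'(c)  &= -2\sum_{i=1}^{n} (x_i - c) = -2\left(\sum_{i=1}^{n} x_i - nc\right) = 0
          \quad\Longrightarrow\quad c = \frac{1}{n}\sum_{i=1}^{n} x_i = \bar{x} \\
f''(c) &= 2n > 0
\end{aligned}
```

Because the second derivative is positive for every \(c\), \(f\) is an upward-opening parabola, so its single critical point \(\bar{x}\) is the global minimum.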

Challenge 3 — Anticipating the Need for Measures of Spread

For any dataset, the deviations from the mean, \(x_i - \bar{x}\), always sum to zero: \(\sum_i (x_i - \bar{x}) = \sum_i x_i - n\bar{x} = 0\), because \(n\bar{x} = \sum_i x_i\). This shows the mean is the exact balance point (center of mass) of the data. In DS-5, you will learn to square these deviations so they no longer cancel each other out, which gives the measures of spread called variance and standard deviation.
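The cancellation can be checked exactly with Python's `fractions` module, which avoids floating-point rounding. The Challenge 1 dataset is reused here purely as an example, and `deviations` is a hypothetical helper.

```python
from fractions import Fraction


def deviations(values):
    """Deviations x_i - mean, computed exactly with Fractions so the
    sum is not disturbed by floating-point rounding."""
    xs = [Fraction(v) for v in values]
    xbar = sum(xs) / len(xs)
    return [x - xbar for x in xs]


data = [42, 55, 61, 68, 72, 74, 78, 81, 95]   # Challenge 1 dataset, reused
devs = deviations(data)
print(sum(devs))                   # positive and negative deviations cancel
print(sum(d * d for d in devs))    # squared deviations cannot cancel
```

The first line prints `0`. The second is strictly positive for any dataset whose values are not all equal, which is why squared deviations are the starting point for measuring spread.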
