DS-3 Solutions: Central Tendency Measures

Section 5 — Guided Practice Solutions

Guided Practice 1: Step-by-Step Calculation of a Sample Mean

Ball bearing diameters: 12.1, 11.8, 12.3, 12.0, 11.9, 12.5.

Question 1a — Sum:
\[ \sum x_i = 12.1 + 11.8 + 12.3 + 12.0 + 11.9 + 12.5 = 72.6 \]

Question 1b — Mean:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{72.6}{6} = 12.10 \]
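
The two steps above can be checked with a few lines of Python. This is only a minimal sketch; the variable names are illustrative and not part of the lesson.

```python
# Ball bearing diameters from Guided Practice 1.
diameters = [12.1, 11.8, 12.3, 12.0, 11.9, 12.5]

total = sum(diameters)   # Question 1a: sum of the observations
n = len(diameters)       # number of observations
mean = total / n         # Question 1b: sample mean

# Format the results to hide tiny floating-point rounding noise.
print(f"sum = {total:.1f}, n = {n}, mean = {mean:.2f}")
```

Formatting the output matters here because decimal values such as 12.1 are not stored exactly in binary floating point.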


Guided Practice 2: Determining the Sorted Middle Value

The key for all median problems is to sort the data first.


Guided Practice 3: Matching Central Tendency to Distribution Shapes

Apply the decision rule based on shape and variable type:

Section 6 — Independent Practice Solutions

Independent Practice 1: Diagnosing Skewness via Central Measure Discrepancies

Mean = $485,000; Median = $342,000.

When the mean is significantly greater than the median, the distribution is right-skewed. A small number of very expensive homes in the right tail pull the arithmetic average up, but they do not affect the position of the middle value. The median is a much better representation of a typical home sale in this area.
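
To put a number on the discrepancy, you can express the gap relative to the median. The sketch below uses only the two figures given in the problem; the variable names are illustrative, and the lesson does not define a specific cutoff for what counts as a large gap.

```python
# Figures given in Independent Practice 1.
mean_price = 485_000
median_price = 342_000

gap = mean_price - median_price        # how far the mean sits above the median
relative_gap = gap / median_price      # gap as a fraction of the median

print(f"gap = ${gap:,}; the mean exceeds the median by {relative_gap:.1%}")
```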


Independent Practice 2: Practicing Median Computations Across Sample Sizes


Independent Practice 3: Defending Measure Choices for Skewed Environments


Independent Practice 4, 5 & 6: Algorithmic Problem Generators

Generator: Mean: The method is always the same. Add up all the generated numbers to get the sum (\( \sum x_i \)), then divide by the total number of values (\(n\)).

Generator: Median: You must sort the values first. If the generator gives you an odd number of values, take the exact middle number. If it gives you an even number of values, take the two middle numbers, add them, and divide by 2.
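
The odd/even rule can be sketched as a small Python function. The function name and the sample values below are made up for illustration; the generator will give you different numbers.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)   # sorting always comes first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: take the single middle value.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9]))      # odd count: sorted is [3, 7, 9]
print(median([7, 3, 9, 1]))   # even count: sorted is [1, 3, 7, 9]
```

The first call returns 7 and the second returns 5.0, the average of 3 and 7.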

Generator: Mode: Look through the values and tally how often each appears. The value (or values) with the highest frequency is the mode.
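
The tallying step maps naturally onto `collections.Counter` from the Python standard library. This is a minimal sketch with made-up values; it returns a list because a dataset can have more than one mode.

```python
from collections import Counter


def modes(values):
    """Return every value tied for the highest frequency, in first-seen order."""
    counts = Counter(values)     # tally how often each value appears
    top = max(counts.values())   # the highest frequency
    return [value for value, count in counts.items() if count == top]


print(modes([2, 3, 3, 5, 5, 8]))   # 3 and 5 each appear twice
```

Note that this sketch assumes a non-empty list; an empty one would raise a `ValueError` from `max`.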

Section 7 — Mastery Check Solutions

Mastery Check 1: Feynman Test

A strong explanation must cover:


Mastery Check 2: Evaluating Mild Right Skew in Household Vehicle Data

Frequency table: 0 cars (18), 1 car (74), 2 cars (82), 3 cars (21), 4 cars (5).

Step 1 (Shape): The distribution peaks at 1–2 cars and tails off towards 3–4 cars. It is approximately symmetric or slightly right-skewed.

Step 2 (Compute mean): Use the weighted mean formula for a frequency distribution. The sample size is the total of the frequencies: \(n = 18 + 74 + 82 + 21 + 5 = 200\).
\[ \sum f_i x_i = (18\times0) + (74\times1) + (82\times2) + (21\times3) + (5\times4) \]
\[ \sum f_i x_i = 0 + 74 + 164 + 63 + 20 = 321 \]
\[ \bar{x} = \frac{\sum f_i x_i}{n} = \frac{321}{200} = 1.605 \text{ cars} \]
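
The weighted mean calculation can be reproduced from the frequency table directly. The sketch below stores the table as a dictionary, which is one convenient representation and not something the lesson prescribes.

```python
# Frequency table from Mastery Check 2: number of cars -> number of households.
freq = {0: 18, 1: 74, 2: 82, 3: 21, 4: 5}

n = sum(freq.values())                                           # total households
weighted_sum = sum(cars * count for cars, count in freq.items()) # sum of f_i * x_i
mean_cars = weighted_sum / n                                     # weighted mean

print(f"n = {n}, sum f*x = {weighted_sum}, mean = {mean_cars:.3f} cars")
```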

Section 8 — Boss Fight Solutions

Path A: The Analyst — Tech Startup Salary Analysis

Task A1 (Mean):
Sum = 52 + 54 + 55 + 72 + 75 + 98 + 102 + 130 + 195 + 450 = 1,283 (thousands).
\[ \bar{x} = \frac{1{,}283}{10} = 128.3 \text{ (thousands)} = \$128{,}300 \]

Task A2 (Median):
The data is sorted. For \(n=10\), the middle positions are 5 and 6 (75k and 98k).
\[ \text{Median} = \frac{75{,}000 + 98{,}000}{2} = \$86{,}500 \]

Task A3 (Interpretation): The median better represents the typical employee. The CEO's $450,000 salary is an extreme outlier that inflates the mean, pulling it $41,800 above the median.

Task A4 (Reflection): Advertising the mean ($128,300) would be misleading. Seven out of ten employees earn less than that figure. A prospective employee should look at the median ($86,500) for a realistic expectation.
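
The Path A figures can be checked with Python's standard `statistics` module. This is a minimal sketch using the ten salaries from Task A1 (in thousands of dollars); it also counts how many salaries fall strictly below the mean.

```python
from statistics import mean, median

# Salaries from Task A1, in thousands of dollars.
salaries_k = [52, 54, 55, 72, 75, 98, 102, 130, 195, 450]

avg = mean(salaries_k)      # arithmetic mean, in thousands
mid = median(salaries_k)    # average of the 5th and 6th sorted values
gap = avg - mid             # how far the outlier pulls the mean above the median

# Count the employees who earn strictly less than the mean.
below_mean = sum(1 for s in salaries_k if s < avg)

print(f"mean = {avg:.1f}k, median = {mid:.1f}k, gap = {gap:.1f}k")
print(f"{below_mean} of {len(salaries_k)} salaries are below the mean")
```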


Path B: The Architect — Commute Time Study Design

Task B1 (Variable type): Commute time is a quantitative continuous variable. It is a numerical measurement that can take any value (e.g., 14.5 minutes).

Task B2 (Measure): You should report the median. Since commute times are right-skewed (a few very long commutes pull the right tail out), the median is the robust choice.

Task B3 (Sample size): High variability means the median of a small sample will be an unreliable estimate. With commutes ranging from 10 minutes to 2 hours, a sample of 50 might happen to include too many long commuters, distorting the result. A sample of 500 residents provides a much more stable estimate of the true population median.
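
A small simulation can illustrate the sample-size argument. Everything about the population below is an assumption made for illustration: commute times are drawn from a log-normal distribution with hypothetical parameters (3.3 and 0.6), chosen only because they give a right-skewed shape with a median near 27 minutes. The sketch repeats the survey many times and measures how much the sample median varies from one survey to the next.

```python
import random
from statistics import median, pstdev


def median_spread(sample_size, replications=300, seed=0):
    """Standard deviation of the sample median across repeated simulated surveys."""
    rng = random.Random(seed)   # fixed seed so the run is reproducible
    medians = []
    for _ in range(replications):
        # ASSUMPTION: synthetic right-skewed commute times in minutes,
        # not real survey data.
        sample = [rng.lognormvariate(3.3, 0.6) for _ in range(sample_size)]
        medians.append(median(sample))
    return pstdev(medians)


spread_50 = median_spread(50)
spread_500 = median_spread(500)
print(f"spread of the median, n=50:  {spread_50:.2f} minutes")
print(f"spread of the median, n=500: {spread_500:.2f} minutes")
```

Under these assumptions, the medians from samples of 500 cluster much more tightly than those from samples of 50, which is the stability the solution describes.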

Task B4 (Reflection): The mean is not robust to the extreme values found in right-skewed distributions. A handful of 90+ minute commutes will inflate the mean, suggesting the typical resident travels much longer than they actually do. The median gives a more accurate picture of the typical commuter's journey, which is exactly what transit planners need.

Section 9 — Challenge Problem Solutions

Challenge Problem 1: Balancing Robustness and Data Retention with a Trimmed Mean

Dataset: 42, 55, 61, 68, 72, 74, 78, 81, 95

Question 1: A 10% trim removes the lowest 10% and highest 10%. \(10\% \text{ of } 9 = 0.9\), which rounds to 1 value on each side. We drop 42 and 95. The remaining 7 values sum to 489.
\[ \bar{x}_{10\%} = \frac{489}{7} \approx 69.9 \]
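
The trimming procedure can be sketched as a short function. It follows the rounding convention used in this solution (round the count to the nearest whole value). Other conventions truncate instead, which would remove nothing from a dataset of 9 values, so treat the rounding rule as an assumption of this sketch. The function name is illustrative.

```python
def trimmed_mean(values, proportion):
    """Mean after dropping round(n * proportion) values from each end."""
    ordered = sorted(values)
    k = round(len(ordered) * proportion)   # values to drop from EACH side
    if 2 * k >= len(ordered):
        raise ValueError("trim proportion removes every value")
    kept = ordered[k:len(ordered) - k]     # discard the k lowest and k highest
    return sum(kept) / len(kept)


scores = [42, 55, 61, 68, 72, 74, 78, 81, 95]
print(round(trimmed_mean(scores, 0.10), 1))   # drops 42 and 95, then averages 7 values
```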

Question 2 (Variants):


Challenge Problem 2: Proving the Least-Squares Minimizer Property

Verify with numbers: For data {2, 4, 6}, the mean is 4.

Using the mean (\(c=4\)):
\[ \text{SSE}(4) = (2-4)^2 + (4-4)^2 + (6-4)^2 = 4 + 0 + 4 = 8 \]

Using an arbitrary other number (\(c=5\)):
\[ \text{SSE}(5) = (2-5)^2 + (4-5)^2 + (6-5)^2 = 9 + 1 + 1 = 11 \]

Since \(8 < 11\), \(c = 4\) gives a smaller sum of squared errors than \(c = 5\). One comparison is not a proof, but calculus confirms the general result: \(\text{SSE}(c)\) is a quadratic in \(c\) whose global minimum is at \(c = \bar{x}\).
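
You can extend the numeric check beyond a single alternative by scanning a grid of candidate values for \(c\). This is a sketch, not a proof: the grid range and step size are arbitrary choices made for illustration.

```python
def sse(data, c):
    """Sum of squared errors of the data around a candidate centre c."""
    return sum((x - c) ** 2 for x in data)


data = [2, 4, 6]
mean = sum(data) / len(data)

# Candidate centres 0.0, 0.1, ..., 8.0 (an arbitrary illustrative grid).
candidates = [c / 10 for c in range(0, 81)]
best = min(candidates, key=lambda c: sse(data, c))

print(f"SSE(4) = {sse(data, 4)}, SSE(5) = {sse(data, 5)}")
print(f"grid minimiser = {best}, mean = {mean}")
```

The grid result agrees with the identity \(\text{SSE}(c) = \text{SSE}(\bar{x}) + n(c - \bar{x})^2\): for \(c = 5\) this gives \(8 + 3(1)^2 = 11\), and the extra term is zero only when \(c = \bar{x}\).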


Challenge Problem 3: Anticipating the Need for Measures of Spread

For any dataset, the deviations from the mean (\(x_i - \bar{x}\)) always sum to zero. This is why the mean acts as the balance point (center of mass) of the data: the positive and negative deviations cancel exactly. In DS-5, you will learn to square these deviations to measure the total spread (variance and standard deviation) without them cancelling each other out.
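
The zero-sum property follows directly from the definition \(\bar{x} = \frac{1}{n}\sum x_i\), as this short derivation sketches:

\[ \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0 \]

For the data {2, 4, 6} from Challenge Problem 2 (mean 4), the deviations are \(-2\), \(0\), and \(2\), which sum to zero.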