Solutions — Position and Distribution Shape

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.


Section 5: Guided Practice Solutions


Problem 1 — Computing Z-Scores (Variant Bank)

Context for all variants: \(\mu = 120\) calls, \(\sigma = 15\) calls. Formula: \(z = \dfrac{x - \mu}{\sigma}\).

Variant A (147 calls): \(z = \dfrac{147 - 120}{15} = 1.80\). This agent handled 1.80 SD more calls than the mean, a notably high-volume day.

Variant B (105 calls): \(z = \dfrac{105 - 120}{15} = -1.00\). Exactly 1 SD fewer than the mean: below average but not unusual.

Variant C (120 calls): \(z = \dfrac{120 - 120}{15} = 0.00\). Exactly at the mean.

Variant D (93 calls): \(z = \dfrac{93 - 120}{15} = -1.80\). 1.80 SD fewer than the mean, symmetrically opposite to Variant A.

Variant E (158 calls): \(z = \dfrac{158 - 120}{15} \approx 2.53\). 2.53 SD above the mean, an unusually high-volume day worth flagging.

Common mistakes: (1) forgetting to subtract the mean before dividing (dividing the raw score by \(\sigma\) gives a meaningless ratio); (2) dropping the sign (a negative z-score means below the mean, not an error); (3) using \(s\) for a population problem or \(\sigma\) for a sample problem.
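The variant answers can be checked with a few lines of Python. This is only a sketch: the function name `z_score` is illustrative, and the mean of 120 calls and SD of 15 calls are the values implied by the variants themselves (120 calls is exactly at the mean, 105 calls is exactly 1 SD below it).

```python
def z_score(x: float, mean: float, sd: float) -> float:
    """Return how many standard deviations x lies from the mean."""
    if sd <= 0:
        raise ValueError("standard deviation must be positive")
    return (x - mean) / sd


MEAN_CALLS = 120  # assumed: Variant C (120 calls) is exactly at the mean
SD_CALLS = 15     # assumed: Variant B (105 calls) is exactly 1 SD below

variants = {"A": 147, "B": 105, "C": 120, "D": 93, "E": 158}
z_by_variant = {
    name: round(z_score(calls, MEAN_CALLS, SD_CALLS), 2)
    for name, calls in variants.items()
}

for name, z in z_by_variant.items():
    # The explicit sign makes "above" versus "below" the mean hard to miss.
    print(f"Variant {name}: z = {z:+.2f}")
```

Keeping the subtraction inside one function guards against the first common mistake, since the raw score can never be divided by the SD directly.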

Problem 2 — Interpreting Percentiles (MCQ)

Correct answer: “A student who scored 68 performed better than approximately 50% of all test-takers.”

Why the other options are wrong:

Option B (“in the top 68%”): confuses the raw score (68 points) with the percentile rank. This student is at \(P_{50}\), so 50% scored below, not 68%.
Option C (“a student at P75 scored 75 out of 100”): \(P_{75} = 79\) means the value at the 75th percentile is 79 points. The subscript in \(P_{75}\) is the percentage of data below, never the value itself.
Option D (“the median is 79”): \(P_{50} = 68\) is the median. The median is the 50th percentile, not the 75th.

Problem 3 — Empirical Rule Application (Variant Bank)

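Context: body temperatures ≈ normal with \(\mu = 37.0\) °C, \(\sigma = 0.4\) °C.

Variant A (between 36.2°C and 37.8°C): \(z = \dfrac{36.2 - 37.0}{0.4} = -2\) and \(z = \dfrac{37.8 - 37.0}{0.4} = +2\). By the Empirical Rule, ≈ 95% of healthy adults fall in this range.

Variant B (between 36.6°C and 37.4°C): \(z = \dfrac{36.6 - 37.0}{0.4} = -1\) and \(z = \dfrac{37.4 - 37.0}{0.4} = +1\). By the Empirical Rule, ≈ 68%.

Variant C (38.2°C, how many SDs above the mean?): \(z = \dfrac{38.2 - 37.0}{0.4} = 3\). Exactly 3 SD above; only ≈ 0.15% of adults are this high, consistent with a fever.

Variant D (percentage above 37.4°C): \(z = \dfrac{37.4 - 37.0}{0.4} = 1\). ≈ 68% within 1 SD, so ≈ 32% outside; by symmetry, ≈ 16% are above \(\mu + 1\sigma\).

Variant E (interval containing ≈ 99.7%): \(\mu - 3\sigma = 37.0 - 1.2 = 35.8\) °C and \(\mu + 3\sigma = 37.0 + 1.2 = 38.2\) °C. So 35.8°C to 38.2°C.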

Common mistakes: (1) applying the Empirical Rule to a non-normal distribution; (2) taking 68% as a one-sided area, when “within 1 SD” is 68% in total; (3) forgetting to halve the outside percentage for a one-sided tail (by symmetry, 32%/2 = 16%, not 32%).
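To see how close the Empirical Rule's round figures are to exact normal areas, the sketch below uses `statistics.NormalDist` from the Python standard library. The mean of 37.0 °C and SD of 0.4 °C are assumptions taken from the 99.7% interval in Variant E (35.8 °C to 38.2 °C is the mean plus or minus 3 SD); the helper name `share_between` is illustrative.

```python
from statistics import NormalDist

# Assumed model: mean 37.0 °C, SD 0.4 °C (implied by the 35.8-38.2 °C interval).
temps = NormalDist(mu=37.0, sigma=0.4)


def share_between(dist: NormalDist, low: float, high: float) -> float:
    """Exact proportion of a normal distribution lying between low and high."""
    return dist.cdf(high) - dist.cdf(low)


within_1sd = share_between(temps, 36.6, 37.4)   # rule of thumb: about 68%
within_2sd = share_between(temps, 36.2, 37.8)   # rule of thumb: about 95%
within_3sd = share_between(temps, 35.8, 38.2)   # rule of thumb: about 99.7%
above_1sd = 1 - temps.cdf(37.4)                 # one-sided tail: about 16%

print(f"within 1 SD: {within_1sd:.4f}")
print(f"within 2 SD: {within_2sd:.4f}")
print(f"within 3 SD: {within_3sd:.4f}")
print(f"above +1 SD: {above_1sd:.4f}")
```

The one-sided tail comes out near 16%, half of the roughly 32% that lies outside 1 SD, which is the halving step described in the common mistakes.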

Problem 4 — Classifying Distribution Shape (MCQ)

Correct answer: “Right-skewed; the median is a more appropriate measure of centre.”

Reasoning: Mean ($112,000) > Median ($84,000). When the mean is pulled above the median, high-value outliers drag it to the right — the distribution is right-skewed. For right-skewed data, the median is resistant to extreme values and more representative of the typical salary.

Why the other options are wrong:

Left-skewed / mean appropriate: in left-skewed data, mean < median. Here mean > median, so both the direction and the recommendation are wrong.
Right-skewed / mean appropriate: direction is right, but for right-skewed data with extreme highs the mean overstates the typical salary — the median is better.
Symmetric: if symmetric, mean ≈ median. A $28,000 gap is too large to be rounding — this is clear right skew from very high executive salaries.

Section 6: Independent Practice Solutions


Problem 1 — Z-Score Generator

The numbers vary each time, but the method is always the same:

Identify whether the context is a population (use \(\mu\), \(\sigma\)) or a sample (use \(\bar{x}\), \(s\)).
Apply \(z = \dfrac{x - \mu}{\sigma}\) or \(z = \dfrac{x - \bar{x}}{s}\).
Round to 2 decimal places.
Interpret: direction (positive = above mean, negative = below) and magnitude (how many SD away).

Example (representative values): for \(x = 98\), the computation gives \(z = 1.50\). The value 98 is 1.50 SD above the mean: above average but within 2 SD, so not unusual.

Common mistakes: (1) subtracting in the wrong order (always \(x - \mu\)); (2) dividing by \(\sigma^2\) (the variance) instead of \(\sigma\); (3) omitting the interpretation of sign.
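The population-versus-sample choice in the first step changes the denominator, and therefore the z-score. The sketch below illustrates this with the standard library's `pstdev` (population SD) and `stdev` (sample SD). The dataset is hypothetical, chosen so the population SD is a round number; the function names are illustrative.

```python
from statistics import fmean, pstdev, stdev


def z_population(x: float, data: list[float]) -> float:
    """z-score treating data as the whole population (SD divides by N)."""
    return (x - fmean(data)) / pstdev(data)


def z_sample(x: float, data: list[float]) -> float:
    """z-score treating data as a sample (SD divides by n - 1)."""
    return (x - fmean(data)) / stdev(data)


data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values: mean 5, population SD 2

z_pop = round(z_population(9, data), 2)
z_smp = round(z_sample(9, data), 2)
print(f"population z = {z_pop:+.2f}, sample z = {z_smp:+.2f}")
```

The sample SD is slightly larger than the population SD for the same numbers, so the sample z-score is slightly smaller in magnitude.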

Problem 2 — Percentile Generator

The dataset and target percentile change each time, but the nearest-rank method is constant:

Confirm the data is sorted ascending.
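Compute the rank: \(r = \left\lceil \dfrac{p}{100} \cdot n \right\rceil\), where \(p\) is the percentile and \(n\) the count. \(\lceil\,\cdot\,\rceil\) means “round up.”
The \(r\)-th value in the sorted list is \(P_p\).
Interpret: “approximately \(p\)% of the data falls below [value].”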

Example (n = 10): Sorted 14, 19, 23, 27, 31, 35, 40, 46, 52, 60. For the target percentile in this example, the rank works out to 8. The 8th value is 46, so that percentile equals 46.

Common mistakes: (1) forgetting to sort first; (2) using \(\lfloor\,\cdot\,\rfloor\) (floor) instead of \(\lceil\,\cdot\,\rceil\) (ceiling); (3) confusing the rank (position 8) with the value (46).
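The nearest-rank steps above can be sketched as a short function. The name `nearest_rank_percentile` is illustrative, and the target percentile of 75 is an assumption chosen because it lands on rank 8 for the ten-value example; other targets (for instance 80) give the same rank.

```python
import math


def nearest_rank_percentile(values, p):
    """Return the p-th percentile (0 < p <= 100) by the nearest-rank method."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 < p <= 100:
        raise ValueError("p must be in the range (0, 100]")
    ordered = sorted(values)                    # step 1: sort ascending
    rank = math.ceil(p * len(ordered) / 100)    # step 2: round the rank UP
    return ordered[rank - 1]                    # step 3: ranks are 1-based


data = [14, 19, 23, 27, 31, 35, 40, 46, 52, 60]
p75 = nearest_rank_percentile(data, 75)  # assumed target: 75 * 10 / 100 = 7.5 -> rank 8
print(p75)
```

Sorting inside the function removes the first common mistake, and the `rank - 1` index keeps the rank (a position) separate from the returned value.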

Problem 3 — Empirical Rule (Variant Bank)

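Variant A (IQ scores, \(\mu = 100\), \(\sigma = 15\); between 70 and 130): \(z = \dfrac{70 - 100}{15} = -2\), \(z = \dfrac{130 - 100}{15} = +2\). ≈ 95% of the population.

Variant B (percentage below 85): \(z = \dfrac{85 - 100}{15} = -1\). ≈ 32% outside 1 SD; by symmetry ≈ 16% below 85.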

Variant C — House prices (mean $480,000, median $310,000, right tail): should the Empirical Rule be used? No. The large mean–median gap plus the right tail indicates heavy right skew; the rule requires approximate normality, so it would give badly wrong estimates.

Variant D (interval containing 99.7%): \(\mu \pm 3\sigma = 100 \pm 45\) → 55 to 145.

Variant E (reaction times, is 410 ms unusual?): the z-score for 410 ms is \(z = 4\). Four SD above the mean is extremely unusual (>99.7% of values fall within 3 SD); it is likely a recording error or a genuine outlier to investigate.

Problem 4 — Find the Error

The specific error: the student correctly stated that ≈ 68% of students scored between 62 and 82 (the \(\mu \pm 1\sigma\) interval), then used “68%” as if it were the z-score for a score of 80. That is a category confusion: a proportion and a z-score are completely different quantities.

The correct z-score for 80 when \(\mu = 72\), \(\sigma = 10\): \(z = \dfrac{80 - 72}{10} = 0.80\). The “68%” describes a band of observations; it has no connection to any individual value’s z-score.

General method for any generated variant:

Read the student’s classification (right / left / symmetric).
Check mean vs. median: mean > median → right-skewed; mean < median → left-skewed; mean ≈ median → symmetric.
Check the tail direction in the histogram — the tail names the skew.
If the answer contradicts either test, name the specific error (wrong direction, wrong tail, or inverted mean/median relationship).
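The mean-versus-median check in this method can be sketched in code. Everything specific here is an assumption for illustration: the function name `classify_skew`, the three small datasets, and especially the tolerance (a gap under 0.1 SD counts as "mean ≈ median"), which the lesson does not define. A histogram should still be used to confirm the tail direction.

```python
from statistics import fmean, median, pstdev


def classify_skew(values, tolerance=0.1):
    """Classify shape from the mean-median gap, measured in SD units.

    tolerance is a hypothetical cut-off for treating mean and median as equal.
    """
    spread = pstdev(values)
    if spread == 0:
        return "symmetric"  # no variation at all
    gap = (fmean(values) - median(values)) / spread
    if abs(gap) <= tolerance:
        return "symmetric"
    return "right-skewed" if gap > 0 else "left-skewed"


right = classify_skew([40, 45, 50, 55, 60, 300])    # one very high value
left = classify_skew([1, 90, 95, 100, 105, 110])    # one very low value
flat = classify_skew([1, 2, 3, 4, 5])               # mean equals median
print(right, left, flat)
```

A student's classification can then be compared with the function's output to name the specific error, for example an inverted mean/median relationship.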

Problem 5 — Multi-Step Synthesis

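Bolt diameters: ≈ normal, \(\mu = 10.00\) mm, \(\sigma = 0.05\) mm. Specification: 9.90 mm to 10.10 mm.

(a) Z-scores for the specification limits: \(z_{\text{lower}} = \dfrac{9.90 - 10.00}{0.05} = -2.00\) and \(z_{\text{upper}} = \dfrac{10.10 - 10.00}{0.05} = +2.00\).

(b) The limits are exactly \(\mu \pm 2\sigma\), so by the Empirical Rule ≈ 95% of bolts are accepted.

(c) A bolt measuring 10.12 mm: \(z = \dfrac{10.12 - 10.00}{0.05} = 2.40\). Since \(2.40 > 2\), it exceeds the upper limit and is rejected.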

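(d) With the process mean shifted upward (same \(\sigma\)), the specification limits no longer sit at \(z = \pm 2\): the lower limit falls more than 2 SD below the new mean and the upper limit less than 2 SD above it. The acceptance band is no longer symmetric about the mean, so more bolts now exceed the upper limit. The mean shift lowers the accepted proportion, so process adjustment is warranted.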

Key insight: the Empirical Rule gives clean answers only when the limits are exactly \(\mu \pm k\sigma\) for integer \(k\). After a mean shift, the limits are no longer symmetric z-scores and exact proportions need a standard normal table (later lessons).
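As a preview of what a standard normal table provides, the sketch below computes the accepted proportion directly with `statistics.NormalDist`. It assumes a centred mean of 10.00 mm and an SD of 0.05 mm, which is what limits at exactly 2 SD imply. The shifted mean of 10.02 mm is a hypothetical value chosen for illustration, not necessarily the shift given in the problem.

```python
from statistics import NormalDist

LOWER_MM, UPPER_MM = 9.90, 10.10  # specification limits
SIGMA_MM = 0.05                   # assumed SD, so the limits sit 2 SD from 10.00 mm


def accepted_share(process_mean: float) -> float:
    """Exact proportion of bolts inside the specification limits."""
    dist = NormalDist(mu=process_mean, sigma=SIGMA_MM)
    return dist.cdf(UPPER_MM) - dist.cdf(LOWER_MM)


centred = accepted_share(10.00)
shifted = accepted_share(10.02)  # hypothetical upward shift of 0.02 mm

print(f"centred process: {centred:.4f} accepted")
print(f"shifted process: {shifted:.4f} accepted")
```

Under these assumptions the shifted process accepts fewer bolts than the centred one, because the extra rejections above the upper limit outweigh the reduction below the lower limit.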

Section 7: Mastery Check Solutions


Problem 1 — Feynman Explanation

“Below average” tells you only the direction — the value is on the low side of the mean — with no information about how far below. A value 0.01 SD below the mean and one 3 SD below are both “below average” but have completely different implications.

A z-score of \(-1.5\) communicates both direction (negative → below the mean) and magnitude (1.5 SD away), placing the value in context relative to the full spread. In an approximately normal distribution, ≈ 87% of values fall within 1.5 SD of the mean and only ≈ 7% lie below \(z = -1.5\), so \(z = -1.5\) is below average but not extreme. The z-score is also unitless, enabling comparisons across datasets in different units, something “below average” cannot do.

Problem 2 — Apply Question

Correct answer: the analyst should not apply the Empirical Rule — the large mean–median gap signals a right-skewed distribution, violating the normality condition.

Reasoning: the mean ($94,000) is well above the median ($68,000) — a $26,000 gap — signaling right skew driven by a few highly-paid executives. The Empirical Rule requires an approximately normal shape; applying it here would mislead.

Instead: report the five-number summary and IQR, use the median as the centre, and avoid Empirical-Rule claims. A histogram or box plot would show the skewed shape.

Why the other options are wrong:

“Apply and conclude 68%”: the rule requires normality; right skew makes the 68% estimate unreliable.
“Use median as centre, apply Empirical Rule”: swapping the centre measure doesn’t fix it — the whole distribution must be ≈ normal.
“Applies if sample size is large enough”: size doesn’t fix skew. The CLT applies to sample means, not to individual salaries; a large right-skewed dataset is still right-skewed.
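The point that sample size does not fix skew can be checked with a small simulation. This is a sketch only: the lognormal model and its parameters are hypothetical stand-ins for salary-like data, and the seed is fixed just to make the run repeatable.

```python
import random
from statistics import fmean, median

rng = random.Random(42)  # fixed seed so repeated runs give the same numbers


def mean_median_ratio(n: int) -> float:
    """Draw n right-skewed values and return mean / median."""
    # Hypothetical salary-like data: lognormal values have a long right tail.
    sample = [rng.lognormvariate(10.0, 1.0) for _ in range(n)]
    return fmean(sample) / median(sample)


ratios = {n: mean_median_ratio(n) for n in (1_000, 100_000)}

for n, ratio in ratios.items():
    print(f"n = {n:>7}: mean / median = {ratio:.2f}")
```

The ratio stays well above 1 at both sizes: collecting more individual observations describes the skewed shape more precisely but does not make it normal.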

Problem 3 — Error Analysis

Correct answer: the student divided the raw score by the mean instead of subtracting the mean before dividing by the SD.

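Student’s computation: \(z = \dfrac{\text{score}}{\text{mean}} = 1.21\). This divides the score by the mean, which is not the z-score formula.

Correct computation: \(z = \dfrac{x - \mu}{\sigma}\), that is, subtract the mean from the score first, then divide by the SD.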

The student’s conclusion (“above average”) happens to be directionally right, but the value and the formula are wrong.

Why the other options are wrong:

“Requires population parameters”: both the \((\mu, \sigma)\) and \((\bar{x}, s)\) versions are valid; the error is in how the formula was applied.
“z = 1.21 means below average”: any positive z-score is above the mean, so the direction is fine; the error is computational.
“Should have squared the SD”: the formula uses \(\sigma\) (or \(s\)), not \(\sigma^2\); squaring gives the variance, which is not used here.

Section 8: Boss Fight Solutions


Path A — Exam Analyst

Given: the mean and SD stated in the problem. Students: 89, 45, 71.

Task 1 (z-scores): apply \(z = \dfrac{x - \mu}{\sigma}\) to each score. Sanity check: Student 3 (71) is just above the mean → small positive \(z\); Student 1 well above, Student 2 well below. Signs and magnitudes are consistent.

Task 2 (Empirical Rule and the 60 threshold): the z-score for 60 lies between \(-1\) and \(0\). This isn’t a whole-number SD boundary, so the Empirical Rule (clean only at ±1, ±2, ±3 SD) can only bound it: more than 16% but less than 50% of students scored below 60. A precise estimate needs a standard normal table.

Task 3 (ranking by unusualness, by \(|z|\)):

Student 2 (45): most unusual (>2 SD below the mean)
Student 1 (89): second (nearly 2 SD above)
Student 3 (71): least (near the mean)

Beyond \(|z| = 2\), only ≈ 5% of scores occur; Student 1 is approaching that tail and Student 2 is already in it.

Task 4 (memo, model content): report the three z-scores from Task 1. Student 3 (71) is essentially average and needs no special attention. Student 1 (89) is strong, in the top few percent. Student 2 (45) is in the bottom 2–3%, nearly 2.5 SD below the mean, and may warrant academic support. Raw scores alone would miss that Student 2 is as unusually low as Student 1 is high, in opposite directions.

Path B — Process Architect

Task 1 — Empirical Rule applicability:

Line A: described as “approximately bell-shaped and symmetric,” with mean and median implicitly close → the rule applies.
Line B: does not apply. The distribution has a sharp peak at 10 mm with a long right tail, and the mean is substantially above the median (10.5 mm), which indicates strong right skew. Applying the rule would underestimate thick panels in the right tail and overstate concentration near the mean.

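Task 2 (z-scores for Line A limits; \(\mu = 8.0\) mm, \(\sigma = 0.3\) mm; limits 7.1–8.9 mm): \(z_{\text{lower}} = \dfrac{7.1 - 8.0}{0.3} = -3\) and \(z_{\text{upper}} = \dfrac{8.9 - 8.0}{0.3} = +3\). The limits are exactly \(\mu \pm 3\sigma\), so ≈ 99.7% of Line A tiles pass.

Task 3 (flag outliers on Line A): a tile measuring 8.75 mm gives \(z = \dfrac{8.75 - 8.0}{0.3} = 2.50\). It is inside spec (8.75 mm < 8.9 mm) but statistically unusual (\(|z| > 2\), in the outer 5%). It passes QC but should be monitored.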
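The distinction between "in spec" and "statistically unusual" can be made explicit in code. In this sketch the Line A mean of 8.0 mm and SD of 0.3 mm are assumptions (the values for which limits of 7.1 mm and 8.9 mm sit exactly 3 SD from the mean), and `assess_tile` is a hypothetical helper name.

```python
MEAN_MM, SD_MM = 8.0, 0.3        # assumed Line A parameters
SPEC_LOW_MM, SPEC_HIGH_MM = 7.1, 8.9


def assess_tile(measurement_mm: float) -> tuple:
    """Return (in_spec, unusual, z) for one measurement.

    in_spec: inside the specification limits
    unusual: more than 2 SD from the mean (outer ~5% of a normal distribution)
    """
    z = (measurement_mm - MEAN_MM) / SD_MM
    in_spec = SPEC_LOW_MM <= measurement_mm <= SPEC_HIGH_MM
    unusual = abs(z) > 2
    return in_spec, unusual, round(z, 2)


in_spec, unusual, z = assess_tile(8.75)
print(f"in spec: {in_spec}, unusual: {unusual}, z = {z:+.2f}")
```

Returning the two flags separately lets a QC report pass a tile while still queuing it for monitoring.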

Task 4 — Recommendation (model content): Line A is ≈ normal and the Empirical Rule applies reliably (≈ 99.7% within spec). Line B is right-skewed (mean above median, long right tail), so the rule cannot be applied — normality-based reports would underestimate panels above the upper limit. For Line B: (1) collect a larger sample and plot a histogram; (2) compute the exact out-of-spec proportion from the sample directly; (3) investigate the right tail’s root cause (batch, shift, material lot). Until then, use the sample proportion within limits rather than a rule-based approximation.

Section 9: Challenge Problem Solutions


Challenge 1 — Why Does Right-Skew Imply Mean > Median? (Variant Bank)

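Variant A (concrete example): a small dataset whose largest value is 100. Mean = 22; median = 3 (the middle value). Mean (22) > Median (3), confirmed. The value 100 contributes its full size to the sum (inflating the mean) but only one rank to the median, so it can displace the median by at most one position.

Variant B (algebraic argument): For odd \(n\), with the values sorted as \(x_{(1)} \le \dots \le x_{(n)}\), the median is \(x_{((n+1)/2)}\) and the mean is \(\bar{x} = \dfrac{1}{n}\sum_{i=1}^{n} x_{(i)}\). For \(\bar{x} > \text{median}\) we need \(\dfrac{1}{n}\sum_{i=1}^{n} x_{(i)} > x_{((n+1)/2)}\). As \(x_{(n)} \to \infty\), the left side grows without bound while the right side (a fixed middle-rank value) stays finite. So for large enough \(x_{(n)}\), the mean exceeds the median.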

Variant C — Balance-point intuition: the mean is the balance point (place each value as a weight on the number line; the mean is where the see-saw balances), while the median splits the count into two equal halves. In a right-skewed distribution, a few large values far to the right create a large rightward torque, so the fulcrum (mean) shifts right past the median. The median doesn’t move, because it counts observations by rank, not magnitude. Result: mean > median.
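All three variants describe the same behaviour, which a few lines of Python can demonstrate. The four small values below are an assumption: they are one dataset that, together with 100, reproduces the mean of 22 and median of 3 from the concrete example.

```python
from statistics import mean, median

# Assumed small values; with 100 appended the mean is 22 and the median is 3.
base = [1, 2, 3, 4]

results = {}
for largest in (5, 100, 10_000):
    data = base + [largest]
    # Only the largest value changes, so only the mean should respond.
    results[largest] = (mean(data), median(data))
    print(f"largest = {largest:>6}: mean = {mean(data)}, median = {median(data)}")
```

As the largest value grows, the mean follows it without limit while the median stays at the middle rank, which is the see-saw picture in numbers.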

Challenge 2 — Z-Scores Preserve Relative Order

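(a) Proof that \(z_1 > z_2\) when \(x_1 > x_2\): for \(\sigma > 0\), \(z_1 - z_2 = \dfrac{x_1 - \mu}{\sigma} - \dfrac{x_2 - \mu}{\sigma} = \dfrac{x_1 - x_2}{\sigma}\). Since \(x_1 > x_2\), \(x_1 - x_2 > 0\); since \(\sigma > 0\), the fraction is positive. Therefore \(z_1 - z_2 > 0\), i.e. \(z_1 > z_2\).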

(b) Why order preservation matters: if standardizing reversed any pair’s order, a higher z-score wouldn’t reliably mean a higher relative position — z-scores would be useless for ranking or cross-dataset comparison, defeating the purpose of standardization.

(c) What happens when \(\sigma = 0\): every value equals \(\mu\), so \(z = \dfrac{x - \mu}{\sigma}\) divides by zero and is undefined. A dataset with zero spread has no variation, so “relative position” is meaningless. The preservation property requires \(\sigma > 0\).
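Parts (a) and (c) can both be checked numerically. The sketch below standardizes a hypothetical list of scores, confirms that sorting by z-score gives the same order as sorting by raw value, and raises an error in the zero-spread case. The names `standardize` and `order_of` are illustrative.

```python
from statistics import fmean, pstdev


def standardize(values):
    """Convert values to z-scores using the population mean and SD."""
    mu = fmean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # Part (c): with no spread, z-scores are undefined.
        raise ValueError("z-scores are undefined when the standard deviation is zero")
    return [(x - mu) / sigma for x in values]


def order_of(values):
    """Indices that would sort the values in ascending order."""
    return sorted(range(len(values)), key=values.__getitem__)


scores = [52, 47, 90, 61, 75]  # hypothetical data with distinct values
z_scores = standardize(scores)

# Part (a): standardizing must not change the ranking.
same_order = order_of(scores) == order_of(z_scores)
print(same_order)
```

Raising an error for zero spread, rather than returning zeros, keeps the undefined case from silently passing as "everyone is average."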
