Solutions — Variability and Spread

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.


Section 5: Guided Practice Solutions


Problem 1 — Computing the Range (Ecologist Bird Species)

Dataset: 12, 8, 15, 10, 9, 14, 7, 11 (8 wetland sites).

(a) Find the minimum: Scan all values — the smallest is 7. min = 7 species.

(b) Compute the range: max = 15 (the largest value), min = 7. Range = 15 − 7 = 8. The bird species counts span 8 species across the 8 sites.

(c) With a 9th site (37 species), new range: The new maximum is 37; the minimum remains 7 (no new low value). Range = 37 − 7 = 30. The range jumped from 8 to 30, nearly a 4× increase from a single new observation. This is the range’s fragility in action: one extreme value dominates it completely.

Why this matters: Before the 9th site, a range of 8 reasonably described the spread (all values between 7 and 15). After site 9, the range of 30 is misleading — 8 of 9 sites are still tightly clustered between 7 and 15. This is exactly why we need resistant measures like the IQR.
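The jump in the range can be checked numerically. The following is a minimal Python sketch; the function and variable names are illustrative and not part of the lesson.

```python
# Bird species counts from Problem 1 (8 wetland sites).
counts = [12, 8, 15, 10, 9, 14, 7, 11]

def value_range(values):
    """Range = max - min; only the two extremes matter."""
    return max(values) - min(values)

range_before = value_range(counts)         # 15 - 7
range_after = value_range(counts + [37])   # 37 - 7, after adding site 9

print(range_before, range_after, range_after / range_before)  # 8 30 3.75
```

Every value other than the minimum and maximum could change without affecting either result, which is the weakness the solution describes.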

Problem 2 — Sample Variance and Standard Deviation (All 5 Variants)

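Every variant follows the same procedure: \(\bar{x} = \frac{\sum x_i}{n}\), then \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\), then \(s = \sqrt{s^2}\). We use \(n - 1\) (Bessel’s correction) because we are estimating the population variance from a sample.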

Variant 0 — Tomato Plant Fruit Counts (n = 5)

Data: 8, 12, 9, 11, 10.

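Step 1 — Mean: \(\bar{x} = \frac{8 + 12 + 9 + 11 + 10}{5} = \frac{50}{5} = 10\).

Step 2 — Deviation table:

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
8	−2	4
12	+2	4
9	−1	1
11	+1	1
10	0	0

Check: deviations sum to \(-2 + 2 - 1 + 1 + 0 = 0\) ✓

Step 3 — Sum of squared deviations: \(\sum (x_i - \bar{x})^2 = 4 + 4 + 1 + 1 + 0 = 10\).

Step 4 — Sample variance: \(s^2 = \frac{10}{5 - 1} = 2.5\).

Step 5 — Sample standard deviation: \(s = \sqrt{2.5} \approx 1.58\).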

Interpretation: The typical plant’s fruit count deviates from the mean of 10 by about 1.58 fruits.
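The five steps map directly onto a few lines of Python. This is a sketch of the procedure on the tomato data, with variable names chosen here for illustration.

```python
import math

data = [8, 12, 9, 11, 10]  # tomato fruit counts (Variant 0)

n = len(data)
mean = sum(data) / n                        # Step 1: mean
deviations = [x - mean for x in data]       # Step 2: deviations from the mean
sum_sq = sum(d ** 2 for d in deviations)    # Step 3: sum of squared deviations
sample_var = sum_sq / (n - 1)               # Step 4: divide by n - 1 (Bessel's correction)
sample_sd = math.sqrt(sample_var)           # Step 5: square root

print(mean, sum(deviations), sum_sq, sample_var, round(sample_sd, 2))
# 10.0 0.0 10.0 2.5 1.58
```

Python's standard library offers `statistics.variance` and `statistics.stdev`, which also use the n − 1 denominator, so they can serve as a cross-check on hand computations.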

Variant 1 — Runner Sprint Times (n = 6)

Data: 12.1, 11.8, 12.5, 12.0, 11.6, 12.4 (seconds).

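Step 1 — Mean: \(\bar{x} = \frac{12.1 + 11.8 + 12.5 + 12.0 + 11.6 + 12.4}{6} = \frac{72.4}{6} \approx 12.0667\).

Step 2 — Deviation table:

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
12.1	+0.0333	0.0011
11.8	−0.2667	0.0711
12.5	+0.4333	0.1878
12.0	−0.0667	0.0044
11.6	−0.4667	0.2178
12.4	+0.3333	0.1111

Step 3 — Sum of squared deviations: \(\sum (x_i - \bar{x})^2 \approx 0.5933\).

Step 4 — Sample variance: \(s^2 = \frac{0.5933}{6 - 1} \approx 0.1187\).

Step 5 — Sample standard deviation: \(s = \sqrt{0.1187} \approx 0.34\) seconds.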

Interpretation: The typical time deviates from the mean of 12.07 s by about 0.34 s — a small SD relative to the mean indicates a tightly clustered field.

Variant 2 — Cafe Daily Pastry Sales (n = 7)

Data: 24, 30, 22, 28, 26, 20, 25.

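Step 1 — Mean: \(\bar{x} = \frac{24 + 30 + 22 + 28 + 26 + 20 + 25}{7} = \frac{175}{7} = 25\).

Step 2 — Deviation table:

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
24	−1	1
30	+5	25
22	−3	9
28	+3	9
26	+1	1
20	−5	25
25	0	0

Check: \(-1 + 5 - 3 + 3 + 1 - 5 + 0 = 0\) ✓

Step 3 — Sum of squared deviations: \(\sum (x_i - \bar{x})^2 = 1 + 25 + 9 + 9 + 1 + 25 + 0 = 70\).

Step 4 — Sample variance: \(s^2 = \frac{70}{7 - 1} \approx 11.67\).

Step 5 — Sample standard deviation: \(s = \sqrt{11.67} \approx 3.42\) pastries.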

Interpretation: Daily sales typically deviate from the mean of 25 by about 3.4 pastries — moderately variable.

Variant 3 — Container Liquid Volumes (n = 8)

Data: 250, 248, 253, 251, 249, 252, 247, 250 (mL).

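Step 1 — Mean: \(\bar{x} = \frac{2000}{8} = 250\).

Step 2 — Deviation table:

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
250	0	0
248	−2	4
253	+3	9
251	+1	1
249	−1	1
252	+2	4
247	−3	9
250	0	0

Step 3 — Sum of squared deviations: \(\sum (x_i - \bar{x})^2 = 0 + 4 + 9 + 1 + 1 + 4 + 9 + 0 = 28\).

Step 4 — Sample variance: \(s^2 = \frac{28}{8 - 1} = 4\).

Step 5 — Sample standard deviation: \(s = \sqrt{4} = 2.0\) mL.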

Interpretation: The filling process is very consistent — the typical container deviates from 250 mL by only 2.0 mL (under 1% relative variation).

Variant 4 — Package Weights (n = 4)

Data: 3.2, 3.8, 3.5, 3.1 (kg).

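Step 1 — Mean: \(\bar{x} = \frac{3.2 + 3.8 + 3.5 + 3.1}{4} = \frac{13.6}{4} = 3.4\).

Step 2 — Deviation table:

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
3.2	−0.2	0.04
3.8	+0.4	0.16
3.5	+0.1	0.01
3.1	−0.3	0.09

Step 3 — Sum of squared deviations: \(\sum (x_i - \bar{x})^2 = 0.04 + 0.16 + 0.01 + 0.09 = 0.30\).

Step 4 — Sample variance: \(s^2 = \frac{0.30}{4 - 1} = 0.10\).

Step 5 — Sample standard deviation: \(s = \sqrt{0.10} \approx 0.32\) kg.

Interpretation: Packages typically deviate from the mean of 3.4 kg by about 0.32 kg. With n = 4, Bessel’s correction matters: dividing by n = 4 would give \(0.30 / 4 = 0.075\), but the corrected \(s^2 = 0.10\) is 33% larger.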
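The effect of the denominator can be seen side by side with the standard library, where `statistics.pvariance` divides by n and `statistics.variance` divides by n − 1. A short sketch on the package weights (variable names are illustrative):

```python
from statistics import pvariance, variance

weights = [3.2, 3.8, 3.5, 3.1]  # package weights in kg (Variant 4)

divide_by_n = pvariance(weights)          # denominator n = 4
divide_by_n_minus_1 = variance(weights)   # denominator n - 1 = 3

# The ratio of the two is always n / (n - 1); here 4/3, i.e. about 33% larger.
ratio = divide_by_n_minus_1 / divide_by_n
print(round(divide_by_n, 3), round(divide_by_n_minus_1, 3), round(ratio, 3))
```

Because the ratio is n/(n − 1), the gap shrinks quickly as n grows: about 33% at n = 4 but only about 1% at n = 100.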

Common mistakes in variance / SD computation:

Forgetting to square deviations: summing raw deviations gives 0 every time. Square first, then sum, then divide.
Dividing by n instead of n−1: the single most frequent error. For sample data the denominator is always n−1.
Reporting \(s^2\) as \(s\): variance and SD are different quantities. \(s = \sqrt{s^2}\). A variance of 2.5 means \(s = \sqrt{2.5} \approx 1.58\).
Using n−1 for the mean: Bessel’s correction applies only to the variance denominator. The mean always divides by n: \(\bar{x} = \frac{\sum x_i}{n}\).

Problem 3 — Five-Number Summary, IQR, and Outlier Detection (All 5 Variants)

Variant 0 — Statistics Quiz Scores (n = 10)

Sorted: 8, 9, 11, 12, 13, 14, 15, 16, 17, 18 (n = 10, even).

Q2 (median): positions 5 and 6 → (13+14)/2 = 13.5.
Lower half (positions 1–5): 8, 9, 11, 12, 13. n = 5 (odd). Q1 = 11 (position 3).
Upper half (positions 6–10): 14, 15, 16, 17, 18. n = 5 (odd). Q3 = 16 (position 3).

Five-number summary: min = 8, Q1 = 11, Q2 = 13.5, Q3 = 16, max = 18. IQR = 16 − 11 = 5. Fences: Lower = 11 − 1.5 × 5 = 3.5; Upper = 16 + 1.5 × 5 = 23.5. All values in [3.5, 23.5]. No outliers.
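The median-of-halves method used throughout this problem can be sketched as a small Python function. The function name is hypothetical; note that statistical software packages use several different quartile conventions, so library results may not match this hand method on every dataset.

```python
from statistics import median

def five_number_summary(values):
    """Min, Q1, Q2, Q3, max using the median-of-halves method.

    For odd n the median is left out of both halves. Needs at least 2 values.
    """
    xs = sorted(values)
    half = len(xs) // 2
    lower, upper = xs[:half], xs[-half:]
    return xs[0], median(lower), median(xs), median(upper), xs[-1]

quiz = [8, 9, 11, 12, 13, 14, 15, 16, 17, 18]  # Variant 0 quiz scores
mn, q1, q2, q3, mx = five_number_summary(quiz)
print(mn, q1, q2, q3, mx, q3 - q1)  # 8 11 13.5 16 18 5
```

The same function handles the odd-n variants, because integer division by 2 drops the middle value from both halves.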

Variant 1 — Delivery Truck Distances (n = 7)

Sorted: 38, 45, 49, 61, 67, 72, 83 (n = 7, odd).

Q2: position 4 → 61.
Lower half: 38, 45, 49 → Q1 = 45. Upper half: 67, 72, 83 → Q3 = 72.

Five-number summary: min = 38, Q1 = 45, Q2 = 61, Q3 = 72, max = 83. IQR = 27 km. Fences: Lower = 45 − 1.5 × 27 = 4.5; Upper = 72 + 1.5 × 27 = 112.5. All values in [4.5, 112.5]. No outliers.

Variant 2 — Pharmacy Prescription Counts (n = 7)

Sorted: 88, 98, 115, 142, 160, 175, 205 (n = 7, odd).

Q2: position 4 → 142.
Lower half: 88, 98, 115 → Q1 = 98. Upper half: 160, 175, 205 → Q3 = 175.

Five-number summary: min = 88, Q1 = 98, Q2 = 142, Q3 = 175, max = 205. IQR = 77. Fences: Lower = 98 − 1.5 × 77 = −17.5; Upper = 175 + 1.5 × 77 = 290.5. The negative lower fence just means no value can be flagged low (expected for count data). All values in [−17.5, 290.5]. No outliers.

Variant 3 — Car Battery Lifetimes (n = 8)

Sorted: 24, 30, 36, 38, 42, 48, 54, 60 (n = 8, even).

Q2: positions 4 and 5 → (38+42)/2 = 40.
Lower half (positions 1–4): 24, 30, 36, 38. n = 4 (even). Q1 = (30+36)/2 = 33.
Upper half (positions 5–8): 42, 48, 54, 60. n = 4 (even). Q3 = (48+54)/2 = 51.

Five-number summary: min = 24, Q1 = 33, Q2 = 40, Q3 = 51, max = 60. IQR = 18 months. Fences: Lower = 33 − 1.5 × 18 = 6; Upper = 51 + 1.5 × 18 = 78. All values in [6, 78]. No outliers.

Variant 4 — Apartment Rents (n = 11)

Sorted: 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1250, 1300, 2000 (n = 11, odd).

Q2: position 6 → 1050.
Lower half (below Q2): 800, 850, 900, 950, 1000. n = 5 (odd). Q1 = 900.
Upper half (above Q2): 1100, 1150, 1250, 1300, 2000. n = 5 (odd). Q3 = 1250.

Five-number summary: min = 800, Q1 = 900, Q2 = 1050, Q3 = 1250, max = 2000. IQR = 350. Fences: Lower = 900 − 1.5 × 350 = 375; Upper = 1250 + 1.5 × 350 = 1775. Outlier check: 800 > 375 → no low outlier; 2000 > 1775 → 2000 is a potential outlier. All other values (850–1300) are well within the fences. The $2000 apartment should be investigated — luxury unit, data-entry error, or legitimate high-end rental? Do not auto-delete; flag for investigation.
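The fence check for the rents can be written as a short Python sketch. The helper name `tukey_fences` is an assumption made here for illustration; it uses the same median-of-halves quartiles as the solution.

```python
from statistics import median

def tukey_fences(values, k=1.5):
    """Return (lower_fence, upper_fence) from median-of-halves quartiles."""
    xs = sorted(values)
    half = len(xs) // 2
    q1, q3 = median(xs[:half]), median(xs[-half:])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

rents = [800, 850, 900, 950, 1000, 1050, 1100, 1150, 1250, 1300, 2000]
low, high = tukey_fences(rents)

# Flag values outside the fences; flagged values are candidates for review, not deletion.
flagged = [x for x in rents if x < low or x > high]
print(low, high, flagged)  # 375.0 1775.0 [2000]
```

The code only flags the value. Deciding whether it is an error or a genuine high-end rental remains a judgment call, as the solution stresses.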

Section 6: Independent Practice Solutions


Problem 1 — Sample Variance and Standard Deviation (Generator)

Generated fresh each time. The approach is always the same:

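Compute the mean: \(\bar{x} = \frac{\sum x_i}{n}\).
Compute each deviation \(x_i - \bar{x}\) and square it.
Sum the squared deviations: \(\sum (x_i - \bar{x})^2\).
Divide by \(n - 1\): \(s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}\).
Take the square root: \(s = \sqrt{s^2}\).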

Sample output 1 (n = 6): Data = 12, 28, 35, 19, 42, 8. \(\bar{x} = 144/6 = 24\). Deviations: −12, +4, +11, −5, +18, −16. Squared: 144, 16, 121, 25, 324, 256. Sum = 886. \(s^2 = 886/5 = 177.2\), \(s = \sqrt{177.2} \approx 13.31\).

Sample output 2 (n = 7): Data = 45, 38, 50, 42, 47, 39, 44. \(\bar{x} = 305/7 \approx 43.57\). Deviations: +1.43, −5.57, +6.43, −1.57, +3.43, −4.57, +0.43. Squared: 2.04, 31.04, 41.33, 2.47, 11.76, 20.90, 0.18. Sum ≈ 109.71. \(s^2 = 109.71/6 \approx 18.29\), \(s = \sqrt{18.29} \approx 4.28\).

Sample output 3 (n = 5): Data = 22, 31, 18, 27, 25. \(\bar{x} = 123/5 = 24.6\). Deviations: −2.6, +6.4, −6.6, +2.4, +0.4. Squared: 6.76, 40.96, 43.56, 5.76, 0.16. Sum = 97.20. \(s^2 = 97.20/4 = 24.30\), \(s = \sqrt{24.30} \approx 4.93\).

Self-check: always verify that (1) deviations sum to approximately zero, (2) \(s^2\) is positive, (3) \(s\) is smaller than the range (typically range/6 ≤ \(s\) ≤ range/2 for small samples).
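The three self-checks can be automated. This is a sketch only: the function name and the 1e-9 tolerance for "approximately zero" are assumptions made here, not part of the lesson.

```python
from statistics import stdev

def self_check(data):
    """Run the sanity checks from the text on one dataset."""
    mean = sum(data) / len(data)
    s = stdev(data)                       # sample SD (n - 1 denominator)
    data_range = max(data) - min(data)
    return {
        "deviations_sum_to_zero": abs(sum(x - mean for x in data)) < 1e-9,
        "variance_positive": s ** 2 > 0,
        "sd_below_range": s < data_range,
        "sd_in_rule_of_thumb": data_range / 6 <= s <= data_range / 2,
    }

samples = ([12, 28, 35, 19, 42, 8], [45, 38, 50, 42, 47, 39, 44], [22, 31, 18, 27, 25])
for sample in samples:
    print(self_check(sample))
```

All three sample outputs above pass every check. A failed check usually points to an arithmetic slip such as a sign error or an unsquared deviation.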

Problem 2 — Five-Number Summary and IQR (Generator)

Generated with a fresh sample size each time (n = 9, 10, and 11 in the samples below). Follow the median-of-halves method (Tukey).

Sample output 1 (n = 9, odd): Sorted 12, 23, 28, 35, 47, 52, 61, 74, 88. Q2 = position 5 = 47. Lower half: 12, 23, 28, 35 → Q1 = (23+28)/2 = 25.5. Upper half: 52, 61, 74, 88 → Q3 = (61+74)/2 = 67.5. Summary: min = 12, Q1 = 25.5, Q2 = 47, Q3 = 67.5, max = 88. IQR = 42.

Sample output 2 (n = 10, even): Sorted 5, 18, 22, 31, 40, 49, 55, 63, 77, 91. Q2 = (40+49)/2 = 44.5. Lower half (1–5): 5, 18, 22, 31, 40 → Q1 = 22. Upper half (6–10): 49, 55, 63, 77, 91 → Q3 = 63. Summary: min = 5, Q1 = 22, Q2 = 44.5, Q3 = 63, max = 91. IQR = 41.

Sample output 3 (n = 11, odd): Sorted 8, 15, 21, 30, 36, 44, 52, 58, 67, 75, 82. Q2 = position 6 = 44. Lower half: 8, 15, 21, 30, 36 → Q1 = 21. Upper half: 52, 58, 67, 75, 82 → Q3 = 67. Summary: min = 8, Q1 = 21, Q2 = 44, Q3 = 67, max = 82. IQR = 46.

Problem 3 — Outlier Detection via Fences (Generator)

Datasets may contain 0, 1, or 2 potential outliers. Always use 1.5 × IQR fences.

Sample 1 — No outliers (n = 12): Sorted 14, 18, 22, 27, 31, 35, 39, 43, 48, 52, 58, 72. Q2 = (35+39)/2 = 37. Lower half (1–6): Q1 = (22+27)/2 = 24.5. Upper half (7–12): Q3 = (48+52)/2 = 50. IQR = 25.5. Lower fence = 24.5 − 1.5 × 25.5 = −13.75; Upper fence = 50 + 1.5 × 25.5 = 88.25. All values in [−13.75, 88.25]. No outliers.

Sample 2 — Single outlier (n = 10): Sorted 11, 15, 19, 24, 28, 33, 37, 42, 48, 112. Q2 = (28+33)/2 = 30.5. Lower half (1–5): Q1 = 19. Upper half (6–10): Q3 = 42. IQR = 23. Lower fence = 19 − 34.5 = −15.5; Upper fence = 42 + 34.5 = 76.5. 112 > 76.5 → 112 is a potential outlier. Could be a data-entry error (e.g., 11.2 typed as 112) or a genuine extreme.

Sample 3 — Larger sample (n = 15): Sorted 8, 12, 16, 20, 23, 27, 31, 35, 40, 44, 49, 55, 62, 70, 145. Q2 = position 8 = 35. Lower half (below Q2): 8, 12, 16, 20, 23, 27, 31 → Q1 = position 4 = 20. Upper half (above Q2): 40, 44, 49, 55, 62, 70, 145 → Q3 = position 4 = 55. IQR = 35. Lower fence = 20 − 52.5 = −32.5; Upper fence = 55 + 52.5 = 107.5. 145 > 107.5 → 145 is a potential outlier; no low outliers.

Remember: flagged values are potential outliers — investigate before deleting. A flagged value could be (a) a data-entry error to fix or remove, (b) a measurement error to document, or (c) a valid extreme observation to retain and note.

Problem 4 — Range vs. Variance: Same Range, Different Spread (Generator)

Two datasets can share the same range yet differ greatly in variance — because the range uses only the two endpoints, while the variance uses every value.

Clustered dataset: min = 20, max = 27, Range = 7. Values bunched near the centre: 22, 23, 24, 24, 25, 25, 26, so \(s^2\) is small (the values sit tight around \(\bar{x}\)).

Dispersed dataset: min = 20, max = 27, Range = 7 (identical). Values spread across the range: 20, 21, 23, 24, 25, 26, 27, so \(s^2\) is about 4× larger.

Why range and variance diverge: the range is blind to everything between the endpoints; the variance captures internal clustering or dispersion. Two datasets can have identical bookends but completely different internal structure: the variance sees it, the range does not. (The generator constructs the dispersed set to have the larger \(s^2\); the specific values vary per generation.)
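A quick numerical illustration of the same idea follows. The clustered list below is a hypothetical set chosen here so that it includes both endpoints (it is not generator output); the dispersed list is the one quoted above.

```python
from statistics import variance

# Hypothetical clustered set (illustrative only): same endpoints, middle values bunched.
clustered = [20, 23, 24, 24, 24, 24, 27]
# Dispersed set taken from the text above.
dispersed = [20, 21, 23, 24, 25, 26, 27]

for name, data in (("clustered", clustered), ("dispersed", dispersed)):
    data_range = max(data) - min(data)
    print(name, "range =", data_range, "s^2 =", round(variance(data), 2))
```

Both sets report a range of 7, while the sample variance is larger for the dispersed set because every interior value contributes to it.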

Problem 5 — Find the Error in Spread-Measure Computations (All 5 Variants)

Variant 0 — Student’s SD Computation (Study Hours)

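Data: 8, 6, 10, 4, 12. Student computed \(\bar{x} = 8\), \(s^2 = 40/5 = 8\), \(s = \sqrt{8} \approx 2.83\).

Error: divided by n = 5 instead of n−1 = 4. The squared deviations are correct: \(0 + 4 + 4 + 16 + 16 = 40\). Correct: \(s^2 = 40/4 = 10\) (not 8), \(s = \sqrt{10} \approx 3.16\) hours (not 2.83). The error understates both, which is exactly the bias Bessel’s correction fixes.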

Variant 1 — Analyst’s Five-Number Summary (Customer Counts)

Sorted: 23, 26, 28, 30, 34, 37, 39, 41, 42, 45.

Error: approximated Q1 and Q3 by raw position (“3rd value,” “8th value”) instead of the median-of-halves method. Correct: n = 10 (even); Q2 = (34+37)/2 = 35.5; lower half 23, 26, 28, 30, 34 → Q1 = 28; upper half 37, 39, 41, 42, 45 → Q3 = 41. IQR = 13. The analyst’s IQR matches here only because, for n = 10, the 3rd and 8th values happen to be the medians of the two halves; fixed positions fail for other sample sizes.

Variant 2 — Lab Technician’s Range-Only Report (Chemical Samples)

Data: 5.12, 5.08, 5.15, 5.11, 5.09, 5.14 g. Technician reported range = 0.07 g and stopped.

Error: relying solely on the range, which uses only min and max and reveals nothing about internal consistency. Should have reported the SD (or IQR) too: \(\bar{x} = 5.115\), sum of squared deviations ≈ 0.00375, \(s^2 = 0.00375/5 = 0.00075\), \(s \approx 0.027\) g. The SD of 0.027 g confirms excellent consistency (CV ≈ 0.5%), but only computing it proves this.

Variant 3 — Student Forgets to Square Deviations (Exam Scores)

Data: 65, 70, 75, 80, 85, 90, 95. Student: \(\bar{x} = 80\), \(s = 60/6 = 10\).

Error: summed the absolute deviations (15+10+5+0+5+10+15 = 60) and divided by n−1 = 6. That computes the mean absolute deviation, not the SD. Correct: deviations −15, −10, −5, 0, +5, +10, +15; squared 225, 100, 25, 0, 25, 100, 225; sum = 700; \(s^2 = 700/6 \approx 116.67\); \(s = \sqrt{116.67} \approx 10.80\) (not 10). The answer is close here only because the data are evenly spaced.
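The two computations can be compared directly. This is a minimal sketch with illustrative variable names.

```python
import math

scores = [65, 70, 75, 80, 85, 90, 95]  # exam scores (Variant 3)
n = len(scores)
mean = sum(scores) / n

# The student's shortcut: absolute deviations, never squared.
student_value = sum(abs(x - mean) for x in scores) / (n - 1)

# The standard deviation: square, sum, divide by n - 1, then take the root.
sample_sd = math.sqrt(sum((x - mean) ** 2 for x in scores) / (n - 1))

print(student_value, round(sample_sd, 2))  # 10.0 10.8
```

Squaring weights large deviations more heavily than small ones, which is why the SD comes out above the absolute-deviation figure.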

Variant 4 — HR Analyst’s Range Comparison with an Outlier (Salaries)

Dept A: 42, 48, 45, 52, 47, 44, 50, 46. Dept B: 38, 42, 55, 40, 95, 44, 41, 39. Analyst: Range A = 10, Range B = 57 → “Dept B has much more pay inequity.”

Error: the range is dominated by Dept B’s single $95K outlier. Better — compare IQRs: Dept A sorted 42, 44, 45, 46, 47, 48, 50, 52 → Q1 = 44.5, Q3 = 49, IQR(A) = 4.5. Dept B sorted 38, 39, 40, 41, 42, 44, 55, 95 → Q1 = 39.5, Q3 = 49.5, IQR(B) = 10.0. The IQRs are much closer (about 2.2×, not 5.7×). The dramatic ratio came almost entirely from the $95K outlier, not genuine dispersion — investigate it before drawing equity conclusions.

Problem 6 — Multi-Step Synthesis: Forestry Researcher’s Maple Tree Data

Data: 28, 34, 22, 31, 45, 26, 38, 24, 29 (cm, diameter at breast height). Sorted: 22, 24, 26, 28, 29, 31, 34, 38, 45.

(a) Mean, Variance, and Standard Deviation

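Mean: \(\sum x_i = 277\), \(n = 9\), \(\bar{x} = 277/9 \approx 30.78\) cm.

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
22	−8.78	77.09
24	−6.78	45.97
26	−4.78	22.85
28	−2.78	7.73
29	−1.78	3.17
31	+0.22	0.05
34	+3.22	10.37
38	+7.22	52.13
45	+14.22	202.21

\(\sum (x_i - \bar{x})^2 \approx 421.56\) (the rounded table entries sum to 421.57). \(s^2 = 421.56/8 \approx 52.69\) cm². \(s = \sqrt{52.69} \approx 7.26\) cm.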

(b) Five-Number Summary, IQR, and Outliers

n = 9 (odd). Q2: position 5 → 29 cm. Lower half: 22, 24, 26, 28 → Q1 = (24+26)/2 = 25 cm. Upper half: 31, 34, 38, 45 → Q3 = (34+38)/2 = 36 cm.

Five-number summary: min = 22, Q1 = 25, Q2 = 29, Q3 = 36, max = 45. IQR = 11 cm. Fences: Lower = 25 − 1.5 × 11 = 8.5; Upper = 36 + 1.5 × 11 = 52.5. All 9 values in [8.5, 52.5]. No outliers.

(c) Adding a 10th Tree (68 cm) — Resistance Comparison

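New sorted data: 22, 24, 26, 28, 29, 31, 34, 38, 45, 68 (n = 10). \(\sum x_i = 345\), \(\bar{x} = 345/10 = 34.5\) cm.

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
22	−12.5	156.25
24	−10.5	110.25
26	−8.5	72.25
28	−6.5	42.25
29	−5.5	30.25
31	−3.5	12.25
34	−0.5	0.25
38	+3.5	12.25
45	+10.5	110.25
68	+33.5	1122.25

\(\sum (x_i - \bar{x})^2 = 1668.50\). \(s^2 = 1668.50/9 \approx 185.39\) cm², \(s = \sqrt{185.39} \approx 13.62\) cm. Change in SD: \((13.62 - 7.26)/7.26 \approx +87.6\%\).

New five-number summary (n = 10): Q2 = (29+31)/2 = 30. Lower half (1–5): 22, 24, 26, 28, 29 → Q1 = 26. Upper half (6–10): 31, 34, 38, 45, 68 → Q3 = 38. IQR = 12 cm. Change in IQR: \((12 - 11)/11 \approx +9.1\%\).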

Interpretation: One extreme observation (68 cm) nearly doubled the SD (+87.6%) but barely moved the IQR (+9.1%). The SD squares deviations, so the 68 cm value alone contributed 1122.25 — 67% of the new sum of squares. The IQR ignores the tails, so it was largely unaffected. Principle: SD is sensitive to outliers; IQR is resistant — mirroring mean (sensitive) vs. median (resistant) from DS-3. Use mean + SD for symmetric data without outliers; median + IQR for skewed data or data with outliers.
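The resistance comparison can be reproduced in a few lines. This sketch assumes the median-of-halves quartile method used on this page; the helper names are illustrative.

```python
from statistics import median, stdev

def iqr(values):
    """IQR from median-of-halves quartiles (median excluded for odd n)."""
    xs = sorted(values)
    half = len(xs) // 2
    return median(xs[-half:]) - median(xs[:half])

def pct_change(old, new):
    """Relative change from old to new, as a percentage."""
    return (new - old) / old * 100

trees = [28, 34, 22, 31, 45, 26, 38, 24, 29]  # diameters in cm
with_outlier = trees + [68]                    # add the 10th tree

sd_change = pct_change(stdev(trees), stdev(with_outlier))
iqr_change = pct_change(iqr(trees), iqr(with_outlier))
print(round(sd_change, 1), round(iqr_change, 1))  # 87.6 9.1
```

Swapping 68 for an even larger value would push the SD change higher while leaving the IQR change at the same 9.1%, since the quartiles depend only on the middle of the sorted data.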

Section 7: Mastery Check Solutions


Problem 1 — Feynman Test: Why n−1? (Model Answer)

When you have the entire population you know the true centre, so dividing by N gives the actual average squared deviation. But with a sample you estimate the centre using the sample mean, which is always pulled slightly toward your sample values. This makes the sample deviations a bit smaller than the true deviations would be. Dividing by a slightly smaller number (n−1 instead of n) inflates the variance just enough to correct for this built-in underestimation.

Think of it as a statistical honesty tax: since your sample is almost certainly less variable than the population it came from, you compensate by making the variance a touch larger. The smaller the sample, the bigger the correction — with 5 values, dividing by 4 really matters; with 500 values, dividing by 499 barely changes anything.

There is also a mathematical reason: once you compute \(\bar{x}\) and use it for deviations, the n deviations are not independent. They must sum to zero, so once you know n−1 of them, the last is forced. You have n−1 genuine pieces of information (n−1 degrees of freedom), so you divide by n−1.
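The underestimation can also be seen in a simulation. The setup below is an assumption made for illustration (a normal population with variance 1, samples of size 5, a fixed random seed); it is a sketch, not part of the model answer.

```python
import random

random.seed(0)  # fixed seed so repeated runs give the same numbers

TRIALS, N = 20_000, 5  # assumed: many small samples from a population with variance 1

sum_div_n = 0.0
sum_div_n_minus_1 = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(0, 1) for _ in range(N)]
    mean = sum(sample) / N
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    sum_div_n += ss / N                         # uncorrected estimate
    sum_div_n_minus_1 += ss / (N - 1)           # Bessel-corrected estimate

avg_div_n = sum_div_n / TRIALS
avg_div_n_minus_1 = sum_div_n_minus_1 / TRIALS
print(round(avg_div_n, 2), round(avg_div_n_minus_1, 2))
```

The average of the divide-by-n estimates should come out close to 0.8, which is (n − 1)/n times the true variance, while the corrected average should land close to 1.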

Problem 2 — Apply: Choosing the Right Spread Measure (Two Cities)

(a) City A — symmetric, bell-shaped, no outliers → standard deviation. It uses every observation, connects directly to the mean, underpins most inferential statistics (CIs, hypothesis tests), and supports the 68–95–99.7 rule (≈68% of values within ±1 SD).

(b) City B — right-skewed with luxury outliers → IQR. It is resistant (luxury prices sit in the upper tail and don’t affect Q1/Q3), describes the middle 50% — the “typical” market — and avoids the SD’s massive inflation from large squared deviations. For City B the SD might be $300K+ while the IQR is only $50K–$80K; the SD would describe nobody’s actual experience.

(c) Principle: the choice of spread measure mirrors the choice of centre from DS-3. Symmetric, outlier-free → mean + SD (uses all data). Skewed or outliers → median + IQR (resistant; ignores the tails). Always pair the spread measure with the matching centre.

Problem 3 — Error Analysis: Variance vs. Standard Deviation Confusion

Data: 18, 22, 20, 24, 19, 23, 21 (°C). Both students report \(\bar{x} = 21\)°C.

The tested error (Student 1): claiming SD = 4.33°C because the variance = 4.33°C², treating variance and SD as the same number. They differ: \(s = \sqrt{s^2}\). Variance is in squared units (°C²); SD is in original units (°C). This variance/SD confusion is one of the most common and consequential errors in introductory statistics.

Verifying the computation: \(\bar{x} = 147/7 = 21\). Deviations: −3, +1, −1, +3, −2, +2, 0. Squared: 9, 1, 1, 9, 4, 4, 0. Sum = 28. Sample variance (n−1 = 6): \(s^2 = 28/6 \approx 4.67\)°C². SD: \(s = \sqrt{4.67} \approx 2.16\)°C. (Student 2 is correct in concept: \(s = \sqrt{s^2}\).)

Section 8: Boss Fight Solutions


Path A — The Analyst: Departmental Salary Equity

Engineering (12): 62, 58, 71, 65, 68, 60, 64, 70, 67, 63, 66, 59 ($K). Marketing (10): 48, 52, 55, 50, 95, 53, 47, 51, 54, 49 ($K).

Task A1 — Mean Salary

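Engineering: \(\sum x_i = 773\), \(\bar{x} = 773/12 \approx 64.42\) ($64,420). Marketing: \(\sum x_i = 554\), \(\bar{x} = 554/10 = 55.40\) ($55,400). Engineering’s mean is about $9,000 higher.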

Task A2 — Range and Standard Deviation

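Engineering: Range = 71 − 58 = 13 ($13,000). Deviations from \(\bar{x} = 64.42\):

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
62	−2.42	5.86
58	−6.42	41.22
71	+6.58	43.30
65	+0.58	0.34
68	+3.58	12.82
60	−4.42	19.54
64	−0.42	0.18
70	+5.58	31.14
67	+2.58	6.66
63	−1.42	2.02
66	+1.58	2.50
59	−5.42	29.38

\(\sum (x_i - \bar{x})^2 \approx 194.92\) (the rounded table entries sum to 194.96). \(s^2 = 194.92/11 \approx 17.72\), \(s = \sqrt{17.72} \approx 4.21\) ($4,210).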

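Marketing: Range = 95 − 47 = 48 ($48,000). Deviations from \(\bar{x} = 55.40\):

\(x_i\)	\(x_i - \bar{x}\)	\((x_i - \bar{x})^2\)
48	−7.40	54.76
52	−3.40	11.56
55	−0.40	0.16
50	−5.40	29.16
95	+39.60	1568.16
53	−2.40	5.76
47	−8.40	70.56
51	−4.40	19.36
54	−1.40	1.96
49	−6.40	40.96

\(\sum (x_i - \bar{x})^2 = 1802.40\). \(s^2 = 1802.40/9 \approx 200.27\), \(s = \sqrt{200.27} \approx 14.15\) ($14,150).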

Marketing’s SD ($14,150) is about 3.4× Engineering’s ($4,210) — but the single $95K salary contributes 1568.16 of 1802.40 to the sum of squares (87% from one data point). The SD is mostly measuring how far $95K sits from the mean, not typical pay dispersion.
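The share of the sum of squares that comes from the single largest deviation can be checked with a short sketch (variable names are illustrative):

```python
marketing = [48, 52, 55, 50, 95, 53, 47, 51, 54, 49]  # salaries in $K

mean = sum(marketing) / len(marketing)
squared = [(x - mean) ** 2 for x in marketing]   # squared deviations
total = sum(squared)
largest = max(squared)                           # contribution of the 95 salary
share = largest / total

print(round(total, 2), round(largest, 2), round(share * 100))  # 1802.4 1568.16 87
```

Running the same check on the Engineering salaries would show no single value dominating, which is what a typical dispersion measure should look like.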

Task A3 — Five-Number Summary and IQR

Engineering (sorted): 58, 59, 60, 62, 63, 64, 65, 66, 67, 68, 70, 71. Q2 = (64+65)/2 = 64.5. Q1 = (60+62)/2 = 61. Q3 = (67+68)/2 = 67.5. IQR(E) = 6.5 ($6,500).

Marketing (sorted): 47, 48, 49, 50, 51, 52, 53, 54, 55, 95. Q2 = (51+52)/2 = 51.5. Q1 = 49. Q3 = 54. IQR(M) = 5 ($5,000).

Key observation: Marketing’s IQR ($5K) is actually smaller than Engineering’s ($6.5K) — the middle 50% of Marketing salaries are more tightly clustered. The range and SD gave the opposite impression because one outlier dominated them.

Task A4 — Outlier Detection

Engineering: fences [61 − 9.75, 67.5 + 9.75] = [51.25, 77.25]. All values in [58, 71]. No outliers. Marketing: fences [49 − 7.5, 54 + 7.5] = [41.5, 61.5]. 95 > 61.5 → $95K is flagged. All other values (47–55) are inside. Do not auto-delete — investigate (data-entry error, misclassification, legacy contract, or a legitimate specialist), then decide.

Task A5 — Synthesize and Advise

Recommendation to the HR Director: Engineering has more equitable pay, but the four measures tell different parts of the story:

Range: Eng $13K vs. Mkt $48K — but Marketing’s range is inflated by the $95K outlier.
SD: Eng $4.2K vs. Mkt $14.2K — but 87% of Marketing’s sum of squares comes from one data point.
IQR: Eng $6.5K vs. Mkt $5.0K — the middle-50% spread is actually tighter in Marketing. This is the most trustworthy comparison (resistant to the outlier).
Outliers: Marketing has one ($95K); Engineering has none.

Actions: (1) investigate the $95K salary; (2) report IQR alongside the median for Marketing; (3) Engineering’s structure is consistent — no red flags; (4) if $95K is valid, note it as a special case and rely on the IQR-based middle-50% comparison for equity.

Path B — The Architect: Quality-Control Study Design

Task B1 — Variable Type

“Tablet mass in mg” is quantitative continuous — mass can take any value within the scale’s precision (e.g., 49.87 mg). Continuous data support a richer toolset (normal distribution, SD, CV) than discrete data.

Task B2 — Primary Spread Measure

Standard deviation. The data are symmetric, bell-shaped, and outlier-free (n = 30, ≈ normal). The SD connects to the normal distribution for action limits (±2s, ±3s), underpins SPC control charts, and supports the 68–95–99.7 rule. IQR isn’t wrong but discards information when the data are normal.

Task B3 — Outlier Detection Thresholds

Required: Q1, Q3, IQR from the batch. Multiplier: 1.5 (Tukey). Fences: [Q1 − 1.5 × IQR, Q3 + 1.5 × IQR]. In QC you can supplement with an SD-based rule (flag beyond ±3s — Western Electric rules); the two methods catch different patterns and can coexist.

Task B4 — Comparative Spread Analysis (50 mg vs. 100 mg Tablets)

(a) Fair-comparison measure: Coefficient of Variation (CV). CV is unitless (SD as a percentage of the mean), so it compares across different target masses. An SD of 2 mg is 4% of a 50 mg target but 2% of a 100 mg target.

(b) Computation:

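50 mg line: with \(\bar{x}\) and \(s\) in mg, \(CV = \frac{s}{\bar{x}} \times 100\% \approx 3.59\%\).
100 mg line: with \(\bar{x}\) and \(s\) in mg, \(CV = \frac{s}{\bar{x}} \times 100\% \approx 2.39\%\).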

Interpretation: the 100 mg line has the lower CV (2.39% vs. 3.59%) — smaller relative variability — even though its absolute SD is larger. Bottom line: the 100 mg process is more consistent. Both CVs are well under 5%, excellent for pharmaceutical content uniformity.
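The CV calculation is a one-line formula. The sketch below applies it to the illustration from part (a), the same 2 mg SD against two target masses; the function name is an assumption made here.

```python
def coefficient_of_variation(sd, mean):
    """CV as a percentage: SD relative to the mean."""
    return 100 * sd / mean

# The example from part (a): an SD of 2 mg against 50 mg and 100 mg targets.
cv_50 = coefficient_of_variation(2, 50)
cv_100 = coefficient_of_variation(2, 100)
print(cv_50, cv_100)  # 4.0 2.0
```

The same absolute spread counts for twice as much relative to the smaller target, which is why raw SDs are not comparable across the two product lines.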

Section 9: Challenge Problem Solutions


Challenge 1 — Proving \(\sum (x_i - \bar{x}) = 0\) and Connecting to n−1

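Claim: For any numbers \(x_1, \dots, x_n\) with \(\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i\), we have \(\sum_{i=1}^{n} (x_i - \bar{x}) = 0\).

Proof: Since \(\bar{x}\) is constant across the sum, \(\sum_{i=1}^{n} \bar{x} = n\bar{x}\). Therefore
\[\sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = n\bar{x} - n\bar{x} = 0.\]

Connection to n−1 (degrees of freedom): because \(\sum (x_i - \bar{x}) = 0\) always holds, the n deviations are not independent. Once you know n−1 of them, the last is forced. Example: Data 2, 5, 8, \(\bar{x} = 5\), deviations −3, 0, +3. Knowing the first two (−3, 0) forces the third (+3). Three values, but only 2 independent deviations → n−1 = 2 degrees of freedom.

Minimum sample size for variance: \(n \ge 2\). If \(n = 1\), then \(n - 1 = 0\) and the formula divides by zero. You cannot measure spread from a single observation, because variability inherently requires at least two values.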
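Both points can be demonstrated in Python. The sketch uses the 2, 5, 8 example from the text and the standard library's `statistics.variance`, which raises `StatisticsError` when given fewer than two values.

```python
from statistics import StatisticsError, variance

data = [2, 5, 8]
mean = sum(data) / len(data)
deviations = [x - mean for x in data]

# Knowing the first n - 1 deviations forces the last one, because they sum to zero.
forced_last = -sum(deviations[:-1])
print(deviations, forced_last)  # [-3.0, 0.0, 3.0] 3.0

# With a single observation there are zero degrees of freedom.
try:
    variance([5])
except StatisticsError as err:
    print("variance undefined for n = 1:", err)
```

The library refuses the single-value case rather than returning 0, which matches the point that spread is undefined, not zero, for one observation.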

Challenge 2 — Comparing Spread Across Different Units with CV

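Farm A (baseline): Height (mean and SD in cm) → \(CV = \frac{s}{\bar{x}} \times 100\% = 15.33\%\). Ear mass (mean and SD in g) → a larger CV. Ear mass is relatively more variable.

Variant 0 — Farm B: Height CV = 16.31%; the mass CV is again the larger of the two. Mass remains relatively more variable; both CVs slightly exceed Farm A’s.

Variant 1 — Farm C: Height CV = 8.98%; the mass CV is found the same way, \(CV = \frac{s}{\bar{x}} \times 100\%\).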

Farm	Height CV
Farm A	15.33%
Farm B	16.31%
Farm C	8.98%

Farm C has the most consistent plant height: its CV of 8.98% is a little over half of A’s (15.33%) and B’s (16.31%). Even though its plants are taller on average, they are far more uniform relative to their mean.

Variant 2 — Farm D (unit-invariance of CV): Farm D reports height in metres (\(\bar{x}\) and \(s\) in m), and its CV is computed the same way. Converting Farm A’s mean and SD from cm to metres gives CV = 15.33%, exactly the same as in cm. Why: converting cm → m divides every value (and thus both \(\bar{x}\) and \(s\)) by 100, leaving the ratio unchanged: \(\frac{s/100}{\bar{x}/100} = \frac{s}{\bar{x}}\). CV does not depend on the measurement units, which is what makes it valid for cross-variable comparisons.
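Unit-invariance is easy to confirm numerically. The heights below are hypothetical values invented for this sketch, not the farm data from the problem.

```python
import math
from statistics import mean, stdev

def cv_percent(values):
    """Coefficient of variation as a percentage."""
    return 100 * stdev(values) / mean(values)

# Hypothetical plant heights in cm (illustrative only, not the farm data).
heights_cm = [150, 172, 135, 188, 160]
heights_m = [h / 100 for h in heights_cm]   # same plants, measured in metres

same_cv = math.isclose(cv_percent(heights_cm), cv_percent(heights_m))
print(same_cv)  # True
```

The comparison uses `math.isclose` rather than `==` because dividing by 100 introduces tiny floating-point differences that have nothing to do with the statistics.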

Common mistakes to check in your work:

Dividing by n instead of n−1 for sample variance — the single most frequent error in descriptive statistics.
Reporting variance \(s^2\) as the SD \(s\). Variance is in squared units; SD is in original units. Always take the square root.
Failing to sort data before finding quartiles. Q1, Q2, Q3 are defined on sorted data.
Including Q2 in the halves for odd n. The median belongs to neither half — exclude it.
Using the wrong fence multiplier. The standard is 1.5 (Tukey); 1.0 over-flags, 3.0 catches only far outliers.
Automatically deleting flagged outliers. The fence rule identifies potential outliers for investigation, not deletion.
Comparing SD across datasets with different units without using CV (\(CV = \frac{s}{\bar{x}} \times 100\%\)).
