GP-1 — Computing the Range (Ecologist Bird Species)
Dataset: 12, 8, 15, 10, 9, 14, 7, 11 (8 wetland sites).
(a) Find the minimum:
Scan all values: the smallest is 7.
min = 7 species.
(b) Compute the range:
max = 15 (the largest value), min = 7.
\[ \text{Range} = \max - \min = 15 - 7 = \mathbf{8} \text{ species} \]
The counts span 8 species (from a low of 7 to a high of 15) across the 8 sites.
(c) With a 9th site (37 species), new range:
The new maximum is 37. The minimum remains 7 (no new low value was added).
\[ \text{New Range} = 37 - 7 = \mathbf{30} \text{ species} \]
The range jumped from 8 to 30, nearly 4 times its previous value, because of a single new observation. This is the range's fragility in action: it depends only on the two most extreme values, so one extreme observation dominates it completely.
Why this matters: Before the 9th site, the range of 8 species was a reasonable description of spread (all values between 7 and 15). After adding site 9, the range of 30 is misleading — 8 of 9 sites are still tightly clustered between 7 and 15. The range now overstates the typical spread by nearly 4×. This is exactly why we need resistant measures like the IQR.
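The jump can be reproduced with a short Python check. This is only a sketch: the list name `counts` and the helper `value_range` are illustrative names chosen here, not part of the exercise.

```python
# Species counts at the 8 wetland sites (from the dataset above).
counts = [12, 8, 15, 10, 9, 14, 7, 11]

def value_range(values):
    """Return max - min for a non-empty list of numbers."""
    return max(values) - min(values)

range_before = value_range(counts)         # 15 - 7
range_after = value_range(counts + [37])   # 37 - 7, after adding the 9th site

print(range_before, range_after)           # 8 30
print(range_after / range_before)          # 3.75
```

Only the maximum changed, yet the range changed with it, which is what makes the range non-resistant.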
GP-2 — Computing Sample Variance and Standard Deviation (All 5 Variants)
Every variant follows the same procedure. The formula is:
\[ s^2 = \frac{\sum(x_i - \bar{x})^2}{n-1}, \qquad s = \sqrt{s^2} \]
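We use \( n-1 \) (Bessel's correction) because we are estimating the population variance from a sample: the deviations are measured from the sample mean \( \bar{x} \) rather than the unknown population mean, so dividing by \( n \) would tend to give a value that is too small.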
Variant 0 — Tomato Plant Fruit Counts (n = 5)
Data: 8, 12, 9, 11, 10
Step 1 — Mean: \( \bar{x} = \frac{8+12+9+11+10}{5} = \frac{50}{5} = 10 \)
Step 2 — Deviation table:
| \( x_i \) | \( x_i - \bar{x} \) | \( (x_i - \bar{x})^2 \) |
|---|---|---|
| 8 | −2 | 4 |
| 12 | +2 | 4 |
| 9 | −1 | 1 |
| 11 | +1 | 1 |
| 10 | 0 | 0 |
Check: deviations sum to \( -2+2-1+1+0 = 0 \) ✓
Step 3 — Sum of squared deviations: \( \sum(x_i - \bar{x})^2 = 4+4+1+1+0 = 10 \)
Step 4 — Sample variance: \( s^2 = \frac{10}{5-1} = \frac{10}{4} = \mathbf{2.5} \)
Step 5 — Sample standard deviation: \( s = \sqrt{2.5} \approx \mathbf{1.58} \)
Interpretation: The typical tomato plant's fruit count deviates from the mean of 10 by about 1.58 fruits.
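The five steps map directly onto a few lines of Python. The sketch below mirrors the hand computation for the tomato data; the variable names are illustrative choices, not part of the exercise.

```python
from math import sqrt

data = [8, 12, 9, 11, 10]   # tomato fruit counts
n = len(data)

mean = sum(data) / n                        # Step 1: mean
deviations = [x - mean for x in data]       # Step 2: deviations from the mean
squared = [d ** 2 for d in deviations]      #         and their squares
ss = sum(squared)                           # Step 3: sum of squared deviations
variance = ss / (n - 1)                     # Step 4: sample variance (n - 1)
sd = sqrt(variance)                         # Step 5: sample standard deviation

print(sum(deviations))                      # 0.0 (the check from Step 2)
print(mean, ss, variance, round(sd, 2))     # 10.0 10.0 2.5 1.58
```

The same steps apply to the other variants by swapping in their data lists.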
Variant 1 — Runner Sprint Times (n = 6)
Data: 12.1, 11.8, 12.5, 12.0, 11.6, 12.4 (seconds)
Step 1 — Mean: \( \bar{x} = \frac{72.4}{6} \approx 12.0667 \)
Step 2 — Deviation table:
| \( x_i \) | \( x_i - \bar{x} \) | \( (x_i - \bar{x})^2 \) |
|---|---|---|
| 12.1 | +0.0333 | 0.0011 |
| 11.8 | −0.2667 | 0.0711 |
| 12.5 | +0.4333 | 0.1878 |
| 12.0 | −0.0667 | 0.0044 |
| 11.6 | −0.4667 | 0.2178 |
| 12.4 | +0.3333 | 0.1111 |
Step 3 — Sum of squared deviations: \( \sum(x_i - \bar{x})^2 \approx 0.5933 \)
Step 4 — Sample variance: \( s^2 = \frac{0.5933}{6-1} = \frac{0.5933}{5} \approx \mathbf{0.119} \)
Step 5 — Sample standard deviation: \( s = \sqrt{0.119} \approx \mathbf{0.34} \) seconds
Interpretation: The typical runner's time deviates from the mean of 12.07 seconds by about 0.34 seconds. This small SD relative to the mean indicates a tightly clustered field.
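When the mean is not a round number, a software cross-check helps catch rounding slips in the deviation table. Python's standard-library `statistics` module provides `mean`, `variance`, and `stdev`; the latter two use the \( n-1 \) denominator. A minimal sketch, with variable names chosen here for illustration:

```python
import statistics

times = [12.1, 11.8, 12.5, 12.0, 11.6, 12.4]   # sprint times in seconds

mean = statistics.mean(times)
s2 = statistics.variance(times)   # sample variance (n - 1 denominator)
s = statistics.stdev(times)       # sample standard deviation

print(round(mean, 4), round(s2, 3), round(s, 2))   # 12.0667 0.119 0.34
```

The software works at full precision, so its intermediate values can differ in the last digit from table entries that were rounded to four decimals.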
Variant 2 — Cafe Daily Pastry Sales (n = 7)
Data: 24, 30, 22, 28, 26, 20, 25
Step 1 — Mean: \( \bar{x} = \frac{175}{7} = 25 \)
Step 2 — Deviation table:
| \( x_i \) | \( x_i - \bar{x} \) | \( (x_i - \bar{x})^2 \) |
|---|---|---|
| 24 | −1 | 1 |
| 30 | +5 | 25 |
| 22 | −3 | 9 |
| 28 | +3 | 9 |
| 26 | +1 | 1 |
| 20 | −5 | 25 |
| 25 | 0 | 0 |
Check: \( -1+5-3+3+1-5+0 = 0 \) ✓
Step 3 — Sum of squared deviations: \( 1+25+9+9+1+25+0 = 70 \)
Step 4 — Sample variance: \( s^2 = \frac{70}{7-1} = \frac{70}{6} \approx \mathbf{11.67} \)
Step 5 — Sample standard deviation: \( s = \sqrt{11.67} \approx \mathbf{3.42} \) pastries
Interpretation: Daily pastry sales typically deviate from the mean of 25 by about 3.4 pastries. The sales are moderately variable: the busiest and slowest days (30 and 20) each sit 5 pastries from the mean, about 1.5 standard deviations out, so the full spread from lowest to highest is 10 pastries.
Variant 3 — Container Liquid Volumes (n = 8)
Data: 250, 248, 253, 251, 249, 252, 247, 250 (mL)
Step 1 — Mean: \( \bar{x} = \frac{2000}{8} = 250 \)
Step 2 — Deviation table:
| \( x_i \) | \( x_i - \bar{x} \) | \( (x_i - \bar{x})^2 \) |
|---|---|---|
| 250 | 0 | 0 |
| 248 | −2 | 4 |
| 253 | +3 | 9 |
| 251 | +1 | 1 |
| 249 | −1 | 1 |
| 252 | +2 | 4 |
| 247 | −3 | 9 |
| 250 | 0 | 0 |
Step 3 — Sum of squared deviations: \( 0+4+9+1+1+4+9+0 = 28 \)
Step 4 — Sample variance: \( s^2 = \frac{28}{8-1} = \frac{28}{7} = \mathbf{4.0} \)
Step 5 — Sample standard deviation: \( s = \sqrt{4.0} = \mathbf{2.0} \) mL
Interpretation: The filling process is very consistent: the typical container deviates from the mean of 250 mL by only 2.0 mL (0.8% of the mean).
Variant 4 — Package Weights (n = 4)
Data: 3.2, 3.8, 3.5, 3.1 (kg)
Step 1 — Mean: \( \bar{x} = \frac{13.6}{4} = 3.4 \)
Step 2 — Deviation table:
| \( x_i \) | \( x_i - \bar{x} \) | \( (x_i - \bar{x})^2 \) |
|---|---|---|
| 3.2 | −0.2 | 0.04 |
| 3.8 | +0.4 | 0.16 |
| 3.5 | +0.1 | 0.01 |
| 3.1 | −0.3 | 0.09 |
Step 3 — Sum of squared deviations: \( 0.04+0.16+0.01+0.09 = 0.30 \)
Step 4 — Sample variance: \( s^2 = \frac{0.30}{4-1} = \frac{0.30}{3} = \mathbf{0.10} \)
Step 5 — Sample standard deviation: \( s = \sqrt{0.10} \approx \mathbf{0.32} \) kg
Interpretation: Packages typically deviate from the mean weight of 3.4 kg by about 0.32 kg. Note how the small sample size (n = 4) means Bessel's correction makes a noticeable difference: dividing by n = 4 would give s² = 0.075, but the corrected s² = 0.10 is 33% larger.
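The size of the correction can be checked numerically. The sketch below uses `statistics.variance` (divides by n − 1) and `statistics.pvariance` (divides by n) from Python's standard library; since the two results differ by the factor n/(n − 1), the loop then shows how that gap shrinks across the sample sizes used in these variants. The names `s2_sample`, `s2_divide_by_n`, and `extra_percent` are illustrative.

```python
import statistics

weights = [3.2, 3.8, 3.5, 3.1]   # package weights in kg

s2_sample = statistics.variance(weights)         # divides by n - 1
s2_divide_by_n = statistics.pvariance(weights)   # divides by n

print(round(s2_sample, 3), round(s2_divide_by_n, 3))   # 0.1 0.075

# The ratio of the two results is n / (n - 1), so the gap shrinks as n grows.
extra_percent = {n: (n / (n - 1) - 1) * 100 for n in (4, 5, 6, 7, 8)}
for n, pct in extra_percent.items():
    print(f"n = {n}: corrected variance is {pct:.1f}% larger")
```

For n = 4 the corrected value is about 33% larger, while for n = 8 the difference is closer to 14%.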
Common Mistakes in Variance / SD Computation:
- Forgetting to square deviations: Summing raw deviations gives 0 every time — useless. You must square first, then sum, then divide.
- Dividing by n instead of n−1: This is the single most frequent error. For sample data, the denominator is n−1 (Bessel's correction). Dividing by n gives a result that tends to underestimate the true population variance.
- Reporting s² as s: Variance and standard deviation are different quantities. SD = √(variance). A variance of 2.5 means SD ≈ 1.58 — they are not interchangeable numbers.
- Using n−1 for the mean: Bessel's correction applies only to the variance denominator. The mean always divides by n: \( \bar{x} = \sum x_i / n \).
GP-3 — Five-Number Summary, IQR, and Outlier Detection (All 5 Variants)
Variant 0 — Statistics Quiz Scores (n = 10)
Data: 14, 8, 17, 11, 15, 9, 13, 18, 12, 16
Sorted: 8, 9, 11, 12, 13, 14, 15, 16, 17, 18
n = 10 (even).
Q2 (median): positions 5 and 6 → \( \frac{13+14}{2} = 13.5 \)
Lower half (positions 1–5): 8, 9, 11, 12, 13. nL = 5 (odd).
Q1 = position 3 = 11.
Upper half (positions 6–10): 14, 15, 16, 17, 18. nU = 5 (odd).
Q3 = position 3 = 16.
Five-number summary: min = 8, Q1 = 11, Q2 = 13.5, Q3 = 16, max = 18.
IQR: 16 − 11 = 5.
Fences:
Lower = 11 − 1.5 × 5 = 11 − 7.5 = 3.5
Upper = 16 + 1.5 × 5 = 16 + 7.5 = 23.5
All values are in [3.5, 23.5]. No outliers.
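The quartile procedure used here (split the sorted data at the median, then take the median of each half) can be sketched in Python as follows. The function name `five_number_summary` is a hypothetical helper written for this illustration; it leaves the median out of both halves when n is odd, matching the convention in these solutions.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max) using the median-of-halves rule.

    When n is odd, the median itself is excluded from both halves.
    """
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]                                   # values below the median position
    upper = s[half:] if n % 2 == 0 else s[half + 1:]   # values above it
    return min(s), median(lower), median(s), median(upper), max(s)

scores = [14, 8, 17, 11, 15, 9, 13, 18, 12, 16]
print(five_number_summary(scores))   # (8, 11, 13.5, 16, 18)
```

The IQR then follows as the fourth entry minus the second, here 16 − 11 = 5.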
Variant 1 — Delivery Truck Distances (n = 7)
Data: 45, 72, 38, 61, 83, 49, 67
Sorted: 38, 45, 49, 61, 67, 72, 83
n = 7 (odd).
Q2 (median): position (7+1)/2 = 4 → 61.
Lower half (below Q2): 38, 45, 49. nL = 3 (odd).
Q1 = 45.
Upper half (above Q2): 67, 72, 83. nU = 3 (odd).
Q3 = 72.
Five-number summary: min = 38, Q1 = 45, Q2 = 61, Q3 = 72, max = 83.
IQR: 72 − 45 = 27 km.
Fences:
Lower = 45 − 1.5 × 27 = 45 − 40.5 = 4.5
Upper = 72 + 1.5 × 27 = 72 + 40.5 = 112.5
All values are in [4.5, 112.5]. No outliers.
Variant 2 — Pharmacy Prescription Counts (n = 7)
Data: 142, 98, 175, 115, 160, 88, 205
Sorted: 88, 98, 115, 142, 160, 175, 205
n = 7 (odd).
Q2 (median): position 4 → 142.
Lower half: 88, 98, 115. Q1 = 98.
Upper half: 160, 175, 205. Q3 = 175.
Five-number summary: min = 88, Q1 = 98, Q2 = 142, Q3 = 175, max = 205.
IQR: 175 − 98 = 77.
Fences:
Lower = 98 − 1.5 × 77 = 98 − 115.5 = −17.5
Upper = 175 + 1.5 × 77 = 175 + 115.5 = 290.5
All values are in [−17.5, 290.5]. The negative lower fence means that no value can be flagged on the low end, because count data cannot fall below 0. No outliers.
Variant 3 — Car Battery Lifetimes (n = 8)
Data: 36, 48, 30, 42, 54, 24, 60, 38
Sorted: 24, 30, 36, 38, 42, 48, 54, 60
n = 8 (even).
Q2: positions 4 and 5 → \( \frac{38+42}{2} = 40 \)
Lower half (positions 1–4): 24, 30, 36, 38. nL = 4 (even).
Q1 = \( \frac{30+36}{2} = 33 \)
Upper half (positions 5–8): 42, 48, 54, 60. nU = 4 (even).
Q3 = \( \frac{48+54}{2} = 51 \)
Five-number summary: min = 24, Q1 = 33, Q2 = 40, Q3 = 51, max = 60.
IQR: 51 − 33 = 18 months.
Fences:
Lower = 33 − 1.5 × 18 = 33 − 27 = 6
Upper = 51 + 1.5 × 18 = 51 + 27 = 78
All values are in [6, 78]. No outliers.
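A caution when checking quartiles with software: libraries may use a different quartile convention than the median-of-halves rule used here, so Q1 and Q3 can differ somewhat from the hand-computed values, especially for small n. The sketch below computes the battery quartiles by the hand method and also calls Python's `statistics.quantiles`, which interpolates between data points. The helper `halves_quartiles` is a hypothetical name written for this illustration.

```python
from statistics import median, quantiles

def halves_quartiles(values):
    """Q1, Q2, Q3 by the median-of-halves rule (median excluded when n is odd)."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half]
    upper = s[half:] if n % 2 == 0 else s[half + 1:]
    return median(lower), median(s), median(upper)

lifetimes = [36, 48, 30, 42, 54, 24, 60, 38]   # battery lifetimes in months

by_hand = halves_quartiles(lifetimes)
by_library = quantiles(lifetimes, n=4)   # three cut points; default method is 'exclusive'

print("median of halves:    ", by_hand)   # (33.0, 40.0, 51.0)
print("statistics.quantiles:", by_library)
```

If the two lines disagree on Q1 or Q3, neither is an arithmetic error; they answer the same question under different conventions, so state which one a reported IQR uses.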
Variant 4 — Apartment Rents (n = 11)
Data: 950, 1100, 850, 1250, 900, 1050, 1150, 1000, 1300, 800, 2000
Sorted: 800, 850, 900, 950, 1000, 1050, 1100, 1150, 1250, 1300, 2000
n = 11 (odd).
Q2: position 6 → 1050.
Lower half (below Q2): 800, 850, 900, 950, 1000. nL = 5 (odd).
Q1 = 900.
Upper half (above Q2): 1100, 1150, 1250, 1300, 2000. nU = 5 (odd).
Q3 = 1250.
Five-number summary: min = 800, Q1 = 900, Q2 = 1050, Q3 = 1250, max = 2000.
IQR: 1250 − 900 = 350.
Fences:
Lower = 900 − 1.5 × 350 = 900 − 525 = 375
Upper = 1250 + 1.5 × 350 = 1250 + 525 = 1775
Outlier check:
800 > 375 → No low outlier.
2000 > 1775 → 2000 is a potential outlier!
All other values (850–1300) are well within the fences. The $2000 apartment should be investigated — is it a luxury penthouse misclassified with standard units, a data-entry error, or a legitimate high-end rental? Do not auto-delete; flag for investigation.
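The fence check is easy to automate once Q1 and Q3 are known. The sketch below takes the quartiles from the five-number summary above and flags any rent outside the 1.5 × IQR fences; `iqr_fences` is a hypothetical helper name, and the code only flags values for review rather than removing them.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return (lower, upper) fences placed k * IQR beyond the quartiles."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

rents = [950, 1100, 850, 1250, 900, 1050, 1150, 1000, 1300, 800, 2000]

# Quartiles taken from the five-number summary worked out above.
lower, upper = iqr_fences(q1=900, q3=1250)

# Flag values outside the fences for investigation; do not delete them.
flagged = [r for r in rents if r < lower or r > upper]

print(lower, upper)   # 375.0 1775.0
print(flagged)        # [2000]
```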