INF-1 Solutions: Sampling Distributions and the CLT

Section 5 — Guided Practice Solutions

GP1 — Identifying the Correct Denominator (Variants A–E)

The key decision in all five variants is the same: use SE = σ/√n, not σ, as the denominator of the z-score, because the question asks about a sample mean.

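Variant A (rods, μ = 200, σ = 4, n = 16, target: x̄ > 201):
SE = 4/√16 = 1.00; z = (201 − 200)/1.00 = 1.00; P(x̄ > 201) = 1 − 0.8413 = 0.1587.

Variant B (heart rates, μ = 72, σ = 10, n = 25, target: x̄ < 70):
SE = 10/√25 = 2.00; z = (70 − 72)/2.00 = −1.00; P(x̄ < 70) = 0.1587.

Variant C (crop yield, μ = 45, σ = 6, n = 36, target: x̄ > 46.5):
SE = 6/√36 = 1.00; z = (46.5 − 45)/1.00 = 1.50; P(x̄ > 46.5) = 1 − 0.9332 = 0.0668.

Variant D (returns, μ = 1.2%, σ = 0.8%, n = 64, target: x̄ < 1.0%):
SE = 0.8/√64 = 0.10%; z = (1.0 − 1.2)/0.10 = −2.00; P(x̄ < 1.0%) = 0.0228.

Variant E (exam scores, μ = 500, σ = 100, n = 100, target: x̄ > 515):
SE = 100/√100 = 10.00; z = (515 − 500)/10.00 = 1.50; P(x̄ > 515) = 1 − 0.9332 = 0.0668.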

Common error across all variants: Using σ in the denominator gives a z-score that is √n times too small. For Variant A, using σ = 4 gives z = (201 − 200)/4 = 0.25 instead of 1.00, and therefore a probability of 0.4013 instead of 0.1587. The SE accounts for the fact that sample means are less variable than individual observations.
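A minimal Python sketch of this error, using the Variant A numbers and the standard library's `statistics.NormalDist` for the normal CDF. The variable names are illustrative and not part of the lesson.

```python
from math import sqrt
from statistics import NormalDist

# Variant A: rods, mu = 200, sigma = 4, n = 16, target x-bar > 201
mu, sigma, n, target = 200, 4, 16, 201

se = sigma / sqrt(n)              # standard error of the sample mean
z_correct = (target - mu) / se    # SE in the denominator
z_wrong = (target - mu) / sigma   # common error: sigma in the denominator

std_normal = NormalDist()         # standard normal: mean 0, sd 1
p_correct = 1 - std_normal.cdf(z_correct)
p_wrong = 1 - std_normal.cdf(z_wrong)

print(f"correct: z = {z_correct:.2f}, P = {p_correct:.4f}")
print(f"wrong:   z = {z_wrong:.2f}, P = {p_wrong:.4f}")
print(f"z ratio = {z_correct / z_wrong:.1f}, sqrt(n) = {sqrt(n):.1f}")
```

The ratio of the two z-scores equals √n, which is why the mistake grows more costly as the sample size increases.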


GP2 — Standard Error and Probability (Variants A–E)

Variant A (μ = 120, σ = 18, n = 36, x̄ = 123):

Variant B (μ = 50, σ = 20, n = 100, x̄ = 53):

Variant C (μ = 300, σ = 30, n = 9, normal population, x̄ = 310):

Note: n = 9 < 30, but the population is normal, so the sampling distribution is exactly normal — no CLT approximation needed.

Variant D (μ = 75, σ = 12, n = 144, x̄ = 74):

Variant E (μ = 10, σ = 5, n = 25, normal population, x̄ = 11.5):
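The standard error and z-score for each GP2 variant follow directly from the stated parameters. The sketch below recomputes them in Python; because the variant headers do not state the direction of the probability, it reports both tails rather than assuming one.

```python
from math import sqrt
from statistics import NormalDist

# (label, mu, sigma, n, observed x-bar) as stated for GP2 Variants A-E
variants = [
    ("A", 120, 18, 36, 123),
    ("B", 50, 20, 100, 53),
    ("C", 300, 30, 9, 310),
    ("D", 75, 12, 144, 74),
    ("E", 10, 5, 25, 11.5),
]

results = {}
for label, mu, sigma, n, xbar in variants:
    se = sigma / sqrt(n)               # standard error
    z = (xbar - mu) / se               # z-score of the sample mean
    below = NormalDist().cdf(z)        # P(sample mean < xbar)
    results[label] = (se, z, below, 1 - below)
    print(f"{label}: SE = {se:.2f}, z = {z:.2f}, "
          f"P(below) = {below:.4f}, P(above) = {1 - below:.4f}")
```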


GP3 — CLT Conditions

Scenario A (right-skewed income, n = 50): Yes. n = 50 ≥ 30 meets the rule of thumb. The skewed population shape does not prevent the CLT from applying to the sampling distribution of x̄.

Scenario B (normal test scores, n = 15): Yes. When the population is normal, the sampling distribution of x̄ is exactly normal for any n — no minimum sample size required.

Scenario C (bimodal wait times, n = 20): Caution. n = 20 < 30, and the population is strongly non-normal (bimodal), so the normal approximation may be unreliable. A larger sample (n ≥ 30, ideally more) would be needed.
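Scenario A's conclusion can be checked with a small simulation. The sketch below assumes an Exponential(1) population as a stand-in for right-skewed income, since the lesson does not specify a distribution. The seed and the number of replications are arbitrary choices.

```python
import random
from math import sqrt
from statistics import mean, stdev

random.seed(42)          # fixed seed so the run is repeatable

n = 50                   # sample size, as in Scenario A
reps = 5000              # number of simulated samples (arbitrary)
mu = sigma = 1.0         # an Exponential(1) population has mean 1 and sd 1

# Draw many samples from the skewed population and keep each sample mean
sample_means = [
    mean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

center = mean(sample_means)
spread = stdev(sample_means)
se = sigma / sqrt(n)
within_1se = sum(abs(m - mu) <= se for m in sample_means) / reps

print(f"mean of sample means: {center:.3f} (theory: {mu})")
print(f"sd of sample means:   {spread:.3f} (theory: {se:.3f})")
print(f"share within 1 SE:    {within_1se:.3f} (normal: 0.683)")
```

If the normal approximation is reasonable, the share of sample means within one SE of μ should land near 68%, even though individual draws are strongly skewed.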


GP4 — Comparing Two Sample Sizes (Variants A–E)

Variant A (μ = 100, σ = 20, n₁ = 16 vs. n₂ = 100):

Variant B (μ = 60, σ = 15, n₁ = 9 vs. n₂ = 225):

Variant C (μ = 200, σ = 40, n₁ = 4 vs. n₂ = 64):

Variant D (μ = 500, σ = 50, n₁ = 25 vs. n₂ = 100):

Variant E (μ = 30, σ = 6, n₁ = 36 vs. n₂ = 144):
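The comparison in each GP4 variant comes down to two standard errors and their ratio. A short Python sketch, using only the parameters stated above:

```python
from math import sqrt

# (label, sigma, n1, n2) as stated for GP4 Variants A-E
variants = [
    ("A", 20, 16, 100),
    ("B", 15, 9, 225),
    ("C", 40, 4, 64),
    ("D", 50, 25, 100),
    ("E", 6, 36, 144),
]

comparison = {}
for label, sigma, n1, n2 in variants:
    se1 = sigma / sqrt(n1)
    se2 = sigma / sqrt(n2)
    comparison[label] = (se1, se2, se1 / se2)
    print(f"{label}: SE1 = {se1:.2f}, SE2 = {se2:.2f}, "
          f"SE1/SE2 = {se1 / se2:.2f}, sqrt(n2/n1) = {sqrt(n2 / n1):.2f}")
```

The last two columns always agree: the SE ratio depends only on the ratio of sample sizes, so quadrupling n (Variants D and E) halves the SE.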

Section 6 — Independent Practice Solutions

IP1 — CLT Conditions: Variant Key

Variant A (right-skewed electricity bills, n = 35): CLT applies — n = 35 ≥ 30.

Variant B (normal heights, n = 10): Exactly normal — population is normal, any n works.

Variant C (Poisson toll counts, n = 20): Caution — n < 30, skewed Poisson, approximation unreliable.

Variant D (bimodal satisfaction, n = 50): CLT applies — n = 50 ≥ 30.

Variant E (extreme right-skew savings, n = 25): Caution — n < 30 + extreme skew. Use a larger sample or non-parametric methods.


IP2 — Generator Problems

Generated problems vary by run. The solution method is always:

  1. Compute SE = σ/√n.
  2. Compute z = (x̄ − μ)/SE.
  3. Look up z in the z-table; take the complement if needed.
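The three steps can be sketched as one Python function. The function name and the `tail` parameter are hypothetical, chosen for illustration; `statistics.NormalDist` stands in for the z-table. The example call uses the Error Analysis numbers (μ = 50, σ = 10, n = 25, x̄ = 52).

```python
from math import sqrt
from statistics import NormalDist


def sample_mean_probability(mu, sigma, n, xbar, tail="upper"):
    """Return (se, z, p) for P(sample mean > xbar) or P(sample mean < xbar)."""
    if tail not in ("upper", "lower"):
        raise ValueError("tail must be 'upper' or 'lower'")
    se = sigma / sqrt(n)                  # step 1: standard error
    z = (xbar - mu) / se                  # step 2: z-score of the sample mean
    below = NormalDist().cdf(z)           # step 3: table lookup, P(Z < z)
    p = 1 - below if tail == "upper" else below   # complement for the upper tail
    return se, z, p


se, z, p = sample_mean_probability(50, 10, 25, 52, tail="upper")
print(f"SE = {se:.2f}, z = {z:.2f}, P = {p:.4f}")
```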

IP3 — Sample Size Comparison: Generator

Solution method:

  1. Compute SE₁ = σ/√n₁ and SE₂ = σ/√n₂. The larger n always gives the smaller SE.
  2. For part (b), use SE₂ to compute the z-score and look up the probability.

IP4 — CLT + Probability (Variants A–E)

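Variant A (Uniform[20, 80], n = 36, σ ≈ 17.32, P(47 < x̄ < 53)):
μ = (20 + 80)/2 = 50; SE = 17.32/√36 ≈ 2.89; z = ±3/2.89 ≈ ±1.04; P(47 < x̄ < 53) ≈ 2(0.8508) − 1 = 0.7016.

Variant B (right-skewed, μ = 30, σ = 8, n = 64, P(28 < x̄ < 32)):
SE = 8/√64 = 1.00; z = ±2.00; P(28 < x̄ < 32) = 0.9772 − 0.0228 = 0.9544.

(This is the ±2 SE interval of the sampling distribution, the 95.4% interval of the empirical rule.)

Variant C (left-skewed, μ = 90, σ = 15, n = 100, P(x̄ > 92.5)):
SE = 15/√100 = 1.50; z = (92.5 − 90)/1.50 ≈ 1.67; P(x̄ > 92.5) ≈ 1 − 0.9525 = 0.0475.

Variant D (bimodal, μ = 50, σ = 12, n = 144, P(49 < x̄ < 51)):
SE = 12/√144 = 1.00; z = ±1.00; P(49 < x̄ < 51) = 0.8413 − 0.1587 = 0.6826.

Variant E (normal, μ = 165, σ = 6, n = 9, P(x̄ > 167)):
SE = 6/√9 = 2.00; z = (167 − 165)/2.00 = 1.00; P(x̄ > 167) = 1 − 0.8413 = 0.1587. The population is normal, so this is exact even with n = 9.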
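A sketch for computing the five IP4 probabilities in Python from the stated parameters. The helper `prob_between` is a hypothetical name; it models the sampling distribution as normal with mean μ and standard deviation σ/√n, which is the CLT approximation for Variants A to D.

```python
from math import sqrt
from statistics import NormalDist


def prob_between(mu, sigma, n, low=None, high=None):
    """P(low < sample mean < high); None leaves that side unbounded."""
    dist = NormalDist(mu, sigma / sqrt(n))   # normal model for the sample mean
    upper = 1.0 if high is None else dist.cdf(high)
    lower = 0.0 if low is None else dist.cdf(low)
    return upper - lower


sigma_uniform = (80 - 20) / sqrt(12)   # sd of Uniform[20, 80], about 17.32

answers = {
    "A": prob_between(50, sigma_uniform, 36, 47, 53),  # Uniform[20, 80] has mean 50
    "B": prob_between(30, 8, 64, 28, 32),
    "C": prob_between(90, 15, 100, low=92.5),
    "D": prob_between(50, 12, 144, 49, 51),
    "E": prob_between(165, 6, 9, low=167),
}
for label, p in answers.items():
    print(f"{label}: {p:.4f}")
```

Software carries z to full precision, so results can differ in the third or fourth decimal from a z-table lookup that rounds z to two places.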


IP5 — Individual vs. Sample Mean

Part (a): Single patient above 130 mmHg: z = (130 − 120)/15 = 0.67. P = 1 − 0.7486 = 0.2514 (about 25%).

Part (b): Sample mean of 36 above 130: SE = 15/6 = 2.5. z = (130 − 120)/2.5 = 4.00. P ≈ 0.000032 (essentially impossible).

Part (c): A single high reading could reflect natural between-person variability. An average of 36 readings above 130 is very unlikely to arise from random variation alone; it almost certainly signals a real systematic shift. The SE (2.5) is smaller than σ (15) by a factor of √n = √36 = 6, making extreme averages far rarer than extreme individuals. This is why clinical studies use groups, not single measurements.

Section 7 — Mastery Check Solutions

Feynman Test — Suggested Answer Elements

A strong answer will include all four points:

  1. Sampling distribution: The sampling distribution of x̄ is the theoretical distribution of all possible sample means from samples of size n. Each sample gives a different x̄; the sampling distribution describes the full range of x̄ values and their probabilities.

  2. Parameters: Mean = μ (the same as the population mean, so x̄ is unbiased). Standard error = σ/√n (shrinks as n grows).

  3. CLT: For n ≥ 30 (rule of thumb), the sampling distribution is approximately normal regardless of population shape. Exception: if the population itself is normal, any n works.

  4. Why it matters: The CLT underpins much of classical inference. It lets us quantify how unusual a sample result is, which is needed for confidence intervals, hypothesis tests, and more.


Apply Question

Population: μ = 400, σ = 60, n = 100 (right-skewed).


Error Analysis

The error: The student used σ = 10 in the denominator instead of SE = σ/√n = 10/√25 = 2.

Correct solution: SE = 10/5 = 2. z = (52 − 50)/2 = 1.00. P(x̄ > 52) = 1 − 0.8413 = 0.1587.

The student's answer (0.4207) is the probability that a single observation exceeds 52, not the probability that the mean of 25 observations does. The SE formula divides σ by √n to account for the reduced variability of averages compared to individual values.

Section 8 — Boss Fight Solutions

Path A: The Analyst

1. Sample mean: Sum of all 25 values = 6,240.0 mL. x̄ = 6240/25 = 249.60 mL.

2. Standard error: SE = σ/√n = 5/√25 = 1.00 mL.

3. Probability: z = (249.60 − 250)/1.00 = −0.40, so P(x̄ ≤ 249.60) = P(Z ≤ −0.40) = 0.3446.

4. Interpretation: A sample mean of 249.60 mL occurs about 34% of the time when the machine is working correctly. This is not unusual — no evidence of underfilling. An alarm would be warranted if the probability were below ~5% (z ≤ −1.645), which would require x̄ ≤ 248.36 mL.
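The probability and the alarm cutoff in the interpretation can be reproduced with the inverse normal CDF. This sketch assumes a target fill of μ = 250 mL, a value inferred from the 248.36 mL cutoff rather than stated in the solution text.

```python
from statistics import NormalDist

mu = 250.0      # assumed target fill in mL (inferred from the 248.36 mL cutoff)
se = 1.00       # standard error from step 2, in mL
xbar = 249.60   # observed sample mean from step 1, in mL

sampling_dist = NormalDist(mu, se)       # normal model for the sample mean

p_low = sampling_dist.cdf(xbar)          # P(sample mean <= 249.60)
cutoff = sampling_dist.inv_cdf(0.05)     # value with 5% of the area below it

print(f"P(x-bar <= {xbar}) = {p_low:.4f}")
print(f"5% alarm cutoff = {cutoff:.2f} mL")
```

Using `inv_cdf` avoids hard-coding z = −1.645 and makes it easy to try other alarm levels, such as 1%.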


Path B: The Architect

1. SE = σ/√n = 20/√16 = 5 mg. z = (490 − 500)/5 = −2.00. P(x̄ < 490) = 0.0228 ≈ 2.3% false alarm rate.

2. Minimum n for P < 1%: require z = (490 − 500)/(20/√n) ≤ −2.326, so √n ≥ 2.326 × 20/10 = 4.652 and n ≥ 21.64.

Round up: n = 22 tablets.

3. Comparison — individual vs. sample mean: Single tablet: z = (490 − 500)/20 = −0.50. P(X < 490) = 0.3085. About 31% of tablets fall below 490 mg even when the process is on target — far too many false alarms for individual monitoring. The sample mean approach is dramatically more precise.

4. Cost implications: Larger n reduces false alarm rate and improves sensitivity, but each additional tablet measured costs time and material. The optimal n balances: cost of false alarms (unnecessary line stoppages) vs. cost of missed defects (under-dosed tablets reaching patients). For pharmaceuticals, the cost of a miss is severe, justifying larger n and tighter controls.
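The minimum sample size in step 2 can be sketched in Python by inverting the normal CDF at the desired false alarm rate. The values μ = 500, σ = 20 and the 490 mg limit are taken from the single-tablet comparison in step 3; the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

mu, sigma, limit = 500, 20, 490   # mg, as used in the single-tablet comparison
alpha = 0.01                      # largest acceptable false alarm probability

z_alpha = NormalDist().inv_cdf(alpha)   # about -2.326

# Need (limit - mu) / (sigma / sqrt(n)) <= z_alpha,
# which rearranges to sqrt(n) >= z_alpha * sigma / (limit - mu)
n_exact = (z_alpha * sigma / (limit - mu)) ** 2
n_min = ceil(n_exact)             # round up to a whole tablet

print(f"n must be at least {n_exact:.2f}, so n = {n_min}")
```

Rounding up rather than to the nearest integer matters here: n = 21 would leave the false alarm rate slightly above 1%.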

Section 9 — Challenge Problem Solutions

C1 — Minimum Sample Size (Variants A–E)

General method: Set σ/√n ≤ target SE and solve for n: n ≥ (σ / target SE)². Round up to the next whole number.

Variant   σ    Target SE   Minimum n   CLT?
A         24   3           64          Yes (64 ≥ 30)
B         50   4           157         Yes
C         18   1.5         144         Yes
D         30   2           225         Yes
E         40   2.5         256         Yes
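The table values follow from squaring σ / target SE and rounding up. A short Python sketch using the five rows:

```python
from math import ceil

# (label, sigma, target SE) from the C1 table
variants = [("A", 24, 3), ("B", 50, 4), ("C", 18, 1.5), ("D", 30, 2), ("E", 40, 2.5)]

min_n = {}
for label, sigma, target_se in variants:
    exact = (sigma / target_se) ** 2   # n at which SE equals the target exactly
    min_n[label] = ceil(exact)         # round up to a whole observation
    print(f"{label}: (sigma/SE)^2 = {exact:.2f}, minimum n = {min_n[label]}")
```

Only Variant B needs rounding (156.25 becomes 157); the other four divide evenly.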

C2 — Finite Population Correction

N = 200 employees, σ = 8 years, n = 50.

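n/N = 50/200 = 25% > 5% → FPC required.

FPC = √((N − n)/(N − 1)) = √(150/199) ≈ 0.868. SE = (σ/√n) × FPC = (8/√50) × 0.868 ≈ 1.131 × 0.868 ≈ 0.982 years.

The FPC reduces the SE by about 13%. As n → N (sampling the whole population), FPC → 0 and SE → 0: perfect knowledge, no uncertainty.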
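A minimal Python sketch of the corrected standard error, using the usual correction factor √((N − n)/(N − 1)) and the C2 numbers:

```python
from math import sqrt

N, n, sigma = 200, 50, 8   # population size, sample size, population sd (years)

se_uncorrected = sigma / sqrt(n)
fpc = sqrt((N - n) / (N - 1))        # finite population correction factor
se_corrected = se_uncorrected * fpc

print(f"uncorrected SE = {se_uncorrected:.3f}")
print(f"FPC            = {fpc:.3f}")
print(f"corrected SE   = {se_corrected:.3f} ({(1 - fpc) * 100:.0f}% smaller)")
```

Setting n equal to N makes the factor exactly zero, which matches the limiting case described above.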


C3 — Proof Sketch

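Step 1: Var(x̄) = Var((1/n) Σ Xᵢ) = (1/n²) Var(Σ Xᵢ). Pulling the constant 1/n out of the variance: it becomes 1/n² outside.

Step 2: Independence of observations means Var(Σ Xᵢ) = Σ Var(Xᵢ) = nσ².

Combined: Var(x̄) = (1/n²)(nσ²) = σ²/n. Taking the square root: SE = √Var(x̄) = σ/√n. ∎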

This derivation shows that the SE formula is an exact mathematical result (not a rule of thumb) — it follows directly from the definition of variance and the independence of random draws.