INF-4 Solutions: Confidence Intervals for a Proportion

Section 5 — Guided Practice Solutions

GP-1 — Compute \( \hat{p} \) and Check Conditions (Variants 0–2)

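Variant 0 (budget app survey, n = 400, x = 148): \( \hat{p} = 148/400 = 0.37 \). Conditions: \( 148 \geq 5 \) ✓; \( 252 \geq 5 \) ✓.

Variant 1 (composting survey, n = 250, x = 75): \( \hat{p} = 75/250 = 0.30 \). Conditions: \( 75 \geq 5 \) ✓; \( 175 \geq 5 \) ✓.

Variant 2 (clinical trial, n = 40, x = 18): \( \hat{p} = 18/40 = 0.45 \). Conditions: \( 18 \geq 5 \) ✓; \( 22 \geq 5 \) ✓.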

Common mistake: Writing \( \hat{p} = 148 \) without dividing by n, or checking "n ≥ 30" instead of the actual conditions \( n\hat{p} \geq 5 \) and \( n(1-\hat{p}) \geq 5 \).
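
The GP-1 check can also be scripted. The following is a minimal Python sketch, not part of the lesson's own tooling; the function name `check_conditions` and its `threshold` parameter are illustrative assumptions.

```python
def check_conditions(x: int, n: int, threshold: int = 5) -> tuple[float, bool]:
    """Return (p_hat, ok); ok is True when both counts reach the threshold."""
    p_hat = x / n
    successes = x        # equals n * p_hat
    failures = n - x     # equals n * (1 - p_hat)
    return p_hat, successes >= threshold and failures >= threshold


# The three GP-1 variants: (label, x, n)
for label, x, n in [("Variant 0", 148, 400), ("Variant 1", 75, 250), ("Variant 2", 18, 40)]:
    p_hat, ok = check_conditions(x, n)
    print(f"{label}: p_hat = {p_hat:.2f}, successes = {x}, failures = {n - x}, conditions met: {ok}")
```

Working with the raw counts x and n − x avoids rounding \( \hat{p} \) before the check.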


GP-2 — Compute the Standard Error and CI Endpoints (Variants 0–2)

Variant 0 (\( \hat{p} = 0.37 \), n = 400, 95% confidence):

\[ \text{SE} = \sqrt{0.37 \times 0.63 / 400} = \sqrt{0.000583} \approx 0.02414, \quad E = 1.96 \times 0.02414 \approx 0.0473 \]

\[ \text{CI} = 0.37 \pm 0.047 = (0.323,\; 0.417) \]

Variant 1 (\( \hat{p} = 0.30 \), n = 250, 90% confidence):

\[ \text{SE} = \sqrt{0.30 \times 0.70 / 250} \approx 0.02898, \quad E = 1.645 \times 0.02898 \approx 0.0477 \]

\[ \text{CI} = (0.252,\; 0.348) \]

Variant 2 (\( \hat{p} = 0.45 \), n = 40, 99% confidence):

\[ \text{SE} = \sqrt{0.45 \times 0.55 / 40} \approx 0.07866, \quad E = 2.576 \times 0.07866 \approx 0.2026 \]

\[ \text{CI} = (0.247,\; 0.653) \]

Note how wide this interval is — small n (40) combined with high confidence (99%) produces a very imprecise estimate spanning over 40 percentage points.

Common mistakes: (1) Using \( \sqrt{p(1-p)/n} \) with the unknown \( p \) — always use \( \hat{p} \). (2) Adding/subtracting the SE directly instead of \( E = z^* \times \text{SE} \). (3) Using the wrong z* for the stated confidence level.
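
To confirm the GP-2 endpoints numerically, here is a short Python sketch. It assumes the critical values come from the standard normal quantile function (`statistics.NormalDist().inv_cdf`, available in the standard library from Python 3.8); the helper names `z_star` and `proportion_ci` are illustrative, not taken from the lesson.

```python
from math import sqrt
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Two-sided critical value: leaves (1 - confidence) / 2 in each tail."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)


def proportion_ci(p_hat: float, n: int, confidence: float) -> tuple[float, float, float]:
    """Return (margin_of_error, lower, upper) for the z-interval."""
    se = sqrt(p_hat * (1 - p_hat) / n)   # standard error uses p_hat, not the unknown p
    e = z_star(confidence) * se          # margin of error E = z* x SE
    return e, p_hat - e, p_hat + e


# The three GP-2 variants: (p_hat, n, confidence level)
for p_hat, n, cl in [(0.37, 400, 0.95), (0.30, 250, 0.90), (0.45, 40, 0.99)]:
    e, lo, hi = proportion_ci(p_hat, n, cl)
    print(f"{cl:.0%}: z* = {z_star(cl):.3f}, E = {e:.4f}, CI = ({lo:.3f}, {hi:.3f})")
```

The unrounded quantiles differ slightly from the table values 1.645, 1.96, and 2.576, but the endpoints agree with the worked solutions to three decimal places.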


GP-3 — Interpretation (n = 500, x = 310, 95% CI = (0.577, 0.663))

Correct answer: "We are 95% confident that the true proportion of adults who support stricter labelling lies between 57.7% and 66.3%."

Why the other options are wrong:


GP-4 — Sample Size (vaccination coverage, E = 0.03, 95%, p* = 0.40)

(a) Use \( p^* = 0.40 \) — the prior estimate. Using \( p^* = 0.50 \) would be more conservative but wastes sample size when a good prior estimate is available.

(b)

\[ n = \left(\frac{1.96}{0.03}\right)^2 \times 0.40 \times 0.60 = (65.33)^2 \times 0.24 \approx 1{,}024.4 \]

Round up: \( n = \mathbf{1{,}025} \).

Common mistakes: (1) Using \( p^* = 0.5 \) when a prior estimate is given. (2) Rounding 1024.4 down to 1024 — always round up. (3) Forgetting to square \( z^*/E \) before multiplying by \( p^*(1-p^*) \).
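
The sample-size calculation in (b) can be sketched in Python as follows. The function name `required_n` is an illustrative assumption; `math.ceil` enforces the round-up rule.

```python
from math import ceil


def required_n(margin: float, z: float = 1.96, p_star: float = 0.5) -> int:
    """Smallest n with z * sqrt(p*(1 - p*) / n) <= margin."""
    raw = (z / margin) ** 2 * p_star * (1 - p_star)   # square z/E before multiplying
    return ceil(raw)                                   # always round up


print(required_n(0.03, 1.96, 0.40))  # with the prior estimate p* = 0.40
print(required_n(0.03, 1.96))        # with the conservative default p* = 0.50
```

Comparing the two calls shows how many extra respondents the conservative choice \( p^* = 0.50 \) would cost in this scenario.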

Section 6 — Independent Practice Solutions

IP-1 — Generative: Full CI Computation

Generated by generateProportionCI(). The approach is always the same:

  1. \( \hat{p} = x/n \)
  2. Check \( n\hat{p} \geq 5 \) and \( n(1-\hat{p}) \geq 5 \)
  3. \( \text{SE} = \sqrt{\hat{p}(1-\hat{p})/n} \)
  4. \( E = z^* \times \text{SE} \) (use 1.645 / 1.96 / 2.576 for 90% / 95% / 99%)
  5. CI = \( (\hat{p} - E,\; \hat{p} + E) \)
  6. Interpret: "We are [CL]% confident the true proportion lies between [lower] and [upper]."
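
The six steps above can be strung together in one function. This Python sketch is not the lesson's generateProportionCI(); the name `proportion_ci_report` and the choice to raise an error when the conditions fail are assumptions made for illustration. It is run on the GP-3 data (n = 500, x = 310, 95%).

```python
from math import sqrt

Z_STAR = {90: 1.645, 95: 1.96, 99: 2.576}   # critical values listed in step 4


def proportion_ci_report(x: int, n: int, level: int = 95) -> str:
    p_hat = x / n                                        # step 1
    if x < 5 or n - x < 5:                               # step 2
        raise ValueError("conditions fail: need at least 5 successes and 5 failures")
    se = sqrt(p_hat * (1 - p_hat) / n)                   # step 3
    e = Z_STAR[level] * se                               # step 4
    lower, upper = p_hat - e, p_hat + e                  # step 5
    return (f"We are {level}% confident the true proportion "
            f"lies between {lower:.1%} and {upper:.1%}.")  # step 6


print(proportion_ci_report(310, 500, 95))
```

Stopping at step 2 when a condition fails mirrors the advice in IP-4: the z-interval should not be reported in that case.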

IP-2 — Margin of Error and Interpretation (Variants 0–2)

Variant 0 (Pharmacare poll, n = 900, x = 513):

Variant 1 (food insecurity survey, n = 500, x = 185):

Variant 2 (burnout survey, n = 300, x = 81):


IP-3 — Generative: Sample Size

Generated by generateSampleSizeCI(). General approach:

  1. Identify E, confidence level (→ z*), and p* (use 0.5 if no prior)
  2. \( n = (z^*/E)^2 \times p^*(1-p^*) \)
  3. Round UP to the next whole number

IP-4 — When Conditions Are Not Met (Variants 0–2)

Variant 0 (rare-book collectors, n = 30, x = 2): \( \hat{p} = 2/30 \approx 0.067 \). \( n\hat{p} = 2 < 5 \) — condition fails. Do not use the z-interval. Options: larger sample, Clopper-Pearson exact interval, or Wilson interval.

Variant 1 (defective components, n = 20, x = 1): \( \hat{p} = 1/20 = 0.05 \). \( n\hat{p} = 1 < 5 \) — condition fails. Use an exact binomial method or collect more data.

Variant 2 (nesting sites, n = 15, x = 14): \( \hat{p} = 14/15 \approx 0.933 \). \( n(1-\hat{p}) = 1 < 5 \) — condition fails (the "failure" count). The distribution is strongly left-skewed; z-interval would give an upper bound above 1. Use an exact or Wilson interval.

Key lesson from IP-4: Conditions can fail at either extreme — very small \( \hat{p} \) (too few successes) or very large \( \hat{p} \) (too few failures). Always check both conditions.
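
To see what goes wrong when the conditions fail, the sketch below applies the z-interval to the three IP-4 variants anyway. IP-4 does not state a confidence level, so 95% (\( z^* = 1.96 \)) is assumed here; the function name `z_interval` is illustrative.

```python
from math import sqrt

Z_95 = 1.96  # assumed confidence level; IP-4 itself does not specify one


def z_interval(x: int, n: int, z: float = Z_95) -> tuple[float, float]:
    """Standard z-interval, computed even when the conditions fail."""
    p_hat = x / n
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


# The three IP-4 variants: (label, x, n)
for label, x, n in [("Variant 0", 2, 30), ("Variant 1", 1, 20), ("Variant 2", 14, 15)]:
    lo, hi = z_interval(x, n)
    status = "outside [0, 1]" if lo < 0 or hi > 1 else "inside [0, 1]"
    print(f"{label}: ({lo:.3f}, {hi:.3f}) is {status}")
```

Under this assumption, each interval includes values that are impossible for a proportion: negative lower bounds for Variants 0 and 1, and an upper bound above 1 for Variant 2.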


IP-5 — One-Sided Lower Bound (n = 150, x = 27)

(a) One-sided lower bound — "at least ___%" is a lower-bound claim.

(b) \( \hat{p} = 27/150 = 0.18 \). Conditions: \( 27 \geq 5 \) ✓; \( 123 \geq 5 \) ✓.

\[ \text{SE} = \sqrt{0.18 \times 0.82 / 150} \approx 0.03137, \quad \text{Lower bound} = 0.18 - 1.645 \times 0.03137 \approx 0.128 \]

Statement: "We are 95% confident that at least 12.8% of products contain undisclosed allergens."

(c) Use \( z_{0.05} = 1.645 \) (not 1.96). One-sided puts all 5% in one tail; two-sided splits it 2.5%/2.5%, requiring the larger \( z^* = 1.96 \).
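
The difference between the two critical values in (c) can be checked directly. This Python sketch assumes the standard library's `statistics.NormalDist` (Python 3.8+); variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

alpha = 0.05
z_one_sided = NormalDist().inv_cdf(1 - alpha)       # all 5% in one tail
z_two_sided = NormalDist().inv_cdf(1 - alpha / 2)   # 2.5% in each tail

x, n = 27, 150
p_hat = x / n
se = sqrt(p_hat * (1 - p_hat) / n)
lower_bound = p_hat - z_one_sided * se              # one-sided lower bound

print(f"one-sided z = {z_one_sided:.3f}, two-sided z = {z_two_sided:.3f}")
print(f"95% lower bound = {lower_bound:.3f}")
```

Using the two-sided value here would push the bound lower than necessary and understate what the data support.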

Section 7 — Mastery Check Solutions

Feynman Test — Why \( \hat{p} \) in the SE Formula

Since \( p \) is unknown, we substitute \( \hat{p} \) into the SE formula. This works when the sample is large enough for \( \hat{p} \) to be close to \( p \) and for the sampling distribution of \( \hat{p} \) to be approximately normal (CLT). The conditions \( n\hat{p} \geq 5 \) and \( n(1-\hat{p}) \geq 5 \) check whether the approximation is reliable. If they fail, the SE estimate using \( \hat{p} \) may be too inaccurate to trust.


Apply — 99% CI for Vegetable Consumption (n = 450, x = 180)

(a) Correct SE formula: \( \sqrt{\hat{p}(1-\hat{p})/n} \).

(b) \( \hat{p} = 0.40 \). Conditions: \( 180 \geq 5 \) ✓; \( 270 \geq 5 \) ✓.

\[ \text{SE} = \sqrt{0.40 \times 0.60/450} \approx 0.02309, \qquad E = 2.576 \times 0.02309 \approx 0.0595 \]

\[ \text{CI} = (0.341,\; 0.459) \]

We are 99% confident that between 34.1% and 45.9% of all adults consume fewer than two vegetable servings daily.


Error Analysis — CI = (0.42, 0.58), n = 200

Error 1: "95% probability that p is in the interval" — p is fixed; use "95% confident" instead.

Error 2: Concluding p = 0.50 because 0.50 is inside the interval. The CI says 0.50 is plausible, not that it equals the true value.

Correct statement: "We are 95% confident the true proportion is between 0.42 and 0.58. We cannot rule out an even split, but we cannot conclude one exists either."
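
A small simulation can make Error 1 concrete: \( p \) stays fixed while the interval changes from sample to sample, and "95% confident" describes how often the procedure captures \( p \). The sketch below assumes a hypothetical true proportion of 0.5 and a fixed random seed; the function name `coverage` is illustrative.

```python
import random
from math import sqrt


def coverage(p_true: float, n: int, trials: int, z: float = 1.96, seed: int = 0) -> float:
    """Fraction of simulated z-intervals that contain the fixed value p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        x = sum(rng.random() < p_true for _ in range(n))   # one simulated sample
        p_hat = x / n
        e = z * sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - e <= p_true <= p_hat + e:               # did this interval capture p?
            hits += 1
    return hits / trials


rate = coverage(p_true=0.5, n=200, trials=2000)   # p_true = 0.5 is a hypothetical choice
print(f"Share of intervals containing p: {rate:.3f}")
```

The share should land close to 0.95. Any single interval, such as (0.42, 0.58), either contains \( p \) or does not.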

Section 8 — Boss Fight Solutions

Path A — The Analyst: Restaurant Inspections (n = 120, x = 47)

Task 1: \( \hat{p} = 47/120 \approx 0.3917 \). Conditions: \( 47 \geq 5 \) ✓; \( 73 \geq 5 \) ✓.

Task 2:

\[ \text{SE} = \sqrt{0.3917 \times 0.6083 / 120} \approx 0.04456, \qquad E = 1.96 \times 0.04456 \approx 0.0873 \]

\[ \text{CI} = (0.304,\; 0.479) \]

Task 3: 40% falls inside (0.304, 0.479) — the data do NOT support the city's claim at 95% confidence.

Task 4:

\[ n = (1.96/0.02)^2 \times 0.39 \times 0.61 = 9{,}604 \times 0.2379 \approx 2{,}284.8 \to \mathbf{2{,}285} \text{ restaurants} \]


Path B — The Architect: CEGEP Tutoring Study

Task 1: \( E = 1.96 \times \sqrt{0.25/300} \approx 1.96 \times 0.02887 \approx 0.0566 \) → ±5.7 percentage points.

Task 2: With \( p^* = 0.55 \): \( E \approx 1.96 \times 0.02872 \approx 0.0563 \) → barely better. Since \( 0.55 \times 0.45 = 0.2475 \approx 0.25 \), the improvement over the worst-case estimate is tiny.

Task 3:

\[ n = (1.96/0.03)^2 \times 0.2475 = 4{,}268.4 \times 0.2475 \approx 1{,}056.4 \to \mathbf{1{,}057} \]

This exceeds the budget of 300 — the target precision is not achievable with available resources.

Task 4: \( \hat{p} = 162/300 = 0.54 \); lower bound \( = 0.54 - 1.645 \times 0.02877 \approx 0.493 \) → "at least 49.3%."

Section 9 — Challenge Problem Solutions

Challenge 1 — Wilson Interval

Full worked solutions are embedded in each variant's "Show Solution" toggle within Section 9.

Key takeaway: The standard z-interval can produce bounds below 0 or above 1 when conditions fail. The Wilson interval always stays in [0, 1] and performs better for small samples or extreme proportions.
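
For reference, here is a minimal Python sketch of the Wilson interval in its standard textbook form; the worked solutions in Section 9 may lay out the calculation differently. The function name `wilson_interval` is illustrative, and the example reuses the IP-4 nesting-site data (n = 15, x = 14) at an assumed 95% level.

```python
from math import sqrt


def wilson_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a proportion (standard textbook form)."""
    p_hat = x / n
    denom = 1 + z**2 / n
    center = (p_hat + z**2 / (2 * n)) / denom
    half = (z / denom) * sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    # Clamp only to guard against floating-point round-off at x = 0 or x = n.
    return max(0.0, center - half), min(1.0, center + half)


lo, hi = wilson_interval(14, 15)   # IP-4 Variant 2 data, 95% level assumed
print(f"Wilson interval: ({lo:.3f}, {hi:.3f})")
```

The center is pulled from \( \hat{p} \) toward 0.5, which is what keeps both endpoints inside [0, 1].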


Challenge 2 — Margin of Error as a Function of \( \hat{p} \)

(b) \( E \) is maximized at \( \hat{p} = 0.5 \) because \( \hat{p}(1-\hat{p}) \) is maximized there: a proportion near 0.5 carries maximum variability.

(c) Using \( p^* = 0.7 \) is not conservative for all \( p \). For \( p \) near 0.5, \( p(1-p) = 0.25 > 0.21 \), so the required n is larger than \( p^* = 0.7 \) predicts. The only safe universal choice is \( p^* = 0.5 \).
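
Parts (b) and (c) can be checked numerically. In the Python sketch below, the sample size n = 400 and the target margin E = 0.03 are hypothetical values chosen for illustration, and the function name `margin` is an assumption.

```python
from math import ceil, sqrt


def margin(p: float, n: int, z: float = 1.96) -> float:
    """Margin of error E = z * sqrt(p(1 - p) / n)."""
    return z * sqrt(p * (1 - p) / n)


n = 400                                   # hypothetical sample size
grid = [i / 10 for i in range(1, 10)]     # 0.1, 0.2, ..., 0.9
for p in grid:
    print(f"p_hat = {p:.1f}: E = {margin(p, n):.4f}")

widest = max(grid, key=lambda p: margin(p, n))
print("E is largest at p_hat =", widest)

# Part (c): plan for a hypothetical target E = 0.03 using p* = 0.7,
# then suppose the true proportion is actually 0.5.
n_planned = ceil((1.96 / 0.03) ** 2 * 0.7 * 0.3)
print(f"n planned with p* = 0.7: {n_planned}")
print(f"actual E if p = 0.5: {margin(0.5, n_planned):.4f}")
```

With the sample size planned from \( p^* = 0.7 \), the achieved margin exceeds the 0.03 target when the true proportion is near 0.5.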


Challenge 3 — Two Polls (Generative)

Generated by generateTwoPollsCI(). General approach:

  1. Compute \( \hat{p}_1 \) and \( \hat{p}_2 \) for each poll
  2. Build a CI for each independently at the stated confidence level
  3. Check overlap: if \( [\hat{p}_1 - E_1,\; \hat{p}_1 + E_1] \cap [\hat{p}_2 - E_2,\; \hat{p}_2 + E_2] \neq \emptyset \), the intervals overlap
  4. Overlapping CIs are consistent with each other; non-overlapping suggest a real difference. Note: formally comparing two proportions requires a two-sample z-test, not just CI overlap.
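
The overlap check in step 3 reduces to comparing endpoints. This Python sketch is not the lesson's generateTwoPollsCI(); the poll counts are hypothetical, and the names `poll_ci` and `overlaps` are illustrative.

```python
from math import sqrt


def poll_ci(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """z-interval (lower, upper) for one poll."""
    p_hat = x / n
    e = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - e, p_hat + e


def overlaps(a: tuple[float, float], b: tuple[float, float]) -> bool:
    """True when the two closed intervals share at least one point."""
    return max(a[0], b[0]) <= min(a[1], b[1])


# Hypothetical poll results, made up for illustration
poll_1 = poll_ci(x=520, n=1000)
poll_2 = poll_ci(x=450, n=900)
poll_3 = poll_ci(x=600, n=1000)

print("Polls 1 and 2 overlap:", overlaps(poll_1, poll_2))
print("Polls 3 and 2 overlap:", overlaps(poll_3, poll_2))
```

As step 4 notes, overlap is only an informal screen; a formal comparison of the two proportions needs a two-sample procedure.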