Confidence Intervals for a Proportion

Every election season, pollsters publish results like this: “Candidate A leads with 54% support — margin of error ±3 percentage points, 19 times out of 20.” That last phrase — “19 times out of 20” — is not a throwaway disclaimer. It is the heart of a 95% confidence interval. But where does the ±3 come from? How do pollsters know what sample size they need to hit that precision? And what exactly are they claiming when they publish an interval?

In INF-2, you built confidence intervals for a population mean using the formula $\bar{x} \pm z^* \cdot \frac{\sigma}{\sqrt{n}}$. This lesson keeps the same skeleton (point estimate ± margin of error) but swaps in a new statistic: the sample proportion $\hat{p}$. The central challenge is that proportions have their own standard error formula, and that formula introduces a subtlety that trips up almost everyone the first time.

Here is the real-world question we will be able to answer by the end of this lesson: A Léger Research poll contacts 1,000 Canadians and finds that 540 support a proposed climate policy. How confident can we be in the true level of support across all Canadians, and how many people would we need to survey to narrow that uncertainty further?

After this lesson, you will be able to:

Compute the sample proportion $\hat{p}$ and identify it as a point estimate for the population proportion $p$
Verify the conditions required for the z-interval to be valid ($n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$)
Construct a two-sided confidence interval for $p$ using $\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$
Interpret a confidence interval correctly: in terms of the method’s long-run reliability, not the probability that $p$ falls in any single interval
Determine the minimum sample size needed to achieve a target margin of error, both with and without a prior estimate of $p$

If INF-2 gave you the blueprint, this lesson gives you a new set of tools for a different kind of question. The math is closely parallel — which means most of what you already know transfers directly.

Confidence intervals for proportions are built on the same “Point Estimate ± Margin of Error” framework you mastered in INF-2.

From INF-2: Critical Z-Values. The same $z^*$ values ($z^* = 1.96$ for 95%) apply whenever we use the normal distribution as our model.
From INF-2: The Quadruple Rule. To cut the margin of error in half, you must multiply the sample size by four.
Percentages to Proportions: 54% is the same as 0.54. You must use the decimal form in all formulas.
Square Root Properties: $\sqrt{a/b} = \sqrt{a}/\sqrt{b}$. You will be working with $\sqrt{\hat{p}(1-\hat{p})/n}$, so ensure you are taking the square root of the entire result.

Retrieval Checkpoint

A poll reports a 95% confidence interval for a proportion as . What is the margin of error () in this study?

Success Factor:

In this lesson, we use $z^*$ only if the sample size is “large enough.” If $n\hat{p} < 10$ or $n(1-\hat{p}) < 10$, the normal distribution is a poor model and these methods cannot be used. Always check your conditions first.

Retrieval Warm-up — from earlier lessons

A study reports , , . What is the margin of error for a 95% confidence interval for the population mean?

You read a news article: “A new 95% CI shows the government approval rating is between 42% and 50%.” A researcher says this is the sample size determination problem — she wants to cut the margin of error in half before the next election. By what factor must she multiply the sample size?

Navigation tip: Seven concepts live in this section. They build on each other in order — C1 through C4 give you the interval formula, C5 tells you when to use it, C6 tells you how to talk about it, and C7 adds the sample-size tool. If you already know INF-2 well, C1–C4 will go quickly.

C1 — The Sample Proportion

Suppose we want to know what fraction of Montréal adults have a library card. We can’t ask all 1.8 million, so we sample $n$ people and count how many ($x$) have one. The sample proportion is simply that fraction:

Sample Proportion

If $x$ individuals in a random sample of size $n$ have a characteristic of interest, the sample proportion is:

$$\hat{p} = \frac{x}{n}$$

$\hat{p}$ (read “p-hat”) estimates the unknown population proportion $p$. $x$ must be a count (a whole number); $n$ is the sample size.

Notice the notation carefully. $p$ is the true proportion in the whole population (unknown, fixed, a parameter). $\hat{p}$ is what we compute from our data (known, varies from sample to sample, a statistic). This distinction is going to matter a great deal when we write the standard error formula.

$p$ and $\hat{p}$ are not the same thing. $p$ is the true population proportion: it exists, but we don’t know its value (that’s why we’re building an interval). $\hat{p}$ is our best guess from the data. We will always use $\hat{p}$ in our calculations, never $p$, because we don’t have $p$.

The visualization below makes this concrete: $p$ stays fixed (and hidden; you don’t get to know it) while each new sample produces a different $\hat{p}$. Watch the $\hat{p}$ values scatter around the one fixed $p$.

Figure: The dashed line is the true proportion p — one fixed value that never moves (and is usually unknown, so it starts hidden). Each dot is a different sample's p̂ = x/n. Draw repeatedly: the dots scatter and cluster around the line. p̂ changes every sample; p does not.

C2 — The Standard Error of $\hat{p}$

In INF-2, the standard error of $\bar{x}$ was $\sigma/\sqrt{n}$. Where did that come from? From the variance of a sum of random variables. The same idea applies here, but for a proportion.

Think of each sampled person as a Bernoulli trial: success (has the characteristic) with probability $p$, failure with probability $1-p$. The sample count $x$ is binomial. Recall from PR-4 that a binomial random variable has variance $np(1-p)$. Dividing the count by $n$ to get the proportion divides the variance by $n^2$, giving variance $p(1-p)/n$. Taking the square root gives the standard error:

Standard Error of a Proportion

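The theoretical standard error of $\hat{p}$ is:

$$SE(\hat{p}) = \sqrt{\frac{p(1-p)}{n}}$$

Since $p$ is unknown, we substitute $\hat{p}$ to get the estimated standard error:

$$\widehat{SE}(\hat{p}) = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$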

Notice: we estimate an estimate. We use $\hat{p}$ to estimate $p$ for the SE formula itself. This works fine in practice, but it’s one more reason the conditions check (C5) matters: the approximation is only reliable when $n$ is large enough.
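The variance argument from the start of this concept can be written out step by step. This is a sketch that treats the $n$ sampled individuals as independent Bernoulli trials; the symbols $X_1, \dots, X_n$ are introduced here for illustration and do not appear elsewhere in the lesson.

```latex
\begin{aligned}
x &= X_1 + X_2 + \cdots + X_n, \qquad \operatorname{Var}(X_i) = p(1-p) \\
\operatorname{Var}(x) &= n\,p(1-p) \\
\operatorname{Var}(\hat{p}) &= \operatorname{Var}\!\left(\frac{x}{n}\right)
  = \frac{1}{n^2}\operatorname{Var}(x) = \frac{p(1-p)}{n} \\
SE(\hat{p}) &= \sqrt{\frac{p(1-p)}{n}}
\end{aligned}
```

The factor $1/n^2$ in the third line is why the variance shrinks like $1/n$ and the standard error like $1/\sqrt{n}$.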

Never write $\sqrt{p(1-p)/n}$ in a calculation. You do not know $p$. The formula that goes into actual computations always uses $\hat{p}$: $\sqrt{\hat{p}(1-\hat{p})/n}$. Only the theoretical expression uses $p$.
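As a quick numerical check of the estimated standard error, here is a minimal Python sketch applied to the poll from the introduction (540 supporters out of 1,000). The function names `sample_proportion` and `estimated_se` are illustrative choices, not part of the lesson.

```python
import math


def sample_proportion(x: int, n: int) -> float:
    """Return p-hat = x / n for x successes in a sample of size n."""
    if n <= 0 or not 0 <= x <= n:
        raise ValueError("need n > 0 and 0 <= x <= n")
    return x / n


def estimated_se(p_hat: float, n: int) -> float:
    """Estimated standard error sqrt(p_hat * (1 - p_hat) / n)."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Poll from the introduction: 540 of 1,000 respondents support the policy.
p_hat = sample_proportion(540, 1000)
se = estimated_se(p_hat, 1000)
print(f"p_hat = {p_hat:.3f}, SE = {se:.4f}")
```

Running it prints `p_hat = 0.540, SE = 0.0158`. Note that only $\hat{p}$ enters the computation; the unknown $p$ never appears.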

C3 — Margin of Error

The margin of error is the “±” part of the interval. It answers: “How far from $\hat{p}$ do we need to reach to be confident we’ve captured $p$?”

Margin of Error for a Proportion

$$E = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

where $z^*$ is the critical value for the desired confidence level (1.645 for 90%, 1.96 for 95%, 2.576 for 99%).

The margin of error is the half-width of the interval, not the full width. A 95% CI with $E = 0.03$ spans 0.06 units in total. When a poll reports “margin of error ±3 points,” that ±3 is exactly $E$.

The margin of error is half the interval width. If $E = 0.04$, the interval is $(\hat{p} - 0.04,\ \hat{p} + 0.04)$, which has total width 0.08, not 0.04. The “±” already signals this, but it’s easy to report $E$ as the width when writing up results.

C4 — The Confidence Interval Formula

Two-Sided Confidence Interval for a Proportion

A confidence interval for the population proportion $p$ is:

$$\hat{p} \pm z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$

which gives the interval $(\hat{p} - E,\ \hat{p} + E)$.

Conditions must be checked before applying this formula (see C5).

The structure is identical to INF-2: point estimate ± (critical value × standard error). Only the point estimate and SE formula have changed. Everything else — the z* values, the interpretation logic, the sample size approach — carries over directly.
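The point estimate ± (critical value × standard error) recipe translates almost line for line into code. The following is a minimal sketch, not a library routine; the function name `proportion_z_interval` is hypothetical, and the sketch assumes the conditions in C5 have already been verified.

```python
import math


def proportion_z_interval(x: int, n: int, z_star: float = 1.96):
    """Return (lower, upper, margin) for the z-interval p_hat ± z* · SE.

    Assumes the large-sample conditions have already been checked.
    """
    p_hat = x / n                               # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)     # estimated standard error
    margin = z_star * se                        # margin of error E
    return p_hat - margin, p_hat + margin, margin


# Poll from the introduction: 540 of 1,000 respondents support the policy.
lower, upper, margin = proportion_z_interval(540, 1000)
print(f"95% CI: ({lower:.3f}, {upper:.3f}), E = {margin:.3f}")
```

For the introductory poll this prints `95% CI: (0.509, 0.571), E = 0.031`, which matches the familiar "±3 points, 19 times out of 20" wording.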

Figure: Drag the sliders to see how n and p̂ affect the interval width. Switch confidence levels to see the width–confidence tradeoff. The row at the bottom always shows what happens when you quadruple the sample size: E is halved.

C5 — Conditions for the z-Interval

The CI formula works because $\hat{p}$ is approximately normally distributed when the sample is large enough (this follows from the CLT applied to a proportion). “Large enough” has a specific meaning here:

Conditions for the z-Interval for a Proportion

Before computing a CI for $p$, verify both:

$n\hat{p} \ge 10$: at least 10 successes in the sample
$n(1-\hat{p}) \ge 10$: at least 10 failures in the sample

Also assume the data come from a random sample (or can be treated as one), and that the sample is only a small fraction of the population (so observations are approximately independent).

Intuitively: if almost everyone in your sample is a “success,” the distribution of $\hat{p}$ is strongly skewed and the normal approximation breaks down. The conditions $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ ensure there are enough of both outcomes for the CLT to kick in symmetrically. Requiring at least 10 of each (rather than 5) provides a meaningful buffer against skewness.

Figure: Exact binomial probabilities for p̂ (bars) vs. the normal approximation (curve). When n·p or n·(1−p) falls below 10, the bars are visibly asymmetric and the curve is a poor fit — the conditions badges flip red. Increase n or move p toward 0.5 to watch them flip green.

Check conditions with $\hat{p}$, not $p$. You’re checking whether your observed sample has enough successes and failures. Use $n\hat{p}$ and $n(1-\hat{p})$: these are literally the observed counts of successes and failures in your sample (i.e., $x$ and $n - x$).
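Because $n\hat{p} = x$ and $n(1-\hat{p}) = n - x$, the conditions check reduces to comparing two counts with 10. A small sketch of that check follows; the function name `check_z_interval_conditions` is illustrative, and the second call uses a made-up sample of 19 successes out of 20.

```python
def check_z_interval_conditions(x: int, n: int, minimum: int = 10) -> bool:
    """True when the sample has at least `minimum` successes and failures."""
    successes = x        # equals n * p_hat
    failures = n - x     # equals n * (1 - p_hat)
    return successes >= minimum and failures >= minimum


print(check_z_interval_conditions(540, 1000))  # poll from the introduction
print(check_z_interval_conditions(19, 20))     # hypothetical: only 1 failure
```

The first call prints `True` and the second prints `False`, since a single failure is far short of the 10 required.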

C6 — Correct Interpretation

This is the concept most students get wrong — even after getting the arithmetic right. The issue is subtle but important.

Imagine running 1,000 polls, each on a fresh random sample of the same size. Each poll gives a different $\hat{p}$, and therefore a different interval. With a 95% CI procedure, we expect about 950 of those 1,000 intervals to contain the true $p$. The other 50 or so will miss.

Now you run your poll. You get one specific interval, say (0.48, 0.56). Either $p$ is in that interval or it isn’t. There is no randomness left: $p$ is a fixed (unknown) number, and your interval is a fixed pair of numbers.

Figure: Each bar is a confidence interval built from a different random sample of size n. Teal bars capture the fixed true p (dashed line); red bars miss it. The line never moves — only the intervals vary. Over many repetitions, approximately C% of all such intervals will contain p.

Correct CI Interpretation

“We are 95% confident that the true population proportion $p$ lies between [lower] and [upper].”

This means: the method we used to build this interval captures the true $p$ in 95% of all possible samples of this size.

“There is a 95% probability that $p$ is between [lower] and [upper].” This statement is incorrect. $p$ is not random: it is a fixed population parameter. It doesn’t have a probability of being in a range; it either is or isn’t. The 95% refers to the procedure, not to the probability that $p$ is in any one specific interval.
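The long-run reading of "95%" can be checked by simulation. The sketch below is an illustration under assumed settings (a true proportion of 0.54, samples of size 400, 1,000 repetitions, a fixed seed); none of these values come from the lesson, and in real polling the true $p$ is never known.

```python
import math
import random


def coverage_rate(p_true: float, n: int, reps: int,
                  z_star: float = 1.96, seed: int = 0) -> float:
    """Fraction of simulated z-intervals that contain the fixed p_true."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        # One simulated poll: count successes among n Bernoulli trials.
        x = sum(rng.random() < p_true for _ in range(n))
        p_hat = x / n
        margin = z_star * math.sqrt(p_hat * (1 - p_hat) / n)
        if p_hat - margin <= p_true <= p_hat + margin:
            hits += 1
    return hits / reps


rate = coverage_rate(p_true=0.54, n=400, reps=1000)
print(f"{rate:.1%} of the simulated intervals contain the true p")
```

The printed rate should land near 95%, though not exactly on it. In every repetition `p_true` stays the same; only the interval moves, which is the point of the interpretation above.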

C7 — Sample Size Determination

Suppose you want to design a poll. You want your margin of error to be at most $E$ at a given confidence level. How large does your sample need to be? Solve the margin-of-error formula for $n$:

Sample Size Formula for a Proportion

To achieve a margin of error of at most $E$:

$$n = \left(\frac{z^*}{E}\right)^2 p^*(1-p^*)$$

where $p^*$ is your best prior estimate for $p$. If no prior estimate is available, use $p^* = 0.5$; this maximizes $p^*(1-p^*)$ and gives the largest (most conservative) sample size.

Always round up to the next whole number.
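The algebra behind the sample size formula is short. Starting from the margin-of-error formula in C3 and replacing $\hat{p}$ with the planning value $p^*$ (because no sample exists yet), a sketch of the rearrangement is:

```latex
\begin{aligned}
E &= z^* \sqrt{\frac{p^*(1-p^*)}{n}} \\
E^2 &= \frac{(z^*)^2\, p^*(1-p^*)}{n} \\
n &= \frac{(z^*)^2\, p^*(1-p^*)}{E^2}
   = \left(\frac{z^*}{E}\right)^2 p^*(1-p^*)
\end{aligned}
```

Because $E$ appears squared in the denominator, halving $E$ multiplies the required $n$ by four.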

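Why does $p^* = 0.5$ give the largest sample? Because $p^*(1-p^*)$ is maximized at $p^* = 0.5$ (you can verify: $0.5 \times 0.5 = 0.25$, while $0.4 \times 0.6 = 0.24$ and $0.3 \times 0.7 = 0.21$). Using $p^* = 0.5$ ensures you won’t undershoot the required sample size, no matter what the true $p$ turns out to be.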

Figure: The curve shows p*(1−p*) — the factor that drives sample size — as a function of the prior estimate p*. The teal dot at p* = 0.5 marks the peak (0.25): choosing p* = 0.5 guarantees the largest required n, covering all possible true p. The red dot shows your current p*. The shaded zone shows every true p value that would require more samples than your current choice provides — drag the slider toward 0.5 to watch the zone disappear.

Always round sample size UP. If the formula gives 384.16, you need 385 people — not 384. Rounding down means you’ve committed to a margin of error larger than the target, which defeats the purpose of the calculation.
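The round-up rule is easy to get wrong by hand, so it helps to see it encoded once. This is a minimal sketch with a hypothetical function name (`required_sample_size`); the second call uses an illustrative prior of 0.4 and a target of 0.03.

```python
import math


def required_sample_size(margin: float, z_star: float = 1.96,
                         p_prior: float = 0.5) -> int:
    """Smallest whole n with z* · sqrt(p(1-p)/n) <= margin.

    p_prior = 0.5 is the conservative choice when no prior estimate exists.
    """
    raw = (z_star / margin) ** 2 * p_prior * (1 - p_prior)
    return math.ceil(raw)   # always round UP, never to the nearest integer


print(required_sample_size(0.05))               # raw value is about 384.16
print(required_sample_size(0.03, p_prior=0.4))  # raw value is about 1024.43
```

The two calls print `385` and `1025`. One caveat of this sketch: if the raw value is mathematically a whole number, floating-point error could in principle push `math.ceil` one unit higher, which errs on the safe side.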

Looking ahead: In INF-6, you will use the proportion $p_0$ (a specific hypothesized value) to test a claim. The notation $\hat{p}$ (observed) vs. $p_0$ (hypothesized) will become critical. Start noticing this distinction now.

Three examples with progressively less scaffolding. Work through each one at your own pace — the first is fully narrated so you can see the complete thought process.

Example 1 — A Transit Policy Poll (Fully Worked)

Problem: A Léger poll contacts 600 Quebecers at random. Of those, 312 say they support a proposed new transit line. Build a 95% confidence interval for the true proportion of Quebec residents who support the policy.

Step 1: Identify what we know.

$n = 600$ (sample size)
$x = 312$ (number of supporters in the sample)
Confidence level = 95%, so $z^* = 1.96$

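Step 2: Compute the sample proportion.

$$\hat{p} = \frac{x}{n} = \frac{312}{600} = 0.52$$

So 52% of our sample supports the policy. This is our point estimate for $p$.

Step 3: Check the conditions.

We need $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$:

$$n\hat{p} = 600(0.52) = 312 \ge 10, \qquad n(1-\hat{p}) = 600(0.48) = 288 \ge 10$$

Conditions are met. (Notice: $n\hat{p} = x$ and $n(1-\hat{p}) = n - x$, the raw counts themselves.)

Step 4: Compute the standard error.

$$\widehat{SE} = \sqrt{\frac{0.52 \times 0.48}{600}} = \sqrt{0.000416} \approx 0.0204$$

Step 5: Compute the margin of error.

$$E = z^* \times \widehat{SE} = 1.96 \times 0.0204 \approx 0.040$$

Step 6: Build the interval.

Lower bound: $0.52 - 0.040 = 0.480$. Upper bound: $0.52 + 0.040 = 0.560$.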

Step 7: Interpret in context.

We are 95% confident that the true proportion of Quebec residents who support the transit policy is between 48.0% and 56.0%.

Reality check: The margin of error is ±4.0 percentage points, consistent with what you’d see in a real Léger poll of this size. A sample of 600 gives reasonably precise estimates for proportions near 0.5 — but the interval still spans 8 full percentage points. To cut that in half, you’d need to quadruple the sample size.
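The quadrupling claim follows directly from the margin-of-error formula. As a sketch, hold $\hat{p} = 0.52$ fixed (an assumption, since a new sample would give a slightly different $\hat{p}$) and write $E_n$ for the margin of error at sample size $n$:

```latex
E_{4n} = z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{4n}}
       = \frac{1}{2}\, z^* \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}
       = \frac{E_n}{2},
\qquad
E_{2400} = 1.96 \sqrt{\frac{0.52 \times 0.48}{2400}} \approx 0.020
```

So moving from 600 to 2,400 respondents would shrink the margin from about ±4.0 to about ±2.0 percentage points.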

Example 2 — Quality Control (Partially Scaffolded)

Problem: A quality-control inspector randomly samples 80 juice bottles from a production run. She finds 12 with a filling defect. Build a 90% confidence interval for the true defect rate.

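Setup: $n = 80$, $x = 12$, confidence level = 90%.

The defect rate in our sample is $\hat{p} = 12/80 = 0.15$. At 90% confidence, we use $z^* = 1.645$. Do you expect the 90% interval to be wider or narrower than a 95% interval on the same data?

Step 1: Sample proportion.

$$\hat{p} = \frac{12}{80} = 0.15$$

Step 2: Check conditions.

$$n\hat{p} = 12 \ge 10, \qquad n(1-\hat{p}) = 68 \ge 10$$

Both conditions are met.

Step 3: Standard error and margin of error.

$$\widehat{SE} = \sqrt{\frac{0.15 \times 0.85}{80}} \approx 0.0399, \qquad E = 1.645 \times 0.0399 \approx 0.066$$

Step 4: Interval.

$$0.15 \pm 0.066 = (0.084,\ 0.216)$$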

Interpretation: We are 90% confident that between 8.4% and 21.6% of all bottles produced in this run have a defect.

Answer to the prediction: Narrower. Lower confidence → smaller $z^*$ → smaller $E$. The 95% interval on the same data would be $0.15 \pm 0.078 = (0.072,\ 0.228)$, noticeably wider.
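The width and confidence trade-off can be seen by holding the data fixed and changing only $z^*$. The sketch below reuses the Example 2 counts and the three critical values quoted in C3; the variable names are illustrative.

```python
import math

x, n = 12, 80                     # Example 2: defective bottles, sample size
p_hat = x / n
se = math.sqrt(p_hat * (1 - p_hat) / n)

# Critical values quoted in C3.
z_values = {"90%": 1.645, "95%": 1.96, "99%": 2.576}

# Same data and same SE; only the critical value changes.
margins = {level: z * se for level, z in z_values.items()}
for level, e in margins.items():
    print(f"{level}: {p_hat:.3f} ± {e:.3f} -> ({p_hat - e:.3f}, {p_hat + e:.3f})")
```

The 90% row reproduces the interval found in Example 2, and each step up in confidence widens the interval around the same $\hat{p}$.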

Example 3 — Designing a CEGEP Survey (Minimally Scaffolded)

Problem: A researcher wants to estimate the proportion of CEGEP students in Quebec who work more than 15 hours per week during the semester. She wants a margin of error of at most 4 percentage points at 95% confidence. No prior estimate of the proportion is available. How many students must be surveyed?

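Hint: Use the sample size formula with $p^* = 0.5$ (worst case), $z^* = 1.96$, and $E = 0.04$. Solve for $n$ and round up.

Show Solution

$$n = \left(\frac{1.96}{0.04}\right)^2 (0.5)(0.5) = 2401 \times 0.25 = 600.25$$

Round up: $n = 601$ students.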

Interpretation: To guarantee a margin of error of at most 4 percentage points at 95% confidence — regardless of the true proportion — we must survey at least 601 students.

Common mistake: Rounding 600.25 down to 600. Always round up — 601 guarantees the target precision; 600 does not.

Work through each problem step-by-step. Use the dropdowns to make key decisions — each one targets a place where students commonly go wrong.

Problem 1 — Sample Proportion and Condition Verification (C1)

A random sample of 400 university students was asked whether they use a budget app. Of those, 148 said yes.

(a) What is $\hat{p}$, the sample proportion who use a budget app?

(b) Do the conditions for the z-interval hold?

A survey of 250 households found that 75 have a composting bin.

(a) What is $\hat{p}$?

(b) Do the conditions for the z-interval hold?

In a clinical trial, 18 of 40 participants reported side effects.

(a) What is $\hat{p}$?

(b) Do the conditions for the z-interval hold?

Problem 2 — Standard Error and Interval Construction (C2)

A random sample of 400 university students found that 148 use a budget app ($\hat{p} = 0.37$, $n = 400$). Build a 95% confidence interval for the true proportion.

(a) Which formula gives the standard error of $\hat{p}$?

(b) What is the 95% CI for $p$? (SE = √(0.37 × 0.63 / 400) = √(0.000583) ≈ 0.02414; E = 1.96 × 0.02414 ≈ 0.047)

From the composting survey: $\hat{p} = 0.30$, $n = 250$, 90% confidence.

(a) Which formula gives the standard error?

(b) What is the 90% CI? (SE = √(0.30 × 0.70 / 250) = √(0.00084) ≈ 0.02898; E = 1.645 × 0.02898 ≈ 0.0477)

From the clinical trial: $\hat{p} = 0.45$, $n = 40$, 99% confidence.

(a) Which formula gives the standard error?

(b) What is the 99% CI? (SE = √(0.45 × 0.55 / 40) = √(0.006188) ≈ 0.07866; E = 2.576 × 0.07866 ≈ 0.2026)

Problem 3 — Practical Interpretation of Confidence (C6)

A researcher surveys 500 adults and finds that 310 support stricter food labelling laws. The resulting 95% CI is (0.577, 0.663).

Which of the following is the correct interpretation?

Problem 4 — Determining Required Sample Size (C7)

A public health agency wants to estimate the proportion of adults in a city who have received this year’s flu vaccine. They want a margin of error of at most 3 percentage points ($E = 0.03$) at 95% confidence. A previous survey found about 40% vaccination coverage.

(a) Which value of $p^*$ should be used in the sample size formula?

(b) What is the required sample size? (Formula: $n = (z^*/E)^2\, p^*(1-p^*)$)

These problems have no step-by-step guidance — work through them on your own, then check the solution. Concepts are interleaved across the four problems.

Problem 1 — Build a Confidence Interval (Generative)

Problem 2 — Margin of Error and Interpretation

A national poll of 900 randomly selected Canadians finds that 513 support a proposed national pharmacare program. The resulting 95% CI is (0.537, 0.603).

(a) What is the margin of error, in percentage points?

(b) A news headline says: “Majority of Canadians back pharmacare (probability 95%).” What is wrong with this headline?

Show Solution

(a) $\hat{p} = 513/900 = 0.57$. The interval is $0.57 \pm 0.033$, so $E = 0.033$. Margin of error: 3.3 percentage points.

(b) The headline says “probability 95%” as if $p$ is random. But $p$ is a fixed (unknown) parameter: there is no probability that it falls in any given interval. The correct language is: “We are 95% confident that the true proportion is between 53.7% and 60.3%.” The 95% refers to the procedure’s long-run reliability, not the probability for this specific interval.

(c) To halve : new . Using :

Round up: n = 3,460. (To halve E, you must quadruple the sample size.)

A university surveys 500 students and finds 185 report food insecurity. The 90% CI is (0.335, 0.405).

(a) What is the margin of error, in percentage points?

(b) A report states: “There is a 90% chance that between 33.5% and 40.5% of all students are food insecure.” Identify the error.

Show Solution

(a) $\hat{p} = 185/500 = 0.37$. $E = (0.405 - 0.335)/2 = 0.035$. Margin of error: 3.5 percentage points.

(b) The error: treating the CI as a probability statement about $p$. The true proportion is fixed; saying “90% chance” implies $p$ is random. Correct: “We are 90% confident the true proportion lies between 33.5% and 40.5%.”

(c) , , :

Round up: n = 1,578.

A sample of 300 employees at a large company finds 81 experiencing burnout symptoms. The 99% CI is (0.208, 0.332).

(a) What is the margin of error?

(b) HR states: “We’re 99% sure that between 20.8% and 33.2% of employees are burned out.” Is this a valid interpretation?

Show Solution

(a) $\hat{p} = 81/300 = 0.27$. $E = (0.332 - 0.208)/2 = 0.062$. Margin of error: 6.2 percentage points.

(b) The phrase “99% sure” is colloquially acceptable if it means “the procedure captures the true proportion 99% of the time.” However, saying “we’re 99% sure” about this specific interval implies $p$ is random. More precise: “We are 99% confident the true proportion lies in this interval.”

(c) , , :

Round up: n = 1,454.

Problem 3 — Determine the Required Sample Size (Generative)

Problem 4 — When the Conditions Are Not Met

A researcher samples 30 rare-book collectors and finds 2 who own a first edition of a specific novel. She wants to build a 95% CI for the proportion of all rare-book collectors who own this edition.

(a) Check whether the z-interval conditions are met.

(b) What should the researcher do, given the outcome of the conditions check?

Show Solution

(a) $\hat{p} = 2/30 \approx 0.067$.

$n\hat{p} = 2$: fails (< 10)
$n(1-\hat{p}) = 28$: passes

The first condition is not met. The z-interval is not appropriate.

(b) Options: (1) Collect a larger sample until $n\hat{p} \ge 10$. (2) Use an exact method (e.g., Clopper-Pearson interval) designed for small counts. (3) Use the Wilson score interval, which handles extreme $\hat{p}$ better. The standard z-interval should not be reported; it would be unreliable.

A quality inspector samples 20 ultra-high-precision components and finds 1 defective. She wants a 95% CI for the defect rate.

(a) Check the z-interval conditions.

(b) What should she do given the result?

Show Solution

(a) $\hat{p} = 1/20 = 0.05$.

$n\hat{p} = 1$: fails (< 10)
$n(1-\hat{p}) = 19$: passes

Condition fails — z-interval is not valid.

(b) The inspector should inspect a larger sample (e.g., 100+ components) or use an exact binomial confidence interval. With n = 20 and only 1 defective, the normal approximation is too rough to be reliable.

A biologist samples 15 nesting sites and finds 14 occupied by the target species. She wants a CI for the occupancy rate.

(a) Check the z-interval conditions.

(b) What is the conclusion, and what should she do?

Show Solution

(a) $\hat{p} = 14/15 \approx 0.933$.

$n\hat{p} = 14$: passes
$n(1-\hat{p}) = 1$: fails (< 10)

The second condition fails. The z-interval is not appropriate.

(b) The distribution of $\hat{p}$ is strongly left-skewed here (nearly all sites occupied, very few “failures”). The biologist needs more sites in the sample, or should use an exact method. The z-interval would be unreliable and likely produce an upper bound above 1, which is impossible for a proportion.

Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Sampling Distribution Concept (INF-1)

A city-wide standardized reading test has points and points. An educational researcher draws random samples of students from a large school.

(a) What is the mean and standard error of the sampling distribution of ? (b) Using , find .

Show Solution

(a) points. points.

Since $n = 36 \ge 30$, the CLT guarantees $\bar{x}$ is approximately normal.

(b) $P(\bar{x} > 76) = P(Z > 1.00) \approx 0.1587$

About 15.9% of random samples of 36 students would average above 76 points — this is within normal sampling variability.

Review Problem 2 — z-CI Construction and Sample Size (INF-2)

A food safety agency surveys 80 randomly selected restaurants and finds out of 100 on a hygiene checklist, with . Population SD unknown.

(a) Construct a 90% CI for the true mean hygiene score. (b) How many restaurants must be sampled to reduce the margin of error to at most 1.5 points at 90% confidence, using $s$ as an estimate of $\sigma$?

Show Solution

(a) $n = 80 \ge 30$; use large-sample approximation with $z^* = 1.645$.

(b)

At least 107 restaurants must be sampled to achieve a margin of error of at most 1.5 points at 90% confidence.

No hints. No guided steps. These questions measure whether the core ideas have actually landed. Take your time with each one — especially the Feynman test.

Question 1 — Feynman Test

A friend who missed this lesson asks you: “The formula for the standard error of a proportion involves $p$, but you said $p$ is what we’re trying to estimate. So aren’t we using an unknown to find an unknown? Doesn’t that break the whole thing?”

Explain your answer below in plain language, as if talking to your friend. Address both why we use $\hat{p}$ and what limits that introduces.


Show a model answer

You’re right that we don’t know $p$; that’s exactly why we’re building the interval. The SE formula theoretically requires $p$, but since $p$ is unknown, we substitute our best estimate: $\hat{p}$. This works because when the sample is large enough, $\hat{p}$ is close to $p$, so the estimated SE is close to the true SE.

The catch: this substitution introduces extra uncertainty, especially when $\hat{p}$ is far from $p$ (which tends to happen with small samples or extreme proportions). That’s exactly why we check the conditions $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$ first. When those hold, the approximation is good enough for practical purposes. When they don’t, the interval can be meaningfully misleading, and we need a different method.

Question 2 — Vegetable Intake CI

The National Institute of Nutrition surveys 450 adults and finds that 180 consume fewer than two servings of vegetables per day. They want to estimate the true proportion with 99% confidence.

(a) Which formula should be used for the standard error?

(b) Verify conditions, then compute the 99% CI.

Show Solution

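$\hat{p} = 180/450 = 0.40$, $n = 450$, $z^* = 2.576$.

Conditions: $n\hat{p} = 180 \ge 10$ ✓; $n(1-\hat{p}) = 270 \ge 10$ ✓.

$$\widehat{SE} = \sqrt{\frac{0.40 \times 0.60}{450}} \approx 0.02309, \qquad E = 2.576 \times 0.02309 \approx 0.0595, \qquad 0.40 \pm 0.0595 \approx (0.341,\ 0.459)$$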

Interpretation: We are 99% confident that between 34.1% and 45.9% of all adults consume fewer than two servings of vegetables per day.

Question 3 — Error Analysis

A student computes a 95% CI for a proportion as (0.42, 0.58) from a sample of 200 people. They write the following conclusion:

“There is a 95% probability that the true population proportion is between 0.42 and 0.58. Since 0.50 is inside this interval, we can say the population is evenly split with 95% certainty.”

Identify all errors in this conclusion and restate it correctly.

Show Solution

Error 1 — Probability statement: “There is a 95% probability that $p$ is between 0.42 and 0.58.” This is wrong. $p$ is a fixed number: it either is or isn’t in the interval. The 95% refers to the method: in 95% of all random samples of this size, the constructed interval will contain $p$. For any specific interval, there is no probability to assign.

Error 2 — Inferring the value of p: “0.50 is inside the interval, so the population is evenly split.” The CI tells us $p$ could plausibly be anywhere in (0.42, 0.58). It does not mean $p = 0.50$, or that the population is evenly split; it means only that 0.50 is a plausible value that we cannot rule out.

Correct restatement: “We are 95% confident that the true population proportion lies between 0.42 and 0.58. This interval includes 0.50, meaning we cannot rule out an even split — but we also cannot conclude that one exists.”

Self-Assessment

How confident are you with the concepts from this lesson?

Scale: from “Still confused” to “Ready for the Boss Fight”

If your confidence is below 60%, focus on revisiting Section 3 (Core Concepts) and re-doing Examples 1–2 before the Boss Fight. The Boss Fight requires all seven concepts working together.

Two paths. Same difficulty. Different thinking style. Choose the one that feels more natural to you — there is no wrong answer. Both paths use every concept from this lesson.

🔬 The Analyst

You have data in hand. Work through it to compute and interpret intervals, and advise the city based on what the numbers say.

🏗️ The Architect

No data yet. Design the study, determine what you need, and give the CEGEP a research plan with real statistical justification.

🔬 Path A: The Analyst — Montréal Restaurant Inspections

The City of Montréal’s food inspection team has completed a round of surprise inspections. Out of 120 randomly selected restaurants, 47 had at least one critical health violation. City officials want to use this data to make public statements and plan future inspections. Your job is to advise them — with numbers.

Task 1 — Verify the Conditions

Before computing any interval, confirm that the z-interval is valid. Show your work and state your conclusion clearly.

Show Solution — Task 1

$$n\hat{p} = 47 \ge 10, \qquad n(1-\hat{p}) = 120 - 47 = 73 \ge 10$$

Both conditions hold. The z-interval is appropriate.

Task 2 — Construct a 95% CI for the Violation Rate

Compute the 95% confidence interval for the true proportion of Montréal restaurants with at least one critical violation. Round to 4 decimal places.

Show Solution — Task 2

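$$\hat{p} = \frac{47}{120} \approx 0.3917, \qquad \widehat{SE} = \sqrt{\frac{0.3917 \times 0.6083}{120}} \approx 0.04456, \qquad E = 1.96 \times 0.04456 \approx 0.0873$$

Using unrounded intermediate values, $\hat{p} \pm E \approx (0.3043,\ 0.4790)$.

We are 95% confident that between 30.4% and 47.9% of Montréal restaurants have at least one critical violation.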

Task 3 — Evaluate the City’s Claim

The city communications team wants to issue a press release stating: “Fewer than 40% of Montréal restaurants have critical violations.” Does your 95% CI support this claim? Explain.

Show Solution — Task 3

The 95% CI is (0.304, 0.479). The upper bound of 47.9% is well above 40%. Since 40% falls inside the confidence interval, we cannot rule out that the true violation rate is 40% or higher. The data do not support the city’s claim at 95% confidence.

In fact, the point estimate itself ($\hat{p} \approx 0.392$) is barely below 40%, and given the interval, values above 40% are entirely plausible. Issuing the press release as stated would be misleading.

Task 4 — Planning Future Inspections

The inspection department wants to re-survey next year with a margin of error of at most 2 percentage points at 95% confidence, using this year’s $\hat{p}$ as a prior estimate. How many restaurants must be inspected?

Show Solution — Task 4

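$p^* = 47/120 \approx 0.3917$, $E = 0.02$, $z^* = 1.96$:

$$n = \left(\frac{1.96}{0.02}\right)^2 (0.3917)(0.6083) \approx 9604 \times 0.23827 \approx 2288.3$$

Round up: n = 2,289 restaurants.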

Note: the current sample of 120 restaurants gave a margin of error of ~8.7 percentage points. Getting to ±2 points requires roughly 20× more inspections — a significant resource commitment.

Reflection: What would you tell a journalist asking about restaurant safety in Montréal? Was the city’s claim supported? What additional data would make your advice more reliable? Write 2–3 sentences in the space below.


🏗️ Path B: The Architect — CEGEP Tutoring Program Study

Collège de Rosemont is considering launching a free peer-tutoring program in mathematics. Before committing the budget, administration wants to know what proportion of students would actually use it. They have no prior data and a budget for at most 300 interviews. Your job is to design the study and advise the administration.

Task 1 — Worst-Case Margin of Error

With no prior estimate available, use $p^* = 0.5$. What is the margin of error achievable with exactly 300 interviews at 95% confidence?

Show Solution — Task 1

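$p^* = 0.5$, $n = 300$, $z^* = 1.96$:

$$E = 1.96 \sqrt{\frac{0.5 \times 0.5}{300}} \approx 1.96 \times 0.02887 \approx 0.0566$$

With 300 interviews, the margin of error is approximately ±5.7 percentage points.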

Task 2 — Using a Pilot Study

Before the main survey, a small pilot of 40 students found 22 who said they would use the program ($\hat{p} = 22/40 = 0.55$). Recalculate the margin of error for $n = 300$ using this prior estimate. Is it better or worse than the worst-case estimate? Why?

Show Solution — Task 2

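$p^* = 0.55$, $n = 300$, $z^* = 1.96$:

$$E = 1.96 \sqrt{\frac{0.55 \times 0.45}{300}} \approx 1.96 \times 0.02872 \approx 0.0563$$

Using $p^* = 0.5$: $E \approx 0.0566$. Using $p^* = 0.55$: $E \approx 0.0563$.

The improvement is tiny because $0.55 \times 0.45 = 0.2475$ is very close to $0.5 \times 0.5 = 0.25$. The pilot estimate is near 0.5, where the $p^*(1-p^*)$ curve is flat. When $p^*$ is much closer to 0 or 1, using a prior estimate saves significantly more sample size.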

Task 3 — Hitting a Target Precision

Administration decides they need the margin of error to be at most 3 percentage points at 95% confidence. Using the pilot estimate $p^* = 0.55$, how many interviews are needed? Is this within the budget of 300?

Show Solution — Task 3

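$p^* = 0.55$, $E = 0.03$, $z^* = 1.96$:

$$n = \left(\frac{1.96}{0.03}\right)^2 (0.55)(0.45) \approx 4268.44 \times 0.2475 \approx 1056.4$$

Round up: n = 1,057 interviews.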

This is well above the budget of 300. To achieve ±3 points at 95% confidence, the college would need to more than triple its budget. Administration faces a choice: accept the wider ±5.7-point margin with 300 interviews, or increase the budget.

Reflection: What recommendation would you make to the administration? Should they launch the program, gather more data, or adjust the confidence threshold? Write 2–3 sentences below.


Optional stretch problems — these go beyond the lesson objectives. They’re here for students who want to push further. C1 previews a more robust method; C2 builds deep intuition; C3 requires creative multi-step reasoning.

Challenge 1 — The Wilson Score Interval

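The standard z-interval ($\hat{p} \pm z^* \sqrt{\hat{p}(1-\hat{p})/n}$) has a known weakness: when $\hat{p}$ is close to 0 or 1, or when $n$ is small, it can produce intervals outside [0, 1] and performs poorly even when conditions are technically met. The Wilson score interval is more robust. It is defined as:

$$\frac{\hat{p} + \dfrac{(z^*)^2}{2n} \pm z^* \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n} + \dfrac{(z^*)^2}{4n^2}}}{1 + \dfrac{(z^*)^2}{n}}$$

A sample of 20 patients finds 2 with a rare drug reaction ($\hat{p} = 0.10$). Using 95% confidence ($z^* = 1.96$):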

(a) Compute the standard z-interval. Does it stay within [0, 1]?

(b) Compute the Wilson interval. Compare the two.

Show Solution

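Standard z-interval:

$$E = 1.96 \sqrt{\frac{0.10 \times 0.90}{20}} \approx 1.96 \times 0.0671 \approx 0.1315$$

Interval: $0.10 \pm 0.1315 = (-0.0315,\ 0.2315)$

The lower bound is negative, which is impossible for a proportion. This confirms the standard interval fails here (the conditions were not met).

Wilson interval:

With $\hat{p} = 0.10$, $n = 20$, $z^* = 1.96$, $(z^*)^2 = 3.8416$:

Numerator center: $0.10 + \frac{3.8416}{40} = 0.10 + 0.0960 = 0.1960$

Denominator: $1 + \frac{3.8416}{20} = 1.1921$

SE term: $1.96 \sqrt{\frac{0.10 \times 0.90}{20} + \frac{3.8416}{1600}} = 1.96 \sqrt{0.006901} \approx 0.1628$

Wilson interval: $\dfrac{0.1960 \pm 0.1628}{1.1921}$

Lower: $\approx 0.028$; Upper: $\approx 0.301$

Wilson: $(0.028,\ 0.301)$, which stays within [0,1] and is more meaningful than the standard interval.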

A test of 15 items finds 14 conforming ($\hat{p} = 14/15 \approx 0.933$). Using 95% confidence:

(a) Compute the standard z-interval. Does it stay within [0, 1]?

(b) Compute the Wilson interval. Compare the two.

Show Solution

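Standard z-interval:

$$SE = \sqrt{\frac{(0.9333)(0.0667)}{15}} \approx 0.0644, \qquad E = 1.96 \times SE \approx 0.1262$$

Interval: $0.9333 \pm 0.1262 \approx (0.807,\ 1.060)$

Upper bound exceeds 1, which is impossible. Conditions failed ($n(1-\hat{p}) = 1$, so only one failure was observed).

Wilson interval:

$\hat{p} = 0.9333$; $n = 15$, $(z^*)^2 = 3.8416$

Center: $0.9333 + \dfrac{3.8416}{30} \approx 1.0614$

Denom: $1 + \dfrac{3.8416}{15} \approx 1.2561$

SE term: $1.96\sqrt{0.004148 + 0.004268} \approx 0.1798$

Wilson: $\dfrac{1.0614 \pm 0.1798}{1.2561}$ → Lower: $\approx 0.702$; Upper: $\approx 0.988$

Wilson: $(0.702,\ 0.988)$, entirely within [0,1] and far more informative.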

A poll of 8 people finds 1 planning to vote in a by-election ($\hat{p} = 1/8 = 0.125$). Using 95% confidence:

(a) Compute the standard z-interval. Does it stay within [0, 1]?

(b) Compute the Wilson interval. Compare the two.

Show Solution

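Standard z-interval:

$$SE = \sqrt{\frac{(0.125)(0.875)}{8}} \approx 0.1169, \qquad E = 1.96 \times SE \approx 0.2292$$

Interval: $0.125 \pm 0.2292 = (-0.1042,\ 0.3542)$

Lower bound is negative, so the interval is invalid. Conditions fail ($n\hat{p} = 1$, so only one success was observed).

Wilson interval:

Center: $0.125 + \dfrac{3.8416}{16} \approx 0.3651$

Denom: $1 + \dfrac{3.8416}{8} \approx 1.4802$

SE term: $1.96\sqrt{0.013672 + 0.015006} \approx 0.3319$

Wilson: $\dfrac{0.3651 \pm 0.3319}{1.4802}$ → Lower: $\approx 0.022$; Upper: $\approx 0.471$

Wilson: $(0.022,\ 0.471)$, bounded within [0,1] and usable despite the tiny sample.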
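For readers who want to check the three comparisons numerically, the following Python sketch computes both intervals side by side. The function names `wald_interval` and `wilson_interval` are hypothetical, and $z^* = 1.96$ is assumed as in the problems above. The Wilson function follows the center / denominator / SE-term breakdown used in the solutions.

```python
import math


def wald_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Standard z-interval: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - half_width, p_hat + half_width


def wilson_interval(x: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval, built from the same three pieces as the solutions."""
    p_hat = x / n
    center = p_hat + z**2 / (2 * n)                  # numerator center
    denom = 1 + z**2 / n                             # denominator
    se_term = z * math.sqrt(p_hat * (1 - p_hat) / n + z**2 / (4 * n**2))
    return (center - se_term) / denom, (center + se_term) / denom


# The three small-sample scenarios from Challenge 1: (successes, sample size)
for x, n in [(2, 20), (14, 15), (1, 8)]:
    wald_lo, wald_hi = wald_interval(x, n)
    wil_lo, wil_hi = wilson_interval(x, n)
    print(f"x={x}, n={n}: "
          f"standard=({wald_lo:.4f}, {wald_hi:.4f})  "
          f"Wilson=({wil_lo:.4f}, {wil_hi:.4f})")
```

In each scenario the standard interval crosses 0 or 1 while the Wilson interval does not. For $0 < x < n$ the Wilson bounds always fall strictly between 0 and 1, because the SE term is always smaller than the distance from the center to either edge of the scaled range.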

Challenge 2 — The Shape of the Margin of Error

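For a fixed $n = 400$ and confidence level of 95%, the margin of error is:

$$E(\hat{p}) = 1.96\sqrt{\frac{\hat{p}(1-\hat{p})}{400}}$$

This is a function of $\hat{p}$.

(a) Compute $E$ for $\hat{p} = 0.1, 0.2, \ldots, 0.9$.

(b) At what value of $\hat{p}$ is $E$ maximized? Why does this make sense?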

Show Solution

(a) Using $n = 400$:

$\hat{p}$	$\hat{p}(1-\hat{p})$	$E$
0.1	0.09	0.0294
0.2	0.16	0.0392
0.3	0.21	0.0449
0.4	0.24	0.0480
0.5	0.25	0.0490
0.6	0.24	0.0480
0.7	0.21	0.0449
0.8	0.16	0.0392
0.9	0.09	0.0294

(b) $E$ is maximized at $\hat{p} = 0.5$, where $E = 0.0490$. This makes sense: $\hat{p}(1-\hat{p})$ is maximized at 0.5 because that is where a Bernoulli random variable has maximum variance. A proportion near 0.5 means successes and failures are equally unpredictable, which is the point of maximum uncertainty.

(c) Incorrect. Using $\hat{p} = 0.7$ gives $\hat{p}(1-\hat{p}) = 0.21$, which is not the worst case. The worst case is always $\hat{p} = 0.5$ (giving 0.25). For $p$ between 0.5 and 0.7, the product $p(1-p)$ ranges from 0.21 to 0.25, so using $\hat{p} = 0.7$ would underestimate the required sample size for any true $p$ in that range. Safe means conservative: use $\hat{p} = 0.5$ unless you are very confident the true proportion is far from 0.5.
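The table in part (a) can be regenerated with a short Python sketch. The function name `margin_of_error` is hypothetical, and $n = 400$ is assumed because it is the sample size that reproduces the tabulated margins (for example, $1.96\sqrt{0.25/400} = 0.0490$).

```python
import math


def margin_of_error(p_hat: float, n: int = 400, z: float = 1.96) -> float:
    """Margin of error z * sqrt(p_hat * (1 - p_hat) / n) at a fixed n."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


# Grid of proportions 0.1, 0.2, ..., 0.9 (built from integers to avoid drift)
grid = [k / 10 for k in range(1, 10)]
rows = [(p, p * (1 - p), margin_of_error(p)) for p in grid]

for p, product, moe in rows:
    print(f"{p:.1f}\t{product:.2f}\t{moe:.4f}")

# Locate the proportion with the largest margin of error on this grid
worst_p, _, worst_moe = max(rows, key=lambda row: row[2])
print(f"largest margin: {worst_moe:.4f} at p_hat = {worst_p:.1f}")
```

The printed margins rise toward 0.5 and fall symmetrically on either side, which is the shape that makes 0.5 the conservative planning value for sample size.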

Challenge 3 — Two Polls, One Question (Generative)

Complete, step-by-step solutions for all problems in Sections 5–9 are available on the solutions page. Solutions include worked arithmetic, common mistakes to watch for, and interpretation guidance.


If you’re stuck: Re-read the relevant Core Concept in Section 3, then find the Worked Example that maps to that concept (e.g., Example 1 maps to Concept 1). The solutions page shows the reasoning behind every step, not just the final answer.

Quick-Reference Formulas

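Sample Proportion: $\hat{p} = \dfrac{x}{n}$

Standard Error of $\hat{p}$: $SE_{\hat{p}} = \sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Confidence Interval for $p$: $\hat{p} \pm z^*\sqrt{\dfrac{\hat{p}(1-\hat{p})}{n}}$

Required Sample Size: $n = \dfrac{(z^*)^2\,\hat{p}(1-\hat{p})}{E^2}$ (Use $\hat{p} = 0.5$ for a conservative estimate if no prior estimate is available. Always round up to the next whole number.)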

Condition	Rule to check
Randomness	Was the sample randomly selected?
Independence	Is $n \le 10\%$ of the population?
Success/Failure	Are both $n\hat{p} \ge 10$ and $n(1-\hat{p}) \ge 10$?
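As an illustration of how the quick-reference formulas fit together, the sketch below applies them to the poll from the introduction (540 supporters out of 1,000 respondents). The function name `proportion_z_interval` and the `min_count` parameter are hypothetical. The success/failure threshold of 10 is an assumption based on a common textbook convention, and $z^* = 1.96$ is assumed for 95% confidence. Randomness and independence depend on the study design and cannot be checked in code.

```python
import math


def proportion_z_interval(x: int, n: int, z: float = 1.96, min_count: int = 10):
    """Return (p_hat, margin, lower, upper, counts_ok) for a z-interval.

    counts_ok reports only the success/failure check; min_count = 10 is an
    assumed convention, not a value taken from the lesson text.
    """
    if not 0 <= x <= n or n <= 0:
        raise ValueError("need 0 <= x <= n and n > 0")
    p_hat = x / n                                  # sample proportion
    se = math.sqrt(p_hat * (1 - p_hat) / n)        # standard error of p_hat
    margin = z * se                                # margin of error
    counts_ok = x >= min_count and (n - x) >= min_count
    return p_hat, margin, p_hat - margin, p_hat + margin, counts_ok


p_hat, margin, lower, upper, counts_ok = proportion_z_interval(540, 1000)
print(f"p_hat = {p_hat:.3f}, margin = {margin:.4f}")
print(f"95% CI: ({lower:.4f}, {upper:.4f}), success/failure check: {counts_ok}")
```

For this poll the margin comes out near 0.031, about 3 percentage points, which is consistent with the "±3, 19 times out of 20" wording typical of polls with around 1,000 respondents.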
