Solutions — Hypothesis Testing for Small-Sample Mean and Proportion

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.

← Back to Lesson INF-6

Section 5: Guided Practice Solutions

Problem 1 — t-Test Decisions (Variants A, B, C)

The key steps: (1) $df = n - 1$, (2) $SE = s/\sqrt{n}$, (3) compute $t = (\bar{x} - \mu_0)/SE$, (4) bound $p$ from the t-table, (5) compare $p$ to $\alpha$.

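Variant A (nutritionist; n = 10, $\bar{x}$ in kcal (value given in the lesson), s = 280, $\mu_0 = 2000$, two-tailed, α = 0.05):

$H_0: \mu = 2000$; $H_a: \mu \neq 2000$.
$SE = 280/\sqrt{10} \approx 88.54$. $t = (\bar{x} - 2000)/88.54$, $df = 9$.
df = 9: $t^* = 1.833$ (one-tail 0.05); $t^* = 2.262$ (two-tail 0.05). Since $1.833 < |t| < 2.262$: $0.05 < p < 0.10$.
Decision: $p > 0.05$ → fail to reject $H_0$. Insufficient evidence the mean intake differs from 2000 kcal.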

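Variant B (battery lab; n = 12, $\bar{x}$ in h (value given in the lesson), s = 3.2, $\mu_0 = 25$, two-tailed, α = 0.05):

$H_0: \mu = 25$; $H_a: \mu \neq 25$.
$SE = 3.2/\sqrt{12} \approx 0.924$. $t = (\bar{x} - 25)/0.924$, $df = 11$.
df = 11: $t^* = 1.363$ (one-tail 0.10); $t^* = 1.796$ (one-tail 0.05). Since $1.363 < |t| < 1.796$: $0.10 < p < 0.20$.
Decision: $p > 0.05$ → fail to reject $H_0$. Insufficient evidence the mean life differs from 25 h.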

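Variant C (doctor; n = 16, $\bar{x}$ in °C (value given in the lesson), s = 0.5, $\mu_0 = 37.0$, two-tailed, α = 0.05):

$H_0: \mu = 37.0$; $H_a: \mu \neq 37.0$.
$SE = 0.5/\sqrt{16} = 0.125$. $t = (\bar{x} - 37.0)/0.125$, $df = 15$.
df = 15: $t^* = 2.131$ (two-tail 0.05). Since $|t| > 2.131$: $p < 0.05$.
Decision: reject $H_0$. Sufficient evidence the mean temperature differs from 37.0°C.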

Common mistakes: (1) using df = n instead of n − 1; (2) reading the one-tail column for a two-tailed test: for two-tailed α = 0.05, use the one-tail 0.025 column; (3) reporting an exact p from the t-table: it gives bounds, so state “$0.05 < p < 0.10$,” not “p = 0.07.”
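The bracketing step can be sketched in a few lines of Python. This is only an illustration: the row below is the df = 9 row of a standard t-table, the function name `bound_two_tailed_p` is made up for this sketch, and t = 2.0 is a hypothetical statistic, not the value from any variant.

```python
# df = 9 row of a standard t-table: (one-tail area, critical value).
T_ROW_DF9 = [(0.10, 1.383), (0.05, 1.833), (0.025, 2.262), (0.01, 2.821), (0.005, 3.250)]

def bound_two_tailed_p(t_stat, row):
    """Return (lower, upper) bounds on the two-tailed p-value from one t-table row."""
    abs_t = abs(t_stat)
    lower, upper = 0.0, 1.0
    for one_tail_area, critical in row:
        if abs_t >= critical:
            # |t| is at least this critical value, so p is at most twice the tail area.
            upper = min(upper, 2 * one_tail_area)
        else:
            # |t| falls short of this critical value, so p exceeds twice the tail area.
            lower = max(lower, 2 * one_tail_area)
    return lower, upper

n = 10
df = n - 1  # write df down before opening the table
low, high = bound_two_tailed_p(2.0, T_ROW_DF9)  # hypothetical t = 2.0
print(f"df = {df}: {low:.2f} < p < {high:.2f}")  # prints: df = 9: 0.05 < p < 0.10
```

Doubling each one-tail area inside the function is what keeps a two-tailed test from being read off the one-tail column by accident.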

Problem 2 — Proportion Test Conditions and z Statistic (Variants A, B, C)

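Variant A (n = 150, x = 36, $p_0$ as given in the lesson, two-tailed, α = 0.05):

Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓.
$\hat{p} = 36/150 = 0.24$. $SE_0 = \sqrt{p_0(1 - p_0)/150}$. $z = (0.24 - p_0)/SE_0$.
p-value $= 2P(Z \ge |z|)$. Decision: p-value > 0.05 → fail to reject $H_0$.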

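Variant B (n = 80, x = 42, $p_0$ as given in the lesson, two-tailed, α = 0.05):

Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓.
$\hat{p} = 42/80 = 0.525$. $SE_0 = \sqrt{p_0(1 - p_0)/80}$. $z = (0.525 - p_0)/SE_0$.
p-value $= 2P(Z \ge |z|)$. Decision: p-value > 0.05 → fail to reject $H_0$.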

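Variant C (n = 200, x = 82, $p_0$ as given in the lesson, two-tailed, α = 0.01):

Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓.
$\hat{p} = 82/200 = 0.41$. $SE_0 = \sqrt{p_0(1 - p_0)/200}$. $z = (0.41 - p_0)/SE_0$.
p-value $= 2P(Z \ge |z|)$. Decision: p-value > 0.01 → fail to reject $H_0$ at α = 0.01.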

Common mistake: using $\hat{p}$ instead of $p_0$ in the SE denominator. For a proportion test, $SE_0 = \sqrt{p_0(1 - p_0)/n}$: the null value goes in the denominator because the SE is computed assuming $H_0$ is true.

Problem 3 — Choosing t vs. z vs. Proportion Test

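Scenario A (factory QC; n = 10, $\bar{x}$ in g (value given in the lesson), s = 4, $\mu_0$ as given, left-tailed): one-sample t-test (σ unknown, n < 30, df = 9). $H_0: \mu = \mu_0$; $H_a: \mu < \mu_0$, because the suspicion is directional.

Scenario B (pet ownership; n = 120, x = 54, $p_0$ as given, two-tailed): one-sample z-test for a proportion. Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓. $H_0: p = p_0$; $H_a: p \neq p_0$.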

Common mistake: choosing the t-test for Scenario B because “σ is unknown.” The t-vs-z distinction applies only to tests about a mean. For proportions (conditions met), always use z.

Problem 4 — CI Equivalence: Same Conclusion, Two Methods

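Scenario: n = 16, $\bar{x}$ in kcal (value given in the lesson), s = 300, $\mu_0$ as given, α = 0.05 (two-tailed), df = 15.

Method 1 — Hypothesis test: $SE = 300/\sqrt{16} = 75$. $t = (\bar{x} - \mu_0)/75$. df = 15: $t^* = 2.131$ (two-tail 0.05), $t^* = 1.753$ (two-tail 0.10). Since $1.753 < |t| < 2.131$: $0.05 < p < 0.10$ → fail to reject $H_0$.

Method 2 — 95% CI: $t^* = 2.131$, $SE = 75$. $\bar{x} \pm 2.131(75) = \bar{x} \pm 159.8$. $\mu_0$ falls inside → fail to reject ✓.

Why they agree: for a two-tailed test at level α, $\mu_0$ falls outside the $(1 - \alpha)$ CI iff $|t| > t^*$. When $|t| < t^*$, $\mu_0$ is inside the 95% CI, so both say “fail to reject.”

A “reject” example (Variant C, n = 16, $\bar{x}$ as given, s = 0.5, $\mu_0 = 37.0$): $|t| > 2.131$ → $p < 0.05$, reject. $\bar{x} \pm 2.131(0.125) = \bar{x} \pm 0.266$; 37.0 falls outside → also reject ✓.

Common mistake: using $\bar{x}$ (or 0) instead of $\mu_0$ when checking whether the CI excludes the null. The check is whether $\mu_0$ lies inside the interval; the CI is centered at $\bar{x}$, not $\mu_0$.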
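The agreement between the two methods can be checked numerically. The sketch below uses n = 16, s = 300 and $t^* = 2.131$ from the scenario; the null value 2000 and the sample means are hypothetical, chosen only to show one case on each side of the boundary, and `t_test_and_ci` is an illustrative name.

```python
import math

def t_test_and_ci(xbar, mu0, s, n, t_star):
    """Return (reject_by_test, reject_by_ci, ci) for a two-tailed one-sample t-test."""
    se = s / math.sqrt(n)
    t_stat = (xbar - mu0) / se
    margin = t_star * se
    ci = (xbar - margin, xbar + margin)   # interval is centered at xbar, not mu0
    reject_by_test = abs(t_stat) > t_star
    reject_by_ci = not (ci[0] <= mu0 <= ci[1])
    return reject_by_test, reject_by_ci, ci

# mu0 = 2000 and these sample means are assumptions for illustration only.
for xbar in (2100, 2150, 2200, 1800):
    by_test, by_ci, ci = t_test_and_ci(xbar, 2000, 300, 16, 2.131)
    print(xbar, by_test, by_ci, tuple(round(v, 1) for v in ci))
```

In every row the two booleans match, which is the equivalence stated above: the test rejects exactly when the interval excludes the null value.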

Section 6: Independent Practice Solutions

Problem 1 — t-Test (Generator)

The five steps:

Hypotheses: $H_0: \mu = \mu_0$; $H_a: \mu \neq \mu_0$ (two-tailed).
Conditions: σ unknown (s given), n < 30 → t-test; assume the population is approximately normal.
Test statistic: $SE = s/\sqrt{n}$, then $t = (\bar{x} - \mu_0)/SE$. Record $df = n - 1$.
p-value: bracket $|t|$ between two t-table critical values to bound $p$ (e.g., $0.05 < p < 0.10$).
Conclusion: reject $H_0$ if $p < \alpha$, else fail to reject, stated in context.

Bounding the p-value: the t-table gives critical values, not exact p-values, so report a range like “$0.05 < p < 0.10$.” That is enough to decide: if the whole range is below α, reject; if it straddles or exceeds α, fail to reject.

Common mistake: using df = n instead of n − 1. Compute df first and write it down before opening the t-table.

Problem 2 — One-Tailed Proportion Test (Variants 0–2)

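Variant 0 (medication; n = 100, x = 58, $p_0$ as given in the lesson, right-tailed, α = 0.05):

$H_0: p = p_0$; $H_a: p > p_0$. Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓.
$\hat{p} = 58/100 = 0.58$. $SE_0 = \sqrt{p_0(1 - p_0)/100}$. $z = (0.58 - p_0)/SE_0$.
p-value $= P(Z \ge z)$. Decision: p-value > 0.05 → fail to reject $H_0$.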

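Variant 1 (customer satisfaction; n = 120, x = 30, $p_0$ as given in the lesson, left-tailed, α = 0.05):

$H_0: p = p_0$; $H_a: p < p_0$. Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓.
$\hat{p} = 30/120 = 0.25$. $SE_0 = \sqrt{p_0(1 - p_0)/120}$. $z = (0.25 - p_0)/SE_0$.
p-value $= P(Z \le z)$. Decision: p-value > 0.05 → fail to reject $H_0$.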

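Variant 2 (social media; n = 150, x = 117, $p_0$ as given in the lesson, right-tailed, α = 0.01):

$H_0: p = p_0$; $H_a: p > p_0$. Conditions: $np_0 \ge 10$ ✓; $n(1 - p_0) \ge 10$ ✓.
$\hat{p} = 117/150 = 0.78$. $SE_0 = \sqrt{p_0(1 - p_0)/150}$. $z = (0.78 - p_0)/SE_0$.
p-value $= P(Z \ge z)$. Decision: 0.01 < p-value < 0.05 → fail to reject $H_0$ at α = 0.01 (would reject at α = 0.05).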

Common mistake — one-tailed p-values: for a right-tailed test, p-value $= P(Z \ge z)$, not $2P(Z \ge |z|)$. For a left-tailed test, p-value $= P(Z \le z)$. The two-tail doubling applies only to two-tailed tests. Match the tail to the direction of $H_a$.
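A minimal sketch of the tail rule, using `statistics.NormalDist` from the Python standard library. The function name `z_p_value` and the statistic z = 2.10 are illustrative assumptions, not values from the variants above.

```python
from statistics import NormalDist

def z_p_value(z, tail):
    """p-value for a z statistic; tail is 'right', 'left', or 'two'."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "right":
        return 1 - std_normal.cdf(z)              # P(Z >= z)
    if tail == "left":
        return std_normal.cdf(z)                  # P(Z <= z)
    if tail == "two":
        return 2 * (1 - std_normal.cdf(abs(z)))   # 2 P(Z >= |z|)
    raise ValueError("tail must be 'right', 'left', or 'two'")

z = 2.10  # hypothetical statistic
print(round(z_p_value(z, "right"), 3))  # one-tail area
print(round(z_p_value(z, "two"), 3))    # doubled for a two-tailed test
```

Passing the tail explicitly forces the choice to be made from $H_a$ before any number is computed.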

Problem 3 — Proportion Test (Generator)

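Hypotheses: $H_0: p = p_0$; $H_a$ matches the scenario direction.
Conditions: $np_0 \ge 10$ and $n(1 - p_0) \ge 10$.
Test statistic: $\hat{p} = x/n$, $SE_0 = \sqrt{p_0(1 - p_0)/n}$, $z = (\hat{p} - p_0)/SE_0$. SE uses $p_0$, not $\hat{p}$.
p-value: right-tailed → $P(Z \ge z)$; left-tailed → $P(Z \le z)$; two-tailed → $2P(Z \ge |z|)$.
Conclusion: reject $H_0$ if the p-value is below α, else fail to reject, stated in context.

The single most-tested concept: the SE in the proportion test uses $p_0$ (the null value), not $\hat{p}$. The p-value is computed assuming $H_0$ is true, so we use the spread expected if $p$ were exactly $p_0$.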
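As a rough illustration of why the choice of SE matters, the sketch below computes z both ways. It borrows n = 120, x = 54 and a claimed proportion of 0.40 from the placement-test problem in the Mastery Check; the helper name `one_proportion_z` and the threshold argument are assumptions of this sketch.

```python
import math

def one_proportion_z(x, n, p0, threshold=10):
    """Return (p_hat, se_null, z) after checking the success-failure conditions."""
    if n * p0 < threshold or n * (1 - p0) < threshold:
        raise ValueError("success-failure condition not met")
    p_hat = x / n
    se_null = math.sqrt(p0 * (1 - p0) / n)  # spread of p_hat if H0 is true
    z = (p_hat - p0) / se_null
    return p_hat, se_null, z

n, x, p0 = 120, 54, 0.40
p_hat, se_null, z = one_proportion_z(x, n, p0)

# The common error: plugging p_hat into the SE instead of p0.
se_wrong = math.sqrt(p_hat * (1 - p_hat) / n)
z_wrong = (p_hat - p0) / se_wrong

print(f"correct: SE0 = {se_null:.4f}, z = {z:.3f}")
print(f"wrong:   SE  = {se_wrong:.4f}, z = {z_wrong:.3f}")
```

Here the two z values are close because $\hat{p}$ and $p_0$ are close; the gap widens as they move apart.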

Problem 4 — Find the Error

The generator rotates through three error types:

Error Type 1 — $\hat{p}$ in the denominator: the student uses $\sqrt{\hat{p}(1 - \hat{p})/n}$ instead of $\sqrt{p_0(1 - p_0)/n}$. This changes $z$ and can flip the conclusion, especially when $\hat{p}$ and $p_0$ are far apart.

Error Type 2 — wrong p-value direction: for a two-tailed test, reporting only the one-tail area instead of doubling it. If $z = 2.10$, the one-tail area is 0.018 but the correct two-tailed p-value is 0.036; under-reporting risks a spurious rejection.

Error Type 3 — “accept $H_0$” language: the numbers are right but the conclusion says “accept $H_0$” or “the data prove $H_0$.” When the p-value exceeds α, the correct statement is “fail to reject $H_0$”: the data are not surprising enough under $H_0$, which is not proof that $H_0$ is true.

Check your own work: verify (1) the SE uses $p_0$; (2) the p-value direction matches $H_a$; (3) the conclusion says “reject” or “fail to reject,” never “accept.”

Problem 5 — Multi-Step Synthesis: Hospital Quality Audit

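Data: n = 20 patients; $\bar{x}$ days (0.6 days from the claimed mean; value given in the lesson), s = 1.2; 7 readmissions ($\hat{p} = 7/20 = 0.35$); claimed $\mu_0 = 3.5$ days and $p_0 = 0.25$. Test both at α = 0.05.

(a) t-test for mean post-operative stay:

$H_0: \mu = 3.5$; $H_a: \mu \neq 3.5$ (two-tailed). σ unknown, n = 20 < 30 → t-test, df = 19.
$SE = 1.2/\sqrt{20} \approx 0.268$. $|t| = 0.6/0.268 \approx 2.24$.
df = 19: $t^* = 2.093$ (two-tail 0.05); $t^* = 2.539$ (two-tail 0.02). Since $2.093 < |t| < 2.539$: $0.02 < p < 0.05$.
Decision: $p < 0.05$ → reject $H_0$. Sufficient evidence the mean stay differs from 3.5 days.

(b) z-test for readmission proportion:

$H_0: p = 0.25$; $H_a: p \neq 0.25$ (two-tailed).
Conditions: $np_0 = 20(0.25) = 5$, which does not reach the course’s threshold of 10 ($n(1 - p_0) = 15$ ✓). With only ~5 expected successes, the normal approximation is unreliable; the z-test below is shown for completeness, but a larger sample is needed for a valid proportion test.
$\hat{p} = 0.35$. $SE_0 = \sqrt{0.25(0.75)/20} \approx 0.0968$. $z = (0.35 - 0.25)/0.0968 \approx 1.03$.
Decision: even taken at face value, p-value ≈ 0.30 > 0.05 → fail to reject $H_0$, but the failed condition is the more important finding.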

(c) Interpreting two simultaneous results: rejecting the stay claim shows the stated 3.5-day average is statistically inconsistent with the data — but a 0.6-day difference’s practical importance is a clinical judgment, not a p-value. Failing to reject the readmission claim does not confirm 25%: with n = 20 (and a failed condition), the test has very low power — a true rate of 35–40% could easily go undetected. One test rejecting and the other not is no contradiction: each addresses a separate parameter with its own variability. Both conclusions are about evidence, not proof.

(d) “Accept $H_0$” reasoning error: the administrator commits the “fail to reject = accept” fallacy. With n = 20 and very low power, absence of statistical evidence is not positive evidence; “no news” here just means the study was underpowered (and the $np_0 \ge 10$ condition for the test was not even met). Correct: “We do not have sufficient evidence at the 5% level to conclude the readmission proportion differs from 25%. A larger sample is needed before any affirmative conclusion.”

Note on conditions: here $np_0 = 5 < 10$, so the success-failure condition is not satisfied. Flag this in the write-up and recommend a larger sample (for $p_0 = 0.25$, $n \ge 40$ gives $np_0 \ge 10$) before relying on the proportion test.
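The condition check and the minimum sample size can be sketched as follows. The function names are illustrative, and the threshold of 10 is the course's rule as used in this problem.

```python
import math

def conditions_met(n, p0, threshold=10):
    """Success-failure check: expected successes and failures under H0."""
    return n * p0 >= threshold and n * (1 - p0) >= threshold

def min_sample_size(p0, threshold=10):
    """Smallest n for which both expected counts reach the threshold."""
    return math.ceil(threshold / min(p0, 1 - p0))

print(conditions_met(20, 0.25))   # False: only 5 expected readmissions
print(min_sample_size(0.25))      # 40
print(conditions_met(40, 0.25))   # True
```

The binding constraint is always the rarer outcome, which is why the helper divides by the smaller of $p_0$ and $1 - p_0$.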

Section 7: Mastery Check Solutions

Problem 1 — Feynman Test: Why $p_0$ (Not $\hat{p}$) in the Denominator

The denominator of the z statistic is the standard error of $\hat{p}$ under $H_0$: the spread we’d expect in sample proportions if the null were exactly right. $H_0$ says the true proportion is $p_0$, so $\hat{p}$ varies around $p_0$ with standard deviation $\sqrt{p_0(1 - p_0)/n}$. That is the SE we use.

Using $\hat{p}$ instead would compute the SE as if the true proportion equaled our sample estimate. That is circular reasoning: assuming the sample is exactly right in order to judge whether it is surprising. The rule: SE uses $p_0$, because we assume $H_0$ is true when computing the p-value.

Problem 2 — Apply: Placement Test

$p_0 = 0.40$; $\hat{p} = 54/120 = 0.45$; n = 120, x = 54, α = 0.05.

Part A: $H_0: p = 0.40$; $H_a: p \neq 0.40$ (two-tailed). The null uses the claimed proportion (0.40), not the sample estimate.

Part B (conditions): $np_0 = 48 \ge 10$ ✓; $n(1 - p_0) = 72 \ge 10$ ✓. Use the z-test for proportions.

Part C (statistic): $\hat{p} = 0.45$. $SE_0 = \sqrt{0.40(0.60)/120} \approx 0.0447$. $z = (0.45 - 0.40)/0.0447 \approx 1.12$.

Part D (conclusion): p-value $= 2P(Z \ge 1.12) \approx 0.26 > 0.05$ → fail to reject $H_0$. Insufficient evidence at the 5% level that the true passage rate differs from 40%; the observed 45% is consistent with random variation around 40%.

Common mistake: choosing a right-tailed test because $\hat{p} = 0.45 > 0.40$. The tail direction comes from the research question (does the rate differ in any direction?), not from the observed sample; using the data to pick the tail is data snooping.

Problem 3 — Error Analysis: df Off by One

n = 10, t = 2.50, two-tailed. Researcher uses df = 10, finds $t^* = 2.228$, rejects $H_0$.

The error: df should be $n - 1 = 9$, not 10. df = 10 gives $t^* = 2.228$; the correct df = 9 gives $t^* = 2.262$.

Does the conclusion change? df = 10: $2.50 > 2.228$ → reject. df = 9: $2.50 > 2.262$ → still reject. Not here.

When would it matter? If $|t|$ were between 2.228 and 2.262 (e.g., $t = 2.25$): the wrong df (10) would reject, the correct df (9) would fail to reject. The off-by-one df error is consequential near the boundary of the critical region, so it is worth correcting even when the conclusion happens not to change. Habit: write $df = n - 1$ before opening the t-table.
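A short sketch of the boundary effect, using the two critical values quoted above. The borderline statistic 2.25 is a hypothetical value chosen to fall between them, and `decision` is an illustrative helper.

```python
# Two-tailed alpha = 0.05 critical values for the two df values in question.
T_CRIT_05 = {9: 2.262, 10: 2.228}

def decision(t_stat, df):
    """Two-tailed decision at alpha = 0.05 for the given degrees of freedom."""
    return "reject" if abs(t_stat) > T_CRIT_05[df] else "fail to reject"

n = 10
for t_stat in (2.50, 2.25):  # 2.25 is a hypothetical borderline value
    wrong = decision(t_stat, n)       # df = n (the error)
    right = decision(t_stat, n - 1)   # df = n - 1 (correct)
    print(f"t = {t_stat}: df = {n} -> {wrong}; df = {n - 1} -> {right}")
```

Only the borderline statistic produces different decisions, which matches the point that the error matters near the edge of the critical region.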

Section 8: Boss Fight Solutions

Path A — The Health Researcher

Data: n = 18, $\bar{x}$ days (value given in the lesson), s = 1.2, $\mu_0 = 4.5$, α = 0.05 (two-tailed).

Task 1 — Hypotheses and test choice: $H_0: \mu = 4.5$; $H_a: \mu \neq 4.5$ (two-tailed: a skeptic checks both directions with no prior directional claim). t-test because σ is unknown, n = 18 < 30, population approximately normal; df = 17.

Task 2 — Compute t and bound p: $SE = 1.2/\sqrt{18} \approx 0.283$. $t = (\bar{x} - 4.5)/0.283$. df = 17: $t^* = 2.110$ (two-tail 0.05); $t^* = 2.567$ (two-tail 0.02). Since $2.110 < |t| < 2.567$: $0.02 < p < 0.05$.

Task 3 — Conclusion and Type II error: $p < 0.05$ → reject $H_0$. Sufficient evidence the mean recovery time differs from 4.5 days (it appears longer). A Type II error here (failing to reject a false $H_0$) would mean accepting the 4.5-day claim when the true mean is higher: patients get inaccurate discharge expectations and resources may be under-allocated.

Task 4 — z vs. t argument: the z-vs-t criterion is whether σ is known, not sample size. Here σ is unknown, so t is required regardless of n being near 30. Using $z^* = 1.96$ instead of $t^* = 2.110$ understates the critical value (anti-conservative): if $|t|$ were 2.00, z would reject ($2.00 > 1.96$) while t would not ($2.00 < 2.110$), inflating the true Type I error rate above α.

Path B — The Policy Analyst

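Data: n = 150, x = 63, $p_0 = 0.35$, α = 0.05 (two-tailed).

Task 1 — Hypotheses and conditions: $H_0: p = 0.35$; $H_a: p \neq 0.35$ (two-tailed: no prior directional claim). Conditions: $np_0 = 52.5 \ge 10$ ✓; $n(1 - p_0) = 97.5 \ge 10$ ✓; random sample ✓. Use the z-test for proportions.

Task 2 — $\hat{p}$, z, p-value: $\hat{p} = 63/150 = 0.42$. $SE_0 = \sqrt{0.35(0.65)/150} \approx 0.0389$. $z = (0.42 - 0.35)/0.0389 \approx 1.80$. p-value $= 2P(Z \ge 1.80) \approx 0.072$.

Task 3 — Decision and CI check: p-value ≈ 0.072 > 0.05 → fail to reject $H_0$. 95% CI (CIs use $\hat{p}$ in the SE, not $p_0$): $SE = \sqrt{0.42(0.58)/150} \approx 0.0403$. $0.42 \pm 1.96(0.0403) = (0.341, 0.499)$. $p_0 = 0.35$ falls inside → fail to reject ✓. Key distinction: the test uses $p_0$ in the SE; the CI uses $\hat{p}$.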
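The test and the interval use different standard errors, which is easy to mix up. A sketch with the Path B data (n = 150, x = 63, claimed proportion 0.35) and the standard library's `NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

n, x, p0 = 150, 63, 0.35
p_hat = x / n
std_normal = NormalDist()

# Test: the SE is computed under H0, so it uses p0.
se_null = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_null
p_value = 2 * (1 - std_normal.cdf(abs(z)))

# Confidence interval: no null is assumed, so the SE uses p_hat.
se_hat = math.sqrt(p_hat * (1 - p_hat) / n)
z_star = std_normal.inv_cdf(0.975)  # about 1.96
ci = (p_hat - z_star * se_hat, p_hat + z_star * se_hat)

print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.3f}, {ci[1]:.3f}); contains p0: {ci[0] <= p0 <= ci[1]}")
```

Because the two standard errors differ, the test and the interval can in principle disagree very close to the boundary; here they agree comfortably.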

Task 4 — Practical vs. statistical significance: the test failed to reject, so the group cannot claim a statistically detectable difference from 35%. Even a 0.07 (7 pp) gap may or may not be “dramatic” by policy standards; that is a domain judgment, not a statistical one. Responsible wording: “Our sample shows 42%, 7 points above the government’s 35%, but we lack sufficient statistical evidence at the 5% level to conclude the true rate differs; a larger sample is needed.”

Section 9: Challenge Problem Solutions

Challenge 1 — Paired Data: t-Test on Differences

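The key insight: treat the differences $d_i = \text{After}_i - \text{Before}_i$ as a single sample, then run the one-sample t-test on the $d_i$. This removes between-subject variability and increases power.

Variant 0 (typing speed; 7 participants; $H_0: \mu_d = 0$ vs. the alternative stated in the lesson):

Participant	Before	After	$d$ = After − Before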
1	52	58	6
2	45	50	5
3	60	65	5
4	38	42	4
5	55	62	7
6	48	51	3
7	50	55	5

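$\bar{d} = 35/7 = 5$ wpm. Deviations: 1, 0, 0, −1, 2, −2, 0; sum of squares $= 10$. $s_d^2 = 10/6 \approx 1.667$, $s_d \approx 1.291$. $SE = 1.291/\sqrt{7} \approx 0.488$. $t = 5/0.488 \approx 10.25$, df = 6. df = 6: $t^* = 3.707$ (two-tail 0.01); since $10.25 > 3.707$, $p < 0.01$ → reject $H_0$. Very strong evidence the training improves typing speed.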

Variant 1 (reaction time; 6 athletes; $H_0: \mu_d = 0$ vs. $H_a: \mu_d < 0$, left-tailed, α = 0.05):

Athlete	Before	After	$d$ = After − Before
1	280	265	−15
2	310	295	−15
3	295	290	−5
4	320	305	−15
5	275	260	−15
6	300	285	−15

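$\bar{d} = -80/6 \approx -13.33$ ms. Deviations from −13.33: −1.67, −1.67, 8.33, −1.67, −1.67, −1.67; sum of squares $\approx 83.33$. $s_d^2 = 83.33/5 \approx 16.67$, $s_d \approx 4.08$. $SE = 4.08/\sqrt{6} \approx 1.67$, $t = -13.33/1.67 \approx -8.00$, df = 5. Left-tailed: $t^* = 2.015$ (one-tail 0.05, df = 5), $t = -8.00 < -2.015$ → reject $H_0$. Strong evidence the drill reduces reaction time.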

Variant 2 (caloric intake; 8 patients; $H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$, α = 0.05):

Patient	Before	After	$d$ = After − Before
1	2400	2200	−200
2	2100	2050	−50
3	2800	2500	−300
4	2300	2250	−50
5	2600	2350	−250
6	1900	1950	+50
7	2200	2100	−100
8	2500	2300	−200

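$\bar{d} = -1100/8 = -137.5$ kcal. Deviations: −62.5, 87.5, −162.5, 87.5, −112.5, 187.5, 37.5, −62.5; sum of squares $= 98750$. $s_d^2 = 98750/7 \approx 14107.1$, $s_d \approx 118.77$. $SE = 118.77/\sqrt{8} \approx 41.99$, $t = -137.5/41.99 \approx -3.27$, df = 7. df = 7: $t^* = 2.365$ (two-tail 0.05); $t^* = 2.998$ (two-tail 0.02). Since $|t| \approx 3.27 > 2.998$: $p < 0.02$ → reject $H_0$. Sufficient evidence the intervention changes (reduces) mean caloric intake.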

Common mistake: running a two-sample test instead of treating the differences as a single sample. Paired designs remove between-subject noise, so always compute $d_i$ first, then apply the one-sample t-test to the $d$ column.
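The paired procedure amounts to a one-sample t statistic on the difference column. A minimal sketch using the Variant 0 typing-speed data and the standard library's `statistics` module; `paired_t` is an illustrative name.

```python
import math
from statistics import mean, stdev

# Variant 0 data (words per minute) from the table above.
before = [52, 45, 60, 38, 55, 48, 50]
after = [58, 50, 65, 42, 62, 51, 55]

def paired_t(before, after):
    """One-sample t statistic on the differences d = after - before."""
    diffs = [a - b for a, b in zip(after, before)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)                   # sample SD, divides by n - 1
    se = s_d / math.sqrt(len(diffs))
    return d_bar / se, len(diffs) - 1    # (t, df)

t_stat, df = paired_t(before, after)
print(f"t = {t_stat:.2f}, df = {df}")
```

The two original columns never enter the test separately; only their differences do, which is what distinguishes this from a two-sample test.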

Challenge 2 — Power of the t-Test

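$\mu_0$ and true $\mu$ in mL (values given in the lesson; the true shift is 5 mL), n = 16, s = 20, α = 0.05 (two-tailed), df = 15.

(a) Approximate power: standardized shift $\delta = 5/(20/\sqrt{16}) = 5/5 = 1.0$. Critical value $t^* = 2.131$. Using the normal approximation, power $\approx P(Z > 2.131 - 1.0) = P(Z > 1.131) \approx 0.13$, about 13%.

(b) What’s needed for 80% power: 13% power means a true 5-mL shift would be detected only about 1 time in 8, which is very low. Using $n \approx \left((z_{\alpha/2} + z_{\beta})\,s/\Delta\right)^2$ with $z_{0.025} = 1.96$, $z_{0.20} = 0.84$: $n \approx (2.80 \times 20/5)^2 = 11.2^2 \approx 126$ observations. The current n = 16 is far too small to detect a 5-mL shift reliably.

Key lesson: failing to reject $H_0$ is not “no effect.” With 13% power, a failure to reject is almost uninformative: the test was very unlikely to detect the true shift even if it exists. Always consider power alongside the decision.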
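Both calculations can be reproduced with the standard library. This sketch follows the normal approximation used above and treats s = 20 as if it were the population standard deviation, which is an approximation; the function names are illustrative.

```python
import math
from statistics import NormalDist

def approx_power(shift, s, n, crit):
    """Normal-approximation power of a two-tailed test with critical value crit."""
    std_normal = NormalDist()
    delta = shift / (s / math.sqrt(n))              # standardized shift
    upper = 1 - std_normal.cdf(crit - delta)        # rejections in the upper tail
    lower = std_normal.cdf(-crit - delta)           # rejections in the lower tail (tiny here)
    return upper + lower

def n_for_power(shift, sigma, alpha=0.05, power=0.80):
    """Sample size from n = ((z_{alpha/2} + z_beta) * sigma / shift)^2, rounded up."""
    std_normal = NormalDist()
    z_alpha = std_normal.inv_cdf(1 - alpha / 2)
    z_beta = std_normal.inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / shift) ** 2)

print(f"power at n = 16: {approx_power(5, 20, 16, 2.131):.2f}")
print(f"n for 80% power: {n_for_power(5, 20)}")
```

The sample-size formula uses z quantiles rather than t quantiles, so the result is a planning figure; a t-based calculation would ask for slightly more observations.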
