Solutions — Hypothesis Testing for a Population Mean (Large Sample)

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.

← Back to Lesson INF-5

Section 5: Guided Practice Solutions

Problem 1 — Setting Up Hypotheses and Computing z (Variants 0–2)

Variant 0 (bottling plant, mL, , , ): (a) H₀: μ = 750; Hₐ: μ ≠ 750 (two-tailed), since any deviation from 750 mL matters. (b) mL, .
Variant 1 (delivery company, days, , , ): (a) H₀: μ = 3.0; Hₐ: μ ≠ 3.0 (two-tailed). (b) days, .
Variant 2 (bag fill, g, , , ): (a) ; (two-tailed). (b) g, .

Common mistakes: (1) writing hypotheses in terms of the statistic x̄ instead of the parameter μ; (2) using σ directly as the denominator of z instead of the standard error σ/√n; (3) reversing H₀ and Hₐ (H₀ always contains the “=” sign).
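A minimal Python sketch of part (b) follows. The inputs are hypothetical values chosen for easy arithmetic, not the problem's numbers. The last computation shows how mistake (2) shrinks z by a factor of √n.

```python
from math import sqrt


def z_statistic(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """One-sample z statistic: (x_bar - mu0) / (sigma / sqrt(n))."""
    se = sigma / sqrt(n)  # standard error of the sample mean
    return (x_bar - mu0) / se


# Hypothetical inputs, for illustration only.
x_bar, mu0, sigma, n = 752.0, 750.0, 8.0, 64

correct = z_statistic(x_bar, mu0, sigma, n)  # SE = 8 / 8 = 1, so z = 2.0
wrong = (x_bar - mu0) / sigma                # mistake (2): divides by sigma, gives 0.25
print(correct, wrong)
```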

Problem 2 — p-value and Decision (Variants 0–2)

Variant 0 (, two-tailed, ): (a) , one-tail , two-tailed . (b) p < α → reject H₀. Evidence the mean fill is not 750 mL.
Variant 1 (, two-tailed, ): (a) , one-tail , . (b) p < α → reject H₀. Evidence the mean delivery time differs from 3.0 days.
Variant 2 (, two-tailed, ): (a) . (b) p > α → fail to reject H₀. Same z as Variant 1, but a stricter α changes the decision.

Common mistakes: (1) forgetting to multiply the one-tail area by 2 for a two-tailed test; (2) writing “accept H₀” (the correct phrase is “fail to reject H₀”); (3) comparing to instead of to ; (4) assuming a negative z automatically means rejection.
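The doubling step and the decision rule can be sketched with Python's standard-library `statistics.NormalDist`. The z value is hypothetical. It is picked so that the same z leads to different decisions at α = 0.05 and α = 0.01, as in Variants 1 and 2.

```python
from statistics import NormalDist

Z = NormalDist()  # standard normal: mean 0, standard deviation 1


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: twice the tail area beyond |z|."""
    one_tail = 1 - Z.cdf(abs(z))
    return 2 * one_tail  # mistake (1) would be returning one_tail instead


def decide(p: float, alpha: float) -> str:
    # Compare p (not z) to alpha; the outcome is never "accept H0".
    return "reject H0" if p < alpha else "fail to reject H0"


z = 2.2                # hypothetical test statistic
p = two_tailed_p(z)    # about 0.0278
print(round(p, 4), decide(p, 0.05), decide(p, 0.01))
print(decide(two_tailed_p(-0.5), 0.05))  # a negative z alone does not mean rejection
```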

Problem 3 — Choosing the Test Form (Traffic Speed)

Correct answer: right-tailed test. H₀: μ = 110 km/h; Hₐ: μ > 110 km/h.

Justification: the planner’s concern is specifically whether speeds have increased beyond 110 km/h — a directional question, chosen from the research question before examining the data. A two-tailed test wastes power on the irrelevant low direction; a left-tailed test looks the wrong way.

Common mistake: choosing “two-tailed because the sample could go either way.” The test type comes from the research question, not the data. Choosing direction after seeing the data is data snooping — it inflates the true Type I error rate.
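The simulation below illustrates why choosing the direction after seeing the data inflates the Type I error rate. It is an illustrative sketch, not part of the lesson. H₀ is true in every simulated trial, and the tail is chosen to match the sign of z. Under those conditions a nominal 5% one-tailed test rejects about 10% of the time.

```python
import random
from statistics import NormalDist

ALPHA = 0.05


def snooped_rejection_rate(trials: int, seed: int = 1) -> float:
    """Simulate z under a TRUE H0, then pick the tail after looking at the sign."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - ALPHA)  # one-tailed critical value, about 1.645
    rejections = 0
    for _ in range(trials):
        z = rng.gauss(0.0, 1.0)  # H0 true: z ~ N(0, 1)
        # Snooping: use the right tail if z > 0 and the left tail otherwise,
        # which is the same as rejecting whenever |z| exceeds the one-tailed cutoff.
        if abs(z) > crit:
            rejections += 1
    return rejections / trials


rate = snooped_rejection_rate(200_000)
print(round(rate, 3))  # close to 0.10, not the nominal 0.05
```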

Problem 4 — Classifying Errors (Drug Trial)

Correct answer: Type II error — the company failed to reject a false null hypothesis.

Justification: the true mean recovery time is 11 days, so H₀ is false, and the test failed to detect this. Failing to reject a false null is by definition a Type II error (probability β), not a Type I error (which requires rejecting a true null). Note also that the company “failed to reject” H₀; it never “accepted” H₀.

Common mistakes: (1) confusing the errors: Type I = false alarm (reject a true H₀); Type II = missed signal (fail to reject a false H₀); (2) saying “no error occurred because they followed the rule” (a correctly followed rule can still err); (3) writing “accepted H₀.”
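A small simulation shows that a correctly followed rule still errs at predictable rates. Only the true mean of 11 days comes from the problem. The null value μ₀ = 12 days, σ = 4, n = 36 and the left-tailed test at α = 0.05 are hypothetical choices made for illustration.

```python
import random
from math import sqrt
from statistics import NormalDist


def rejection_rate(mu_true, mu0=12.0, sigma=4.0, n=36, alpha=0.05,
                   trials=100_000, seed=7):
    """Fraction of simulated left-tailed z-tests of H0: mu = mu0 that reject."""
    rng = random.Random(seed)
    se = sigma / sqrt(n)
    crit = NormalDist().inv_cdf(alpha)  # left-tail critical value, about -1.645
    rejections = 0
    for _ in range(trials):
        x_bar = rng.gauss(mu_true, se)  # sampling distribution of the mean
        z = (x_bar - mu0) / se
        if z < crit:
            rejections += 1
    return rejections / trials


type1 = rejection_rate(mu_true=12.0)      # H0 true: any rejection is a Type I error
type2 = 1 - rejection_rate(mu_true=11.0)  # H0 false: not rejecting is a Type II error
print(round(type1, 3), round(type2, 3))   # roughly 0.05 and 0.56
```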

Section 6: Independent Practice Solutions

Problem 1 — Two-Tailed Test (Generator)

The five-step approach is always the same. Example (, , , , ):

Hypotheses: H₀: μ = 120; Hₐ: μ ≠ 120 (two-tailed).
Conditions: n ≥ 30 ✓; σ known ✓. The CLT applies.
Test statistic: SE = σ/√n; z = (x̄ − 120)/SE.
p-value: p = 2·P(Z ≥ |z|).
Conclusion: p < α → reject H₀. Evidence the mean differs from 120. (Generated values differ; the logic is identical.)
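Steps 3 to 5 of the template can be sketched as a single Python function. This is a sketch, not the lesson's tool. It assumes steps 1 and 2 (hypotheses, conditions with n ≥ 30 and σ known) have already been done by hand. The `alternative` parameter name and all inputs are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z_test(x_bar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Return (z, p, decision) for a one-sample z-test of H0: mu = mu0.

    alternative: "two-sided" (Ha: mu != mu0), "less" (Ha: mu < mu0),
    or "greater" (Ha: mu > mu0).
    """
    se = sigma / sqrt(n)
    z = (x_bar - mu0) / se
    phi = NormalDist().cdf
    if alternative == "two-sided":
        p = 2 * (1 - phi(abs(z)))
    elif alternative == "less":
        p = phi(z)        # left tail: P(Z <= z)
    elif alternative == "greater":
        p = 1 - phi(z)    # right tail: P(Z >= z)
    else:
        raise ValueError("alternative must be 'two-sided', 'less' or 'greater'")
    decision = "reject H0" if p < alpha else "fail to reject H0"
    return z, p, decision


# Hypothetical two-tailed example: H0: mu = 120, SE = 12 / 6 = 2, z = 2.5.
z, p, decision = one_sample_z_test(x_bar=125.0, mu0=120.0, sigma=12.0, n=36)
print(round(z, 2), round(p, 4), decision)
```

With `alternative="less"` and z = −2, the function returns p = Φ(−2) ≈ 0.023. Using the right-tail formula 1 − Φ(z) by mistake would give about 0.977, which is the error flagged in Problem 2.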

Problem 2 — One-Tailed Test Direction (Variants 0–2)

Variant 0 (battery life, h, , , , ): (a) H₀: μ = 500; Hₐ: μ < 500 (left-tailed). (b) , . p < α → reject H₀. Evidence mean life is below 500 h.
Variant 1 (factory emissions, ppm, , , , ): (a) H₀: μ = 80; Hₐ: μ > 80 (right-tailed). (b) , . p < α → reject H₀. Evidence emissions exceed 80 ppm.
Variant 2 (meal plan, kcal, , , , ): (a) H₀: μ = 2000; Hₐ: μ < 2000 (left-tailed). (b) , . p < α → reject H₀. Evidence intake is below 2000 kcal.

Common mistakes: (1) using the right-tail formula p = 1 − Φ(z) for a left-tailed test with negative z; for a left tail, p = Φ(z) = P(Z ≤ z); (2) choosing the direction after seeing the data; (3) writing H₀ as an inequality such as μ ≥ μ₀ instead of the point equality μ = μ₀.

Problem 3 — One-Tailed Test (Generator)

Example (, , , , left-tailed, ):

Hypotheses: H₀: μ = μ₀; Hₐ: μ < μ₀.
Conditions: n ≥ 30 ✓; σ known ✓.
Test statistic: SE = σ/√n; z = (x̄ − μ₀)/SE.
p-value: p = P(Z ≤ z).
Conclusion: p < α → reject H₀. (Generated values differ.)

Problem 4 — Error Classification and the α–β Trade-off (Variants 0–2)

Variant 0 (sodium content, fail to reject H₀, true mean = 430 mg): (a) Type II error: H₀ is false and was not rejected. (b) Increase the sample size n. Lowering α from 0.05 to 0.01 would increase β; of these options, only increasing n reduces both error types at once.
Variant 1 (prosecutor, reject H₀, defendant was compliant): (a) Type I error: a true H₀ was rejected, which happens with probability α. (b) The 1% rate is a built-in feature of the rule, not a flaw; a smaller α reduces (but does not eliminate) Type I errors while increasing Type II errors.
Variant 2 (quality engineer, reject H₀, true mean had not shifted): (a) Type I error, with probability α. (b) Reducing α to 0.01 lowers the Type I rate but raises β for the same n, so power drops; only a larger n reduces both.

Common mistakes: (1) confusing which error occurred (check both the test outcome and the true state of H₀); (2) thinking a lower α fixes all errors (it only moves the trade-off); (3) claiming a higher α reduces Type II errors without noting the higher Type I cost.
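The α–β trade-off can be made concrete with the standard power formula for a two-sided one-sample z-test. The scenario is hypothetical: the true mean is 5 units from μ₀ and σ = 20. Lowering α raises β, while a larger n lowers β at the same α.

```python
from math import sqrt
from statistics import NormalDist

Z = NormalDist()


def power_two_sided(delta, sigma, n, alpha):
    """Power of a two-sided one-sample z-test when the true mean is mu0 + delta."""
    shift = delta / (sigma / sqrt(n))  # true shift measured in standard errors
    z_crit = Z.inv_cdf(1 - alpha / 2)
    # H0 is rejected if z lands beyond either critical value.
    return Z.cdf(-z_crit + shift) + Z.cdf(-z_crit - shift)


def beta(delta, sigma, n, alpha):
    """Type II error probability: 1 - power."""
    return 1 - power_two_sided(delta, sigma, n, alpha)


for alpha, n in [(0.05, 64), (0.01, 64), (0.05, 256)]:
    print(alpha, n, round(beta(5, 20, n, alpha), 3))
```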

Problem 5 — Multi-Step Synthesis: News Headline Interpretation

(a) The journalist’s error: “no evidence of an effect” is not “evidence of no effect.” Failing to reject H₀ means only that the data are consistent with H₀, not that H₀ is true; a Type II error may have occurred. Correct: “there is insufficient evidence at the chosen significance level to conclude that the method improves test scores.”

(b) Concern with n = 15: the SE is large, so only a very large effect would be detected, and the test has low power (high β). A modest real effect would likely be missed. Also, with n = 15 and unknown σ, a z-test is inappropriate; a t-test (inf-6) should be used.

(c) Multiple comparisons / p-hacking: running 20 independent tests, each at α = 0.05, gives a familywise Type I error rate of about 1 − 0.95²⁰ ≈ 0.64. Even if no real effect exists, about one of the 20 tests (20 × 0.05 = 1) is expected to come out significant by chance. Reporting only the significant results hides these likely false positives, and this selective reporting severely undermines validity.

Common mistakes: (1) treating “fail to reject H₀” as proof the method does not work; (2) overlooking power when n is small; (3) not recognizing that running many tests inflates the overall false-positive rate.
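The familywise figure in part (c) is a one-line calculation. It assumes the 20 tests are independent and every H₀ is true.

```python
def familywise_error_rate(alpha: float, m: int) -> float:
    """P(at least one false positive) across m independent tests when all H0 are true."""
    return 1 - (1 - alpha) ** m


fwer = familywise_error_rate(0.05, 20)
expected_false_positives = 20 * 0.05  # about one spurious "significant" result
print(round(fwer, 3), expected_false_positives)
```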

Section 7: Mastery Check Solutions

Problem 1 — Feynman Test: What Is a p-value?

A p-value is the probability of observing a test statistic at least as extreme as the one computed, assuming the null hypothesis is true. A small p-value means the data would be very surprising in a world where H₀ holds, which is strong evidence against H₀. A large p-value means the data are consistent with H₀.

What a p-value does NOT tell you:

It is not the probability that H₀ is true.
It is not the probability that the result occurred by chance.
It is not a measure of effect size or practical importance.
A small p-value does not mean the effect is large or clinically meaningful.
A large p-value (p > α) does not prove H₀ is true; it shows only that the evidence was insufficient to reject it.
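The "at least as extreme, assuming H₀ is true" definition can be checked by simulating a null world. The observed z of 2.0 is hypothetical. The share of null-world z values with |z| ≥ 2.0 should land close to the exact two-tailed p-value of about 0.0455.

```python
import random
from statistics import NormalDist


def simulated_p_value(z_obs: float, trials: int = 200_000, seed: int = 3) -> float:
    """Share of z values drawn under H0 that are at least as extreme as z_obs."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(trials) if abs(rng.gauss(0.0, 1.0)) >= abs(z_obs))
    return hits / trials


z_obs = 2.0                                      # hypothetical observed statistic
exact = 2 * (1 - NormalDist().cdf(abs(z_obs)))   # exact two-tailed p-value
approx = simulated_p_value(z_obs)
print(round(exact, 4), round(approx, 4))
```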

Problem 2 — Apply: Lake pH Test (, , )

Part A — Correct alternative: Hₐ: μ ≠ 7.0 (two-tailed). The authority wants to know whether pH differs from 7.0 in either direction; choosing “less than” after seeing x̄ < 7.0 would be data snooping.

Part B — Five-step solution:

H₀: μ = 7.0; Hₐ: μ ≠ 7.0 (two-tailed).
Conditions: n < 30, but σ is known and the population is approximately normal, so the z-test is valid. (With σ unknown and small n, inf-6 requires a t-test.)
SE = σ/√n; z = (x̄ − 7.0)/SE.
p = 2·P(Z ≥ |z|).
p > α → fail to reject H₀. Insufficient evidence that the lake’s mean pH differs from 7.0.

Common mistake: seeing x̄ < 7.0 and switching to a left-tailed test. The direction of Hₐ must be set before examining the data.

Problem 3 — Error Analysis: “Proves the Null Is False”

Error 1 — “proves”: a test never proves anything with certainty. Rejecting H₀ means only that the data would be unlikely if H₀ were true, and the procedure still rejects a true H₀ with probability α (a Type I error). Correct: “there is sufficient evidence at the 5% level to reject H₀,” not “proves.”

Error 2 — “definitely works”: statistical significance (p < 0.05) says nothing about effect size or practical importance. With a very large sample, even a trivial improvement could give p < 0.05. Report the effect size alongside the p-value.

Corrected statement: “There is sufficient evidence at the 5% significance level to reject H₀ and conclude the technique has a statistically detectable effect on exam scores. However, significance does not imply the effect is large or practically important; the magnitude should also be reported.”
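Error 2 can be demonstrated numerically. In this sketch the numbers are hypothetical: a 0.5-point gain on an exam with σ = 10 points. The standardized effect d = 0.05 is the same at both sample sizes, yet only the very large sample gives a significant result.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(x_bar: float, mu0: float, sigma: float, n: int) -> float:
    """Two-tailed p-value of a one-sample z-test."""
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


d = (70.5 - 70.0) / 10.0                           # effect size 0.05, unchanged by n
p_small_n = two_sided_p(70.5, 70.0, 10.0, 100)     # z = 0.5
p_large_n = two_sided_p(70.5, 70.0, 10.0, 10_000)  # z = 5.0
print(d, round(p_small_n, 3), p_large_n)
```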

Section 8: Boss Fight Solutions

Path A — The Auditor: Grant Processing Times

( days, , , , )

Task 1 — Hypotheses: H₀: μ = 30; Hₐ: μ > 30 (right-tailed). The mandate is to detect whether times exceed the target, so a one-tailed test places the entire rejection region (all of α) in the relevant direction.

Task 2 — Conditions and statistic: n = 64 ≥ 30 ✓; σ known ✓. SE = σ/√64 = σ/8; z = (x̄ − 30)/SE.

Task 3 — p-value and conclusion: p = P(Z ≥ z). Since p < 0.01, reject H₀. Audit sentence: “Based on 64 applications, there is sufficient evidence at the 1% level that the mean processing time exceeds the 30-day target (, ).”

Task 4 — α reduction: lowering α to 0.001 raises the right-tailed critical value from about 2.33 to about 3.09. With the observed z below 3.09, the test would fail to reject H₀, a Type II error that misses the real delay. Because undetected delays waste public money and harm applicants, lowering α to 0.001 is not advisable; α = 0.01 is already demanding for public audit use.

Common mistakes: (1) choosing a two-tailed test for an audit with a clear directional mandate; (2) thinking a lower α is always “better”; (3) writing “we accept H₀.”
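The right-tailed critical values used in Task 4 can be reproduced with `statistics.NormalDist.inv_cdf`. This is a sketch; the function name is illustrative.

```python
from statistics import NormalDist


def right_tail_critical(alpha: float) -> float:
    """z* such that P(Z >= z*) = alpha for a right-tailed test."""
    return NormalDist().inv_cdf(1 - alpha)


for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(right_tail_critical(alpha), 3))
```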

Path B — The Designer: Manufacturing Diameter Check

( mm, , , ; Type I cost = €500, Type II cost = €5,000)

Task 1 — Five-step test at :

H₀: μ = 25.00; Hₐ: μ ≠ 25.00 (two-tailed, because parts that are too small and too large are both defects).
Conditions: n ≥ 30 ✓; σ known ✓.
SE = σ/√n; z = (x̄ − 25.00)/SE.
p = 2·P(Z ≥ |z|).
p > α → fail to reject H₀. Insufficient evidence the mean diameter differs from 25.00 mm.

Task 2 — Error type (true mean = 25.015 mm): H₀ is false, and the test failed to reject it → Type II error (probability β). The procedure was correct; it simply lacked the power to detect a 0.015 mm shift (only 1.2 SE above the null value).

Task 3 — α direction and cost trade-off: to reduce β, increase α (a smaller critical value makes true shifts easier to detect). The Type II cost (€5,000) is 10× the Type I cost (€500), so a higher α (e.g., 0.10) is rational: accepting more cheap false alarms avoids costly missed defects.

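Task 4 — Sample size for 80% power to detect a 0.01 mm shift: n = ((z_{α/2} + z_β)·σ/δ)² with δ = 0.01 mm. The test is two-tailed, so use z_{α/2}; 80% power gives z_β ≈ 0.84. Round n up to a whole number of parts. The current n has very low power; about 8× as many measurements are needed.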

Common mistakes: (1) using the one-tailed value z_α when the test is two-tailed (use z_{α/2}); (2) rounding the sample size down (always round up); (3) assuming a higher α is always bad, when with asymmetric costs it can be rational.
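The sample-size formula in Task 4 can be sketched as follows. This is the standard approximation for a two-sided z-test, which ignores the negligible opposite tail. The σ = 0.05 mm and α = 0.05 below are hypothetical; the problem's own σ is not reproduced here.

```python
from math import ceil
from statistics import NormalDist


def sample_size_two_sided(delta, sigma, alpha=0.05, power=0.80):
    """n = ((z_{alpha/2} + z_beta) * sigma / delta)^2, always rounded UP."""
    z = NormalDist()
    z_alpha_half = z.inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_beta = z.inv_cdf(power)                # about 0.84 for 80% power
    return ceil(((z_alpha_half + z_beta) * sigma / delta) ** 2)


n = sample_size_two_sided(delta=0.01, sigma=0.05)  # hypothetical sigma
print(n)
```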

Section 9: Challenge Problem Solutions

Challenge 1 — Critical-Value Approach (Variants 0–2)

Variant 0 (postal service, days, , , , two-tailed): critical . , . |z| < z* → fail to reject H₀. Check: p > α ✓.
Variant 1 (widget weight, g, , , , two-tailed): critical . , . |z| < z* → fail to reject H₀. Check: p > α ✓.
Variant 2 (tablet weight, mg, , , , two-tailed): critical . , . |z| > z* → reject H₀. Check: p < α ✓.

Key insight: for the same α and the same tails, the critical-value and p-value approaches always yield the same decision, because they are mathematically equivalent. The p-value tells you how extreme the result is relative to H₀; the critical-value form gives a direct comparison in z-units.
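The equivalence can be checked numerically. This sketch compares both two-tailed decision rules on a grid of z values from −4 to 4 at three α levels. The grid and α levels are arbitrary choices.

```python
from statistics import NormalDist

Z = NormalDist()


def decide_by_p(z: float, alpha: float) -> bool:
    """True means reject H0 (p-value approach, two-tailed)."""
    return 2 * (1 - Z.cdf(abs(z))) < alpha


def decide_by_critical(z: float, alpha: float) -> bool:
    """True means reject H0 (critical-value approach, two-tailed)."""
    return abs(z) > Z.inv_cdf(1 - alpha / 2)


grid = [i / 100 for i in range(-400, 401)]  # z from -4.00 to 4.00
for alpha in (0.10, 0.05, 0.01):
    assert all(decide_by_p(z, alpha) == decide_by_critical(z, alpha) for z in grid)
print("both approaches agree on every grid point")
```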

Challenge 2 — Equivalence of CI and Two-Tailed Test

( g, , , , .)

(a) 95% CI: x̄ ± 1.96·σ/√n (in g).

(b) Equivalence: μ₀ falls outside the 95% CI, so the two-tailed test at α = 0.05 rejects H₀. A 95% CI and a two-tailed test at α = 0.05 always agree:

μ₀ outside the 95% CI → reject H₀ at α = 0.05 ✓
μ₀ inside the 95% CI → fail to reject H₀ ✓

This holds exactly for two-tailed tests; one-tailed tests need a one-sided confidence bound instead.
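A short sketch confirms the CI–test equivalence for a known-σ z-interval. The sample (x̄ = 503, σ = 10, n = 25) and the candidate μ₀ values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_interval(x_bar, sigma, n, conf=0.95):
    """Known-sigma confidence interval x_bar +/- z* * sigma / sqrt(n)."""
    z_star = NormalDist().inv_cdf(1 - (1 - conf) / 2)
    margin = z_star * sigma / sqrt(n)
    return x_bar - margin, x_bar + margin


def two_sided_reject(x_bar, mu0, sigma, n, alpha=0.05):
    z = (x_bar - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z))) < alpha


x_bar, sigma, n = 503.0, 10.0, 25      # hypothetical sample
lo, hi = z_interval(x_bar, sigma, n)   # about (499.08, 506.92)
for mu0 in (498.0, 500.0, 505.0, 508.0):
    outside = not (lo <= mu0 <= hi)
    print(mu0, outside, two_sided_reject(x_bar, mu0, sigma, n))
```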

Challenge 3 — Two-Sample Preview (Generator)

The single-sample five-step template is shown in Problem 1 above. A two-sample z-test for two independent means uses z = (x̄₁ − x̄₂) / √(σ₁²/n₁ + σ₂²/n₂) under H₀: μ₁ = μ₂. The same framework applies; only the standard-error formula changes. This is covered formally in reg-3.

Key takeaway: the five-step framework is universal — state hypotheses, check conditions, compute a test statistic, find the p-value, conclude. Only the test-statistic formula and reference distribution change across one-sample, two-sample, proportion, and regression tests.
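As a preview sketch only (the lesson covers this formally in reg-3), the two-sample statistic differs from the one-sample case only in its standard error. The `delta0` parameter for a nonzero hypothesized difference and all data values are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, x2, sigma1, sigma2, n1, n2, delta0=0.0):
    """z and two-tailed p for H0: mu1 - mu2 = delta0 (independent samples, known sigmas)."""
    se = sqrt(sigma1**2 / n1 + sigma2**2 / n2)  # the only step that changes
    z = (x1 - x2 - delta0) / se
    p = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p


z, p = two_sample_z(52.0, 50.0, 6.0, 8.0, 36, 64)  # hypothetical data: SE = sqrt(2)
print(round(z, 3), round(p, 4))
```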
