
INF-6: Hypothesis Testing for Small Sample Mean and Proportion

Module 4 · Statistical Inference

Section 1: Introduction

A factory claims its machine fills bottles to a mean of 500 mL. A quality technician is suspicious, but she can only collect 12 bottles for testing (the test destroys each bottle). With only n = 12 and no known population standard deviation σ, the z-test from INF-5 is off the table. She needs the right tool for small-sample testing: the t-test.

Meanwhile, a consumer watchdog group surveyed 80 households and found 18 (22.5%) reporting food insecurity. A government agency claims the true rate is only 15%. Is the group’s sample proportion surprising enough to challenge the agency’s claim? That requires the proportion test.

Both questions use the same five steps you mastered in INF-5. The only thing that changes is the formula for the test statistic — and the distribution you read it against. Everything else carries over exactly.

By the end of this lesson, you will be able to:

  • Identify when to use the t-test vs. the z-test for a population mean.
  • Compute the t test statistic and bound the p-value using the t-table.
  • State and check conditions for the proportion test: np₀ ≥ 5 and n(1 − p₀) ≥ 5.
  • Compute the z test statistic and find the p-value.
  • Explain why the CI and p-value approaches give the same reject/fail-to-reject decision.

Section 2: Prerequisites

What you need coming in — and why it matters today:

  • Five-step hypothesis test framework (INF-5): State H₀ and Hₐ, check conditions, compute the test statistic, find the p-value, state the conclusion. Every problem in this lesson uses this structure verbatim.
  • p-value and decision rule (INF-5): The p-value is the probability, assuming H₀ is true, of a test statistic at least as extreme as the one observed. Reject H₀ if p < α; fail to reject H₀ if p ≥ α. These rules are identical in INF-6.
  • t-distribution and degrees of freedom (INF-3): When σ is unknown, we use the t-distribution with df = n − 1. The t-table gives critical values, not exact p-values, so we bound the p-value between two levels.
  • Sample proportion and SE formula (INF-4): The point estimate is p̂ = x/n (sample successes divided by sample size). The SE for a proportion test uses the null value p₀, not p̂.
  • “Fail to reject” language (INF-5): Never “accept H₀.” The correct forms are “reject H₀” (when p < α) and “fail to reject H₀” (when p ≥ α).

Quick check — can you recall these?

What are the degrees of freedom for a one-sample t-test?

Success Factor:

What changes in this lesson: In INF-5, you always knew σ, so you computed a z test statistic and read an exact p-value from the z-table. In this lesson, σ is unknown. We use s instead, which forces us to use the t-distribution. The t-table gives bounds (e.g., 0.02 < p < 0.05) rather than exact p-values, and that is sufficient to make the reject/fail-to-reject decision. For proportion tests, the z-distribution still applies (because the proportion test statistic is approximately normal when conditions are met), but we use p₀, not p̂, in the denominator.

Retrieval Warm-up — from earlier lessons

A researcher constructs a one-sample t-interval from n observations with unknown σ. She looks up the critical value from the t-table. Which row should she use?

A z-test yields a p-value of 0.03. The researcher set α = 0.05 before seeing the data. She says: “I reject H₀, and since p = 0.03 < 0.05, the probability that H₀ is true is only 3%.” Identify the error.

Section 3: Core Concepts

How this section is organized: Eight concepts cover both test types in order. C1–C3 handle the t-test for a mean. C4–C5 introduce the proportion test. C6 reviews one-vs.-two-tailed choice (same rules as INF-5). C7–C8 connect both tests to broader ideas you’ll use in REG-3 and beyond.

  • C1–C3: The t-test — when to use it, the formula, how to bound p-values from the t-table
  • C4–C5: The proportion test — conditions, formula, and the p₀ vs. p̂ distinction
  • C6: Choosing the tail — identical rules as INF-5
  • C7–C8: Bridging ideas — CI equivalence and practical vs. statistical significance

C1 — When to Use the t-Test

In INF-5, you used the z-test: z = (x̄ − μ₀) / (σ / √n). That formula requires knowing σ. When σ is unknown, we estimate it with s and switch to the t-distribution. The t-distribution is wider than the normal (it has heavier tails), which accounts for the extra uncertainty introduced by estimating σ from a small sample.

When to Use the t-Test for a Mean

Use the one-sample t-test when:

  1. σ is unknown (you have s, the sample standard deviation, instead)
  2. The population is approximately normal (required when n < 30; less critical for larger n)
  3. The sample is random (independence of observations)

When σ is known, use the z-test from INF-5 regardless of sample size.

Decision rule in plain English: If the problem gives you σ → z-test. If the problem gives you s and σ is unknown → t-test. Sample size (n < 30 vs. n ≥ 30) does not override this; what matters is whether σ is known.

A very common mistake: using the z-test when only s is given, reasoning “n is large enough.” The distinction is not about sample size; it is about whether σ is known. If you see s in the problem and no mention of a known σ, use the t-distribution. With large n, the t-distribution converges to the standard normal, so the numerical difference becomes small, but the principle remains: s known, not σ → t.


C2 — The t Test Statistic

The formula is almost identical to the z-test statistic. Replace σ with s, and replace z with t.

t Test Statistic for a Population Mean

t = (x̄ − μ₀) / (s / √n),   df = n − 1

where x̄ is the sample mean, μ₀ is the null hypothesis value of the mean, s is the sample standard deviation, and n is the sample size.

A large |t| means the data are far from what H₀ predicts, which is strong evidence against H₀. The degrees of freedom df = n − 1 determine which t-distribution to use when reading the t-table.

Mini-example: Suppose , , , .

. We look along the df = 8 row of the t-table to bound the p-value.

The most common arithmetic error: using df = n instead of df = n − 1. For n = 9, df = 8, not 9. Using the wrong df row gives the wrong critical value and can change your decision. Always subtract 1.
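The formula and the df rule can be sketched in a few lines of Python. This is only an illustration: the function name `t_statistic` and the bottle numbers (x̄ = 497.0 mL, s = 4.0 mL) are hypothetical and do not come from the lesson; only n = 12 and the claimed mean of 500 mL are taken from the introduction.

```python
import math

def t_statistic(x_bar, mu_0, s, n):
    """Return (t, df) for a one-sample t-test of H0: mu = mu_0."""
    se = s / math.sqrt(n)        # estimated standard error of the mean
    t = (x_bar - mu_0) / se      # distance from mu_0 in standard errors
    df = n - 1                   # always n - 1, never n
    return t, df

# Hypothetical sample: 12 bottles, sample mean 497.0 mL, sample SD 4.0 mL.
t, df = t_statistic(x_bar=497.0, mu_0=500.0, s=4.0, n=12)
print(f"t = {t:.3f}, df = {df}")
```

Returning df together with t makes it harder to look up the wrong row of the t-table afterwards.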


C3 — Reading p-Values from the t-Table

The z-table gives exact p-values. The t-table works differently: it gives critical values for standard α levels. To find the p-value for a computed t, you locate your df row and find which two critical values your |t| falls between. This gives you a p-value bound: “p is between 0.02 and 0.05.”

Bounding the p-Value from the t-Table

For a computed t with known df:

  1. Find the df row in the t-table.
  2. Locate the two critical values that bracket |t|: the one just below and the one just above.
  3. Read the corresponding tail area (one-tail or two-tail, depending on your Hₐ) for each critical value.
  4. The p-value lies between those two values.

Example: , , two-tailed test. From the t-table, df = 11 row: (two-tail ) and (two-tail ). Since , we have .
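The bracketing procedure can be sketched as a small lookup. The row below is the df = 11 row of the t-table; the function name `bound_p_value` and the test statistic t = 2.45 are hypothetical, chosen only to illustrate the steps.

```python
# Two-tail areas and t* critical values from the df = 11 row of the t-table.
TWO_TAIL_AREAS = [0.20, 0.10, 0.05, 0.02, 0.01, 0.001]
DF_11_ROW = [1.363, 1.796, 2.201, 2.718, 3.106, 4.437]

def bound_p_value(t, row, areas, tails=2):
    """Return (lower, upper) bounds on the p-value from one t-table row.

    For tails=1 the caller must already have checked that t points in the
    direction of Ha; the one-tail area is half the two-tail area.
    """
    abs_t = abs(t)
    scale = 1.0 if tails == 2 else 0.5
    lower, upper = 0.0, 1.0
    for crit, area in zip(row, areas):
        if abs_t >= crit:
            upper = area * scale   # |t| is beyond this critical value
        else:
            lower = area * scale   # |t| has not reached this critical value
            break
    return lower, upper

# Hypothetical statistic: |t| = 2.45 lies between 2.201 and 2.718.
print(bound_p_value(2.45, DF_11_ROW, TWO_TAIL_AREAS))
```

If |t| is smaller than every critical value in the row, the sketch reports only that p exceeds the largest tabulated area; if |t| is larger than all of them, it reports that p is below the smallest one.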

Student's t-Distribution Table

Critical values (t*) for given degrees of freedom (df) and tail area.

Confidence  80%      90%      95%      98%      99%      99.9%
One-tail    0.10     0.05     0.025    0.01     0.005    0.0005
Two-tail    0.20     0.10     0.05     0.02     0.01     0.001
df
1           3.078    6.314    12.706   31.821   63.657   636.619
2           1.886    2.920    4.303    6.965    9.925    31.599
3           1.638    2.353    3.182    4.541    5.841    12.924
4           1.533    2.132    2.776    3.747    4.604    8.610
5           1.476    2.015    2.571    3.365    4.032    6.869
6           1.440    1.943    2.447    3.143    3.707    5.959
7           1.415    1.895    2.365    2.998    3.499    5.408
8           1.397    1.860    2.306    2.896    3.355    5.041
9           1.383    1.833    2.262    2.821    3.250    4.781
10          1.372    1.812    2.228    2.764    3.169    4.587
11          1.363    1.796    2.201    2.718    3.106    4.437
12          1.356    1.782    2.179    2.681    3.055    4.318
13          1.350    1.771    2.160    2.650    3.012    4.221
14          1.345    1.761    2.145    2.624    2.977    4.140
15          1.341    1.753    2.131    2.602    2.947    4.073
16          1.337    1.746    2.120    2.583    2.921    4.015
17          1.333    1.740    2.110    2.567    2.898    3.965
18          1.330    1.734    2.101    2.552    2.878    3.922
19          1.328    1.729    2.093    2.539    2.861    3.883
20          1.325    1.725    2.086    2.528    2.845    3.850
21          1.323    1.721    2.080    2.518    2.831    3.819
22          1.321    1.717    2.074    2.508    2.819    3.792
23          1.319    1.714    2.069    2.500    2.807    3.768
24          1.318    1.711    2.064    2.492    2.797    3.745
25          1.316    1.708    2.060    2.485    2.787    3.725
26          1.315    1.706    2.056    2.479    2.779    3.707
27          1.314    1.703    2.052    2.473    2.771    3.690
28          1.313    1.701    2.048    2.467    2.763    3.674
29          1.311    1.699    2.045    2.462    2.756    3.659
30          1.310    1.697    2.042    2.457    2.750    3.646
40          1.303    1.684    2.021    2.423    2.704    3.551
50          1.299    1.676    2.009    2.403    2.678    3.496
60          1.296    1.671    2.000    2.390    2.660    3.460
80          1.292    1.664    1.990    2.374    2.639    3.416
100         1.290    1.660    1.984    2.364    2.626    3.390
1000        1.282    1.646    1.962    2.330    2.581    3.300

Why bounds are enough: If the whole bound lies below α (for example, 0.02 < p < 0.05 with α = 0.05), then p < α and we reject H₀. If the whole bound lies above α (for example, 0.05 < p < 0.10 with α = 0.05), then p > α and we fail to reject H₀. The bound tells us which side of α the p-value falls on. That is all we need for the decision.

For a two-tailed test, you must use the two-tail column of the t-table (or equivalently, double the one-tail area). If your Hₐ uses ≠, the p-value is 2 × P(T ≥ |t|). Reading the one-tail column without multiplying by 2 will give you a p-value that is half the correct size, and can lead you to reject H₀ when you should not.


C4 — Conditions for the Proportion Test

When the research question is about a population proportion p, we use a z-test (not t), even for relatively small samples. This is possible because the sampling distribution of p̂ is approximately normal when the sample is large enough relative to p₀.

Conditions for the One-Sample Proportion Test

Before conducting a proportion test with H₀: p = p₀, verify:

  1. np₀ ≥ 5: at least 5 expected successes under H₀
  2. n(1 − p₀) ≥ 5: at least 5 expected failures under H₀
  3. Random sample: observations are independent

Note: Use p₀ (the null value), not p̂ (the sample estimate), when checking conditions.

Check conditions using p₀, not p̂. The conditions assess whether the null hypothesis model produces enough expected counts, which requires p₀. If you use p̂ instead, you may incorrectly declare conditions met (or unmet) when the opposite is true.
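As a quick illustration, the two count conditions can be checked programmatically. The helper name `proportion_conditions_met` is hypothetical; the inputs n = 80 and p₀ = 0.15 come from the food-insecurity scenario in the introduction. The random-sample condition cannot be checked from the numbers alone.

```python
def proportion_conditions_met(n, p0, threshold=5):
    """Check the expected-count conditions under H0 (uses p0, not p-hat)."""
    expected_successes = n * p0
    expected_failures = n * (1 - p0)
    ok = expected_successes >= threshold and expected_failures >= threshold
    return ok, expected_successes, expected_failures

# Introduction scenario: 80 households, agency claim p0 = 0.15.
ok, succ, fail = proportion_conditions_met(n=80, p0=0.15)
print(f"expected successes = {succ:.1f}, expected failures = {fail:.1f}, ok = {ok}")
```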


C5 — The Proportion Test Statistic

The formula has the same shape as the z test statistic from INF-5, but with p̂ in the numerator and p₀ in the denominator.

z Test Statistic for a Population Proportion

z = (p̂ − p₀) / √(p₀(1 − p₀) / n)

where p̂ = x/n is the sample proportion, p₀ is the null hypothesis value, and n is the sample size.

The p-value for a two-tailed test is 2 × P(Z ≥ |z|), read from the standard normal table.

Why p₀ in the denominator: We are testing whether H₀ is true. To compute how surprising our data are under H₀, we assume H₀ is true and ask: “What is the standard error of p̂ if the true proportion really were p₀?” The SE under H₀ is √(p₀(1 − p₀) / n), using p₀ because that is what the true proportion would be if H₀ were true. Using p̂ instead would mean assuming our sample estimate is exactly right, which is circular reasoning.

This is the most common error in proportion tests: plugging p̂ into the denominator instead of p₀. The denominator represents the SE under the null hypothesis. The null hypothesis says p = p₀, so use p₀. Remember: the numerator measures the gap between what we observed (p̂) and what H₀ claims (p₀); the denominator measures the variability we’d expect if H₀ were true.
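A minimal sketch of the proportion test statistic, applied to the watchdog scenario from the introduction (18 of 80 households, claimed rate 15%). The function name `proportion_z_test` is hypothetical, and the two-tailed p-value is computed with the standard normal via `math.erfc` rather than read from a z-table, so it will differ slightly from a table lookup.

```python
import math

def proportion_z_test(x, n, p0):
    """Return (p_hat, z, two-tailed p-value) for H0: p = p0."""
    p_hat = x / n
    se_null = math.sqrt(p0 * (1 - p0) / n)   # SE under H0: uses p0, not p_hat
    z = (p_hat - p0) / se_null
    # 2 * P(Z >= |z|) for a standard normal equals erfc(|z| / sqrt(2)).
    p_two_tailed = math.erfc(abs(z) / math.sqrt(2))
    return p_hat, z, p_two_tailed

p_hat, z, p = proportion_z_test(x=18, n=80, p0=0.15)
print(f"p_hat = {p_hat:.3f}, z = {z:.2f}, two-tailed p = {p:.3f}")
```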


C6 — One-Tailed vs. Two-Tailed Tests

The rules for choosing the tail are identical to INF-5. Set Hₐ from the research question before collecting data.

Choosing the Tail

Two-tailed (Hₐ: μ ≠ μ₀ or Hₐ: p ≠ p₀): Use when the research question asks whether the parameter differs from μ₀ (or p₀) in any direction. p-value = 2 × P(T ≥ |t|) or 2 × P(Z ≥ |z|).

Left-tailed (Hₐ: μ < μ₀ or Hₐ: p < p₀): Use when the claim is specifically that the parameter is below the null value. p-value = P(T ≤ t) or P(Z ≤ z).

Right-tailed (Hₐ: μ > μ₀ or Hₐ: p > p₀): Use when the claim is specifically that the parameter is above the null value. p-value = P(T ≥ t) or P(Z ≥ z).

Choose the tail from the research question, not from the data. Selecting a one-tailed test after seeing the direction of the sample mean or sample proportion is data snooping. It inflates the true Type I error rate to roughly 2α. If the question does not give a directional prior claim, use a two-tailed test.
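For the z case, the three tail rules can be written as one small function. This is a sketch under the assumption that the statistic is standard normal under H₀; the name `z_p_value` and the value z = 1.6 are hypothetical.

```python
import math

def z_p_value(z, tail):
    """p-value for a z statistic; tail is 'left', 'right', or 'two'."""
    upper_area = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    if tail == "right":
        return upper_area
    if tail == "left":
        return 1.0 - upper_area                       # P(Z <= z)
    if tail == "two":
        return math.erfc(abs(z) / math.sqrt(2))       # 2 * P(Z >= |z|)
    raise ValueError("tail must be 'left', 'right', or 'two'")

# Hypothetical statistic z = 1.6: the same data give three different p-values.
for tail in ("left", "right", "two"):
    print(tail, round(z_p_value(1.6, tail), 4))
```

The output makes the data-snooping warning concrete: the right-tailed p-value is half the two-tailed one, so picking the tail after seeing the data halves the p-value for free.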


C7 — Equivalence of CI and p-Value Approaches

The confidence interval approach and the p-value approach always agree on the reject/fail-to-reject decision for a two-tailed t-test of a mean. (For proportions the two agree closely but not always exactly, because the CI uses p̂ in its standard error while the test uses p₀.)

Side-by-side comparison (two-tailed test at α = 0.05):

p-value approach: Compute p = 2 × P(T ≥ |t|). If p < 0.05, reject H₀.

CI approach: Build a 95% CI for μ. If μ₀ falls outside the interval, reject H₀.

Both approaches will give the same decision. If μ₀ falls outside the 95% CI, the data are more than 1.96 (or t*) standard errors from μ₀, which is exactly the condition that produces p < 0.05. The two methods are mathematically equivalent for two-tailed tests.

The CI and p-value approaches give the same decision for two-tailed tests, but they express different things. The CI estimates a plausible range for the true parameter. The p-value quantifies how surprising the data are under H₀. Do not confuse the two purposes: a CI does not give a p-value, and a p-value does not give a CI.
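The equivalence for a mean can be checked numerically. In the sketch below, t* = 2.131 is the df = 15, two-tail 0.05 value from the t-table; the sample values (x̄ = 2,100, s = 200, n = 16) and the function name `decisions` are hypothetical and are not the lesson's data.

```python
import math

def decisions(x_bar, s, n, mu_0, t_star):
    """Return (reject_by_test, reject_by_ci, ci) for a two-tailed t-test."""
    se = s / math.sqrt(n)
    t = (x_bar - mu_0) / se
    reject_by_test = abs(t) > t_star            # critical-value form of p < alpha
    ci = (x_bar - t_star * se, x_bar + t_star * se)
    reject_by_ci = not (ci[0] <= mu_0 <= ci[1])  # mu_0 outside the interval
    return reject_by_test, reject_by_ci, ci

# Hypothetical sample; try several null values against the same data.
for mu_0 in (1900, 1990, 2000, 2100, 2250):
    by_test, by_ci, ci = decisions(2100, 200, 16, mu_0, t_star=2.131)
    print(mu_0, by_test, by_ci)
```

Both columns of the output match for every null value, because |x̄ − μ₀| > t* × SE is the same inequality written two ways.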


C8 — Practical vs. Statistical Significance

A statistically significant result (p < α) tells you the data would be unlikely if H₀ were true. It does not tell you the effect is large or meaningful.

With a very large sample, even a tiny difference from μ₀ can produce p < α. For example, a factory producing bottles at a mean of 500.1 mL instead of 500 mL might reach statistical significance when n is very large, but 0.1 mL is practically irrelevant. Always report the effect size (e.g., the actual difference x̄ − μ₀ or p̂ − p₀) alongside the p-value, and ask: “Does this difference matter in the real world?” Statistical significance alone does not answer that question.
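To see how sample size alone drives significance, the sketch below holds the 0.1 mL difference fixed and varies n. The sample standard deviation s = 2.0 mL and the sample sizes are hypothetical, chosen only for illustration.

```python
import math

def t_for_shift(x_bar, mu_0, s, n):
    """Test statistic for a fixed observed difference at sample size n."""
    return (x_bar - mu_0) / (s / math.sqrt(n))

# Same 0.1 mL difference, assumed s = 2.0 mL, increasing sample sizes.
for n in (25, 400, 10_000):
    t = t_for_shift(x_bar=500.1, mu_0=500.0, s=2.0, n=n)
    print(f"n = {n:>6}: t = {t:.2f}")
```

The effect size never changes; only the standard error shrinks, so the test statistic grows in proportion to √n.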

Connecting to downstream lessons: REG-3 tests whether the correlation coefficient is significantly different from zero using a t-test with df = n − 2. REG-4 uses the chi-square test with the same five-step framework. The language and structure you build here carry forward exactly.

Section 4: Worked Examples

Example 1 — Fully Worked: Caloric Intake (Two-Tailed t-Test)

A nutritionist claims adults in a certain region consume a mean of μ₀ = 2,000 kcal/day. A random sample of n = 16 adults yields a sample mean of x̄ kcal and a sample standard deviation of s kcal. Test at α = 0.05.

Step 1: State H₀ and Hₐ

H₀: μ = 2,000 kcal    Hₐ: μ ≠ 2,000 kcal (two-tailed: the researcher is testing whether the mean differs in any direction from the claim)

Step 2: Check Conditions

n = 16 < 30 and σ is unknown → I must use the t-distribution. Caloric intake is approximately normally distributed for the target population ✓. Random sample ✓. Conditions met: use t with df = 15.

Step 3: Compute the Test Statistic

I notice kcal.

I choose to work with the positive value because I’ll use |t| for the two-tailed p-value.

Step 4: Find the p-value (bound from t-table)

df = n − 1 = 15. Looking along the df = 15 row of the t-table:

  • t* = 1.753 corresponds to two-tail 0.10
  • t* = 2.131 corresponds to two-tail 0.05

Since 1.753 < |t| < 2.131, the two-tailed p-value is between 0.05 and 0.10: 0.05 < p < 0.10.

Step 5: Conclusion

p > 0.05 = α. We fail to reject H₀.

There is insufficient evidence at the 5% significance level to conclude that the mean caloric intake in this region differs from 2,000 kcal/day.


Example 2 — Prediction Checkpoint: Tire Durability (One-Tailed t-Test)

A tire manufacturer claims its tires last μ₀ = 50,000 km. A sample of n = 9 tires gives a sample mean of x̄ km and a sample standard deviation of s km. Test at α = 0.01.

Steps 1–3:

H₀: μ = 50,000 km; Hₐ: μ < 50,000 km (left-tailed: we’re testing whether tires fail to meet the lower bound). Conditions: n = 9 < 30, σ unknown, approximately normal population assumed ✓. Use t with df = 8.

km.

Pause here. Before reading the p-value and decision:

  • You have t with df = 8. Looking at the df = 8 row of the t-table: t* = 1.397 (one-tail 0.10) and t* = 1.860 (one-tail 0.05) and t* = 2.306 (one-tail 0.025). Where does |t| fall?
  • Do you expect to reject H₀ at α = 0.01? Write down your prediction before continuing.
Show Solution (Steps 4–5)

Step 4: Left-tailed test. , . From the t-table, df = 8 row:

  • → one-tail
  • → one-tail

Since : .

Step 5: p > α = 0.01. We fail to reject H₀.

There is insufficient evidence at the 1% significance level to conclude that the mean tire life is less than 50,000 km. (Note: we would reject at a less strict significance level, but not at the stricter α = 0.01.)


Example 3 — Details/Summary: Recycling Rate (Two-Tailed Proportion Test)

A city claims 30% of households recycle regularly. A consumer survey of n households finds 54 that recycle (p̂ = 54/n). Test at α = 0.05.

Show Full Solution

Step 1: H₀: p = 0.30; Hₐ: p ≠ 0.30 (two-tailed).

Step 2: Check Conditions

Random sample ✓. Conditions met — use the z test for proportions.

Step 3: Compute p̂ and the Test Statistic

Step 4: Two-tailed: p = 2 × P(Z ≥ |z|).

P(Z ≥ 0.93) ≈ 0.1762 (from z-table, rounding |z| to 0.93).

p ≈ 2 × 0.1762 = 0.3524.

Step 5: p > α = 0.05. We fail to reject H₀.

There is insufficient evidence at the 5% level to conclude that the proportion of households that recycle regularly differs from 30%.


Example 4 — Find the Error: Proportion Test with Two Mistakes

A student tests H₀: p = 0.40 with a sample of size n, x successes (p̂ = x/n), and α = 0.05. Here is the student’s work.

Student’s analysis:

H₀: p = 0.40; Hₐ: p ≠ 0.40

p > α: “I accept H₀. The true rate is 0.40.”

Show Full Error Analysis

Error 1 (Wrong SE): The student used p̂ in the denominator instead of p₀. When testing H₀: p = 0.40, the SE under H₀ uses p₀ = 0.40.

Correct SE:

Correct z:

. The decision is the same (fail to reject), but the formula is wrong and can produce different conclusions in other problems.

Error 2 (“Accept H₀”): We never accept H₀. The correct statement is: “We fail to reject H₀. There is insufficient evidence at the 5% level to conclude the true proportion differs from 0.40.” Failing to reject does not prove H₀ is true; it only means the data are not surprising enough under H₀.

Section 5: Guided Practice

Problem 1 — t-Test Decisions (Three Scenarios)

A sleep researcher wants to know if college students sleep less than the recommended 8 hours. She samples n students and finds a sample mean of x̄ hours and a sample standard deviation of s hours. The population σ is unknown.

Part A: Which test should she use?

Part B: State the correct H₀ and Hₐ.

Part C: Compute and .

A pharmacist claims a generic drug has the same mean effect time as the brand-name version ( min). A random sample of patients using the generic drug shows min and min.

Part A: Which test applies?

Part B: What are H₀ and Hₐ for a two-tailed test?

Part C: Compute and .

A water quality inspector claims that mean lead content in a city’s water supply is ppb. She samples locations and finds ppb and ppb. She wants to test whether the true mean exceeds the claim.

Part A: Which test applies?

Part B: State H₀ and Hₐ.

Part C: What is ?


Problem 2 — Proportion Test Conditions and z Statistic

A public health report claims 20% of adults smoke (p₀ = 0.20). A researcher surveys n adults and finds 36 smokers. Test at α = 0.05 (two-tailed).

Part A: Are conditions met?

Part B: Compute p̂ and the test statistic z.

Part C: Is p < 0.05? What is the conclusion?

A university claims that 60% of its graduates find jobs in their field within 6 months (p₀ = 0.60). A journalism investigation surveys n graduates and finds 42 employed in their field (p̂ = 42/n). Test at significance level α (two-tailed).

Part A: Check conditions.

Part B: Compute the test statistic.

Part C: Conclusion at .

A city claims 35% of commuters use public transit (p₀ = 0.35). A transit authority samples n commuters and finds 82 using public transit. Test at significance level α (two-tailed).

Part A: Check conditions.

Part B: Compute the test statistic.

Part C: Conclusion at .


Problem 3 — Choosing t vs. z vs. Proportion Test

For each scenario, identify the correct test and state H₀ and Hₐ.

Scenario A: A factory manager claims the mean weight of a product is g. A QC engineer samples 10 units and finds g and g. She wants to know if the mean is below the claim.

Scenario B: A government report states that 40% of households own a pet (p₀ = 0.40). A researcher samples 120 households and finds 54 own a pet. She wants to test whether the true proportion differs from 40%.


Problem 4 — CI Equivalence: Same Conclusion, Two Methods

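A nutritionist tests H₀: μ = 2,000 kcal at α = 0.05 (two-tailed) with n = 16 and the sample mean x̄ and standard deviation s from Example 1. She fails to reject H₀ (from Example 1 above: 0.05 < p < 0.10).

Now construct a 95% t-interval for μ.

From the t-table: t* = 2.131 for df = 15, 95% confidence.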

kcal. kcal.

Part A: What is the 95% CI?

Part B: Does μ₀ = 2,000 kcal fall inside or outside the CI? What does this tell you?

Section 6: Independent Practice

Problem 1 — t-Test Generator


Problem 2 — One-Tailed Proportion Test

A pharmaceutical company claims that its new medication reduces blood pressure in more than 50% of patients (p₀ = 0.50). In a trial of n patients, 58 showed reduced blood pressure. Test at α = 0.05.

Conduct the full five-step test and show your solution below.

Show Solution

Step 1: H₀: p = 0.50; Hₐ: p > 0.50 (right-tailed).

Step 2: np₀ ≥ 5 ✓; n(1 − p₀) ≥ 5 ✓; random sample ✓.

Step 3: .

Step 4: Right-tailed: .

Step 5: p > α = 0.05. We fail to reject H₀.

There is insufficient evidence at the 5% level to conclude that the medication reduces blood pressure in more than 50% of patients.

A consumer advocacy group claims that fewer than 30% of customers are satisfied with a company’s service (p₀ = 0.30). They survey n customers and find 30 satisfied. Test at α = 0.05.

Show Solution

Step 1: H₀: p = 0.30; Hₐ: p < 0.30 (left-tailed).

Step 2: np₀ ≥ 5 ✓; n(1 − p₀) ≥ 5 ✓; random sample ✓.

Step 3: .

Step 4: Left-tailed: .

Step 5: p > α = 0.05. We fail to reject H₀.

There is insufficient evidence at the 5% level to conclude that fewer than 30% of customers are satisfied.

A researcher hypothesizes that more than 70% of teenagers use social media daily (p₀ = 0.70). A survey of n teenagers finds 117 using social media daily. Test at α = 0.01.

Show Solution

Step 1: H₀: p = 0.70; Hₐ: p > 0.70 (right-tailed).

Step 2: np₀ ≥ 5 ✓; n(1 − p₀) ≥ 5 ✓; random sample ✓.

Step 3: .

Step 4: Right-tailed: .

Step 5: p > α = 0.01. We fail to reject H₀ at α = 0.01.

There is insufficient evidence at the 1% level to conclude that more than 70% of teenagers use social media daily. (Note: we would reject at a less strict significance level.)


Problem 3 — Proportion Test Generator


Problem 4 — Find the Error


Problem 5 — Multi-Step Synthesis: Hospital Quality Audit

A regional hospital claims two things: (1) the mean post-operative stay is μ₀ = 3.5 days; (2) exactly 25% of patients are readmitted within 30 days (p₀ = 0.25). An independent auditor collects the following data from a random sample of 20 recent patients:

Test both claims at α = 0.05.

(a) Test the mean claim. State H₀ and Hₐ, check conditions, compute t, bound the p-value, and state the conclusion.

Show Solution (a)

Step 1: H₀: μ = 3.5 days; Hₐ: μ ≠ 3.5 days (two-tailed: the auditor is checking whether the mean differs in any direction).

Step 2: n = 20 < 30, σ unknown → t-test. Assume approximately normal post-operative stays ✓. df = 19.

Step 3: days.

Step 4: , two-tailed. From the t-table, df = 19 row:

  • → two-tail
  • → two-tail

Since : .

Step 5: p < α = 0.05. We reject H₀.

There is sufficient evidence at the 5% level to conclude that the mean post-operative stay differs from 3.5 days.

(b) Test the readmission proportion claim. State H₀ and Hₐ, check conditions, compute z, find the p-value, and state the conclusion.

Show Solution (b)

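Step 1: H₀: p = 0.25; Hₐ: p ≠ 0.25 (two-tailed).

Step 2: np₀ = 20 × 0.25 = 5 ≥ 5 ✓ (barely); n(1 − p₀) = 20 × 0.75 = 15 ≥ 5 ✓. Random sample ✓. Conditions met (note: with np₀ exactly meeting the threshold, a larger sample would be preferred).

Step 3: p̂ = 0.35. z = (0.35 − 0.25) / √(0.25 × 0.75 / 20) = 0.10 / 0.0968 ≈ 1.03.

Step 4: p = 2 × P(Z ≥ 1.03) ≈ 2 × 0.1515 = 0.3030.

Step 5: p ≈ 0.30 > α = 0.05. We fail to reject H₀.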

There is insufficient evidence at the 5% level to conclude that the readmission proportion differs from 25%. (The sample proportion of 35% is higher than the claimed 25%, but the sample size is small and the variation is consistent with chance.)

(c) The auditor reports: “The hospital’s stay claim is rejected, but the readmission claim is not.” Explain what these two conclusions do and do not prove about the hospital.

Show Solution (c)

What the results show: There is statistical evidence that the mean post-operative stay differs from 3.5 days, with the sample mean above the claim. The t-test rejected H₀ at α = 0.05, meaning the data would be unlikely if the true mean were 3.5 days.

What the results do not show: Rejecting the stay claim does not prove the hospital is negligent; it only shows a statistically detectable difference from 3.5 days. The actual effect is x̄ − μ₀ = 0.6 days; whether 0.6 days more is practically significant depends on clinical context.

Failing to reject the readmission claim does not prove the 25% rate is correct. With n = 20, the test has low power: if the true readmission rate were 35%, this sample size would often fail to detect it. The auditor should note that a larger sample is needed to draw reliable conclusions about the readmission rate.

Both conclusions are about evidence, not proof. And one test rejecting while another does not is not a contradiction — each test addresses a separate claim with its own uncertainty.
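The low-power remark can be made concrete with a short calculation. The sketch below assumes the two-tailed z-test at α = 0.05 (z* = 1.96) with n = 20 and p₀ = 0.25, and sums exact binomial probabilities over the sample counts that would lead to rejection when the true rate is 35%. The function names are hypothetical.

```python
import math

def binom_pmf(k, n, p):
    """Exact binomial probability P(X = k)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def power_two_tailed_proportion(n, p0, p_true, z_star=1.96):
    """Probability that the two-tailed z-test rejects H0: p = p0
    when the true proportion is p_true."""
    se_null = math.sqrt(p0 * (1 - p0) / n)
    power = 0.0
    for x in range(n + 1):
        z = (x / n - p0) / se_null
        if abs(z) > z_star:              # this count falls in the rejection region
            power += binom_pmf(x, n, p_true)
    return power

power = power_two_tailed_proportion(n=20, p0=0.25, p_true=0.35)
print(f"power against p = 0.35 with n = 20: {power:.2f}")
```

Under these assumptions the test rejects in only about a quarter of samples, which is why failing to reject here says little about whether 25% is correct.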

(d) A hospital administrator argues: “Since the proportion test failed to reject, we should accept that our 25% rate is accurate.” Identify the error in this reasoning.

Show Solution (d)

“Fail to reject H₀” does not mean “accept H₀” or “the null hypothesis is proven true.” It means only that the data are not surprising enough under H₀ to reject it at the chosen significance level.

With n = 20, the test has very limited power to detect a difference from 0.25; even a true rate of 0.35 or 0.40 might go undetected at α = 0.05. The administrator’s reasoning confuses insufficient evidence with affirmative proof.

The correct statement: “There is insufficient evidence at the 5% level to conclude the readmission proportion differs from 25%. A larger sample is recommended to draw more definitive conclusions.”


Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — t-Confidence Interval (INF-3)

A sports scientist measures the resting heart rate of n elite cyclists: sample mean x̄ bpm, sample standard deviation s bpm. Population SD unknown.

(a) Construct a 95% CI using the t-distribution. Use the t* critical value for df = n − 1. (b) A colleague says: “Since the interval doesn’t include 50 bpm, we can conclude elite cyclists have a lower resting heart rate than the general population (which averages 70 bpm).” Is this reasoning sound?

Show Solution

(a) 95% CI: x̄ ± t* × s/√n = (41.1, 47.3) bpm.

(b) The reasoning is directionally sound. The CI (41.1, 47.3) lies entirely below 50 bpm, so the data are inconsistent with μ = 50 bpm at 95% confidence. The CI also lies far below the general population mean of 70 bpm, providing strong evidence that elite cyclists have substantially lower resting heart rates. The conclusion is well-supported; the colleague should use frequentist language: “We are 95% confident the true mean resting heart rate of elite cyclists is between 41.1 and 47.3 bpm, well below the 70 bpm general population mean.”


Review Problem 2 — z Hypothesis Test, Full Five Steps (INF-5)

A manufacturer claims its energy bars contain a mean of μ₀ = 240 kcal. A consumer group tests n bars randomly and finds x̄ = 251 kcal with σ kcal known from extensive prior testing. Test the claim at significance level α (two-tailed).

Show Solution

Step 1 — Hypotheses:

H₀: μ = 240 kcal; Hₐ: μ ≠ 240 kcal (two-tailed: checking whether the label is wrong in either direction).

Step 2 — Conditions:

Random sample ✓; ✓; known ✓. z-test applies.

Step 3 — Test statistic:

Step 4 — p-value:

Two-tailed: p = 2 × P(Z ≥ 3.16) ≈ 2 × 0.0008 = 0.0016.

Step 5 — Decision and conclusion:

p < α. Reject H₀.

There is strong statistical evidence that the true mean caloric content differs from the labelled 240 kcal. The observed mean of 251 kcal is about 3.16 standard errors above the claimed value — extremely unlikely if the label were accurate.

Section 7: Mastery Check

Question 1 — Feynman Test

In your own words: why do we use p₀ (not p̂) in the denominator of the proportion test statistic? Write as if explaining to a classmate who just made this error on a test. Aim for 200–500 characters.

Model Answer

The denominator of the test statistic is the standard error of p̂ under H₀, that is, the standard deviation we’d expect for the sample proportion if the null hypothesis were exactly right.

H₀ says the true proportion is p₀. If H₀ is true, then p̂ varies with standard deviation √(p₀(1 − p₀) / n). This is the SE we should use to measure how surprising p̂ is under H₀.

If instead we used p̂ in the denominator, we’d be computing the SE as if the true proportion equaled our sample estimate, which is circular reasoning. We’d be assuming our sample is exactly right in order to determine whether our sample is surprising. That defeats the purpose of the test.

The rule: the denominator uses p₀ because we assume H₀ is true to calculate the p-value.


Question 2 — Apply: Placement Test

A school claims that 40% of students pass the advanced mathematics placement test (p₀ = 0.40). A sample of 120 students shows 54 passed. Test at α = 0.05 (two-tailed).

Part A: Which are the correct H₀ and Hₐ?

Part B: Are conditions met?

Part C: What is the test statistic?

Part D: What is the conclusion?

Show Full Solution (Part D)

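p̂ = 54/120 = 0.45. z = (0.45 − 0.40) / √(0.40 × 0.60 / 120) = 0.05 / 0.0447 ≈ 1.12. p = 2 × P(Z ≥ 1.12) ≈ 2 × 0.1314 = 0.2628.

p ≈ 0.26 > α = 0.05. We fail to reject H₀.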

There is insufficient evidence at the 5% level to conclude that the true passage rate for the advanced mathematics placement test differs from 40%.


Question 3 — Error Analysis

Flawed statistical report:

A researcher runs a t-test on n = 10 observations, computes a test statistic t, and reports “p < 0.05” for a two-tailed test. From the t-table (two-tail = 0.05) with df = 10: t* = 2.228. Since |t| > 2.228, the researcher concludes “p < 0.05, so reject H₀.”

Identify the error.

Show Full Analysis

The error: The researcher used df = n = 10 instead of df = n − 1 = 9. Degrees of freedom for a one-sample t-test are always n − 1.

Corrected analysis: From the t-table, df = 9, two-tail 0.05: t* = 2.262. Since |t| > 2.262, we still reject H₀ at α = 0.05. The conclusion does not change in this case.

Does it always not matter? No. If |t| were between 2.228 and 2.262, using the wrong df would change the conclusion from “reject” to “fail to reject.” The off-by-one df error is consequential near the boundary.

The habit to build: Always compute df = n − 1 first and write it down before looking up the table.
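The boundary case can be checked in a few lines. The two critical values below are the df = 9 and df = 10 entries in the two-tail 0.05 column of the t-table; the statistic t = 2.25 and the helper name `reject` are hypothetical.

```python
# Two-tail 0.05 critical values from the t-table, keyed by df.
T_STAR_TWO_TAIL_05 = {9: 2.262, 10: 2.228}

def reject(t, df):
    """Two-tailed decision at alpha = 0.05 using the tabulated t*."""
    return abs(t) > T_STAR_TWO_TAIL_05[df]

n = 10
t = 2.25                                 # hypothetical borderline statistic
wrong_decision = reject(t, df=n)         # mistaken df = n
right_decision = reject(t, df=n - 1)     # correct df = n - 1
print(f"df = {n}: reject = {wrong_decision}; df = {n - 1}: reject = {right_decision}")
```

With this borderline value the mistaken df rejects H₀ while the correct df does not, which is exactly the situation described above.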


Section 8: Boss Fight

Choose your path. Both require full five-step reasoning and integration across the lesson’s concepts.

🧪 Path A: The Health Researcher

A hospital claims mean recovery time is 4.5 days. A skeptical researcher collects data and must decide whether the evidence supports the hospital’s claim — and defend the statistical reasoning to a medical board.

📊 Path B: The Policy Analyst

A government report states 35% of college students work more than 20 hours per week. A student advocacy group challenges this figure with survey data — and must communicate their findings responsibly.

Path A: The Health Researcher

A hospital claims the mean recovery time after a specific surgery is μ₀ = 4.5 days. A skeptical researcher collects data on 18 patients (n = 18) and finds a sample mean of x̄ days and a sample standard deviation of s days. The hospital states that patients are selected because they are good surgical candidates, so the population of recovery times is approximately normal. Test at α = 0.05.

Task 1. State H₀ and Hₐ. Justify whether a one-tailed or two-tailed test is appropriate. Then justify the choice of t-test vs. z-test.

Show Guidance for Task 1

H₀: μ = 4.5 days; Hₐ: μ ≠ 4.5 days (two-tailed).

Why two-tailed? The researcher is a skeptic: she wants to know if the true mean differs from the hospital’s claim in either direction. If she had a prior directional suspicion (e.g., “I think the hospital is underreporting”), a one-tailed test would be justified. Without such a prior, two-tailed is the appropriate choice.

Why t-test? σ is unknown (only s is given), and n = 18 < 30. The approximate normality condition is met (stated). Use t with df = 17.


Task 2. Compute the test statistic and bound the p-value using the t-table. Show all work.

Show Guidance for Task 2

days.

df = 17. From the t-table, df = 17 row:

  • t* = 2.110 → two-tail 0.05
  • t* = 2.567 → two-tail 0.02

Since 2.110 < |t| < 2.567: 0.02 < p < 0.05.


Task 3. State a conclusion in context. Then explain what a Type II error would mean in this medical context.

Show Guidance for Task 3

$p\text{-value} < \alpha = 0.05$: the test statistic exceeds $t^* = 2.110$, the df = 17 critical value for two-tail 0.05, so the p-value is strictly below 0.05 (and, from Task 2, above 0.02). We reject $H_0$.

Conclusion: There is sufficient evidence at the 5% significance level to conclude that the mean recovery time differs from 4.5 days. The sample suggests recovery times may be longer than the hospital claims.

Type II error in this context: A Type II error would mean failing to reject $H_0$ when the true mean recovery time actually exceeds (or differs from) 4.5 days. Practically: the hospital’s overly optimistic claim goes unchallenged. Patients may be given inaccurate discharge expectations, and policymakers may allocate too few resources to post-operative care. The consequence is real harm from a false reassurance.


Task 4. A colleague suggests: “Just use z — the sample of 18 is close enough to 30.” Explain the flaw in this reasoning and what effect it would have on the conclusion.

Show Guidance for Task 4

The flaw: the criterion for using z is not sample size; it is whether $\sigma$ is known. Here $\sigma$ is unknown (only $s$ is given), so the t-distribution is required regardless of sample size.

Using z with df effectively treated as infinite ($z^* = 1.96$ at two-tailed $\alpha = 0.05$) vs. using t ($t^* = 2.110$ at df = 17, two-tailed $\alpha = 0.05$):

  • z would give a slightly smaller critical value (1.96 vs. 2.110), making it easier to reject $H_0$.
  • In this case: $|t| > 2.110$ (t critical), so we barely reject with t. We would also reject with z (since $|t| > 1.96$), so the conclusion is the same here, but using z is conceptually wrong.

The error matters more in borderline cases: if $1.96 < |t| < 2.110$, we would fail to reject with the t-table (since $|t| < 2.110$) but would reject with z (since $|t| > 1.96$). Using the wrong distribution in that case would change the conclusion.
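To see the borderline effect concretely, here is a minimal sketch that applies both critical values to the same statistic. The helper `decide` and the two test statistics (2.05 and 2.30) are hypothetical, chosen only to land on either side of 2.110; the critical values are the ones quoted above.

```python
Z_CRIT = 1.960        # standard normal, two-tailed alpha = 0.05
T_CRIT_DF17 = 2.110   # t-table, df = 17 row, two-tailed alpha = 0.05


def decide(stat, critical_value):
    """Two-tailed decision: reject H0 when |stat| exceeds the critical value."""
    return "reject H0" if abs(stat) > critical_value else "fail to reject H0"


# Hypothetical statistics, chosen only to illustrate the two situations.
for stat in (2.05, 2.30):
    print(f"stat = {stat:.2f}: z says {decide(stat, Z_CRIT)}, "
          f"t says {decide(stat, T_CRIT_DF17)}")
```

The two rules disagree only when the statistic falls in the gap between 1.96 and 2.110, which is exactly the range where choosing the wrong distribution changes the conclusion.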

Reflection: Write a two-sentence conclusion for the hospital board that clearly states what the data can and cannot prove. Use language appropriate for a non-statistical audience.


Path B: The Policy Analyst

A government report states that 35% of college students work more than 20 hours per week ($p_0 = 0.35$). A student advocacy group surveys 150 students at a large university ($n = 150$) and finds 63 who work more than 20 hours per week. Test at $\alpha = 0.05$ (two-tailed).

Task 1. State $H_0$ and $H_a$. Check conditions. Explain your choice of a two-tailed test.

Show Guidance for Task 1

$H_0: p = 0.35$; $H_a: p \neq 0.35$ (two-tailed).

Why two-tailed? The advocacy group wants to know whether the true proportion at their university differs from the government’s figure — in either direction. A one-tailed test would require a prior directional suspicion (e.g., “we believe the rate is higher”). Without that prior, two-tailed is correct.

Conditions: $np_0 = 150(0.35) = 52.5$ ✓; $n(1 - p_0) = 150(0.65) = 97.5$ ✓; random sample ✓. Use the z-test for proportions.


Task 2. Compute $\hat{p}$ and the test statistic. Find the p-value using the z-table.

Show Guidance for Task 2

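$\hat{p} = 63/150 = 0.42$. $SE = \sqrt{p_0(1 - p_0)/n} = \sqrt{0.35(0.65)/150} \approx 0.0389$, so $z = (\hat{p} - p_0)/SE = (0.42 - 0.35)/0.0389 \approx 1.80$.

Two-tailed: $p\text{-value} = 2 \cdot P(Z > 1.80) = 2(0.0359) = 0.0718$.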
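The same calculation can be checked with Python's standard library. This is a sketch, not part of the lesson's required method: the function name `proportion_z_test` is an assumption of this example, and because it does not round z to two decimals before the lookup, its p-value differs from the z-table answer in the fourth decimal place.

```python
from math import sqrt
from statistics import NormalDist


def proportion_z_test(successes, n, p0):
    """Two-tailed one-sample z-test for a proportion.

    The standard error uses the null value p0, not the sample proportion.
    """
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value


p_hat, z, p_value = proportion_z_test(successes=63, n=150, p0=0.35)
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
```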


Task 3. State the decision at $\alpha = 0.05$. Then construct a 95% CI for $p$ and verify it gives the same conclusion.

Show Guidance for Task 3

$p\text{-value} \approx 0.072 > \alpha = 0.05$. We fail to reject $H_0$.

There is insufficient evidence at the 5% level to conclude that the proportion of college students working more than 20 hours per week at this university differs from 35%.

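95% CI for $p$: $\hat{p} \pm 1.96\sqrt{\hat{p}(1 - \hat{p})/n} = 0.42 \pm 1.96\sqrt{0.42(0.58)/150} \approx 0.42 \pm 0.079 = (0.341, 0.499)$. The CI for a proportion uses $\hat{p}$ in the SE, not $p_0$.

Verification: $p_0 = 0.35$ falls inside the 95% CI → fail to reject $H_0$ at $\alpha = 0.05$. ✓ Both approaches agree.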
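A short sketch of the interval check, using only the standard library. The function name `proportion_ci` is hypothetical; the point to notice is that the standard error here is built from the sample proportion, unlike the test statistic, which uses the null value.

```python
from math import sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Confidence interval for a proportion, with SE based on p-hat."""
    p_hat = successes / n
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z_star * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


low, high = proportion_ci(successes=63, n=150)
p0 = 0.35
print(f"95% CI: ({low:.3f}, {high:.3f}); contains p0: {low < p0 < high}")
```

Because the test and the interval use slightly different standard errors for proportions, the two approaches can occasionally disagree in very close cases; here they agree comfortably.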


Task 4. The advocacy group wants to claim the real rate is “dramatically higher” than 35%. A statistician cautions against this phrasing. Explain using the concepts of practical vs. statistical significance.

Show Guidance for Task 4

Statistical significance: The test failed to reject $H_0$ at $\alpha = 0.05$. The data are not statistically surprising under $H_0$. Making a strong claim about “dramatically higher” is not supported by the test result.

Practical significance: Even if the test had rejected $H_0$, statistical significance does not prove the difference is large or important. The sample proportion is $\hat{p} = 0.42$ vs. $p_0 = 0.35$, a difference of 0.07 (7 percentage points). Whether 7 percentage points is “dramatic” is a policy judgment, not a statistical one.

The caution to the group: With $p\text{-value} \approx 0.072 > 0.05$, the data do not even provide sufficient evidence of a difference from 35%. Claiming “dramatically higher” overstates what the evidence supports. The correct statement: “Our sample shows 42%, 7 percentage points above the government’s 35%, but we do not have sufficient statistical evidence at the 5% level to conclude the true rate at our university differs from the national figure. A larger sample would be needed to draw firmer conclusions.”

Reflection: Draft a headline for a press release that is both accurate and responsible given your statistical conclusion. Then explain in 1–2 sentences why your headline is more appropriate than “Study Shows College Students Work Far More Than Government Claims.”


Section 9: Challenge Problems

Ready for more? These go beyond the lesson objectives.

Problem 1 — Paired Data: t-Test on Differences

A researcher tests whether a training program improves typing speed. Seven participants are measured before and after training. Compute the difference $d = \text{After} - \text{Before}$ for each participant, then run a one-sample t-test on the differences ($H_0: \mu_d = 0$).

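| Participant | Before (wpm) | After (wpm) |
|---|---|---|
| 1 | 52 | 58 |
| 2 | 45 | 50 |
| 3 | 60 | 65 |
| 4 | 38 | 42 |
| 5 | 55 | 62 |
| 6 | 48 | 51 |
| 7 | 50 | 55 |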

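(a) Compute the differences $d_i$, $\bar{d}$, and $s_d$.

Show Solution

Differences $d_i = \text{After} - \text{Before}$: 6, 5, 5, 4, 7, 3, 5.

$\bar{d} = 35/7 = 5$ wpm.

$s_d$: deviations from $\bar{d}$: 1, 0, 0, −1, 2, −2, 0.

$s_d^2 = (1 + 0 + 0 + 1 + 4 + 4 + 0)/6 = 10/6 \approx 1.667$. $s_d \approx 1.291$ wpm.

(b) $SE = s_d/\sqrt{n} = 1.291/\sqrt{7} \approx 0.488$, $t = \bar{d}/SE = 5/0.488 \approx 10.25$.

From t-table, df = 6: $t^* = 5.959$ at two-tail $p = 0.001$. Since $10.25 > 5.959$, $p\text{-value} < 0.001$. Reject $H_0$.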

There is very strong evidence that the training program improves typing speed.
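The paired calculation reduces to a one-sample t statistic on the differences, which the following sketch reproduces with the standard library. The function name `paired_t` is an assumption of this example; `statistics.stdev` divides by $n - 1$, matching the hand calculation.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(before, after):
    """Return (differences, d_bar, s_d, t) for a paired t-test of mu_d = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    d_bar = mean(diffs)
    s_d = stdev(diffs)               # sample standard deviation (n - 1)
    se = s_d / sqrt(len(diffs))
    return diffs, d_bar, s_d, d_bar / se


before = [52, 45, 60, 38, 55, 48, 50]
after = [58, 50, 65, 42, 62, 51, 55]

diffs, d_bar, s_d, t_stat = paired_t(before, after)
print(f"d_bar = {d_bar}, s_d = {s_d:.3f}, t = {t_stat:.2f}, df = {len(diffs) - 1}")
```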

A coach measures reaction time (milliseconds) for 6 athletes before and after a conditioning drill. Compute $d = \text{After} - \text{Before}$ and test $H_0: \mu_d = 0$ vs. $H_a: \mu_d < 0$ (reaction time decreases) at significance level $\alpha$.

| Athlete | Before (ms) | After (ms) |
|---|---|---|
| 1 | 280 | 265 |
| 2 | 310 | 295 |
| 3 | 295 | 290 |
| 4 | 320 | 305 |
| 5 | 275 | 260 |
| 6 | 300 | 285 |

Show Solution

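Differences $d_i = \text{After} - \text{Before}$: −15, −15, −5, −15, −15, −15.

$\bar{d} = -80/6 \approx -13.33$ ms.

$s_d$: deviations from $\bar{d}$: −1.67, −1.67, 8.33, −1.67, −1.67, −1.67.

$s_d^2 \approx 83.33/5 \approx 16.67$. $s_d \approx 4.082$ ms.

$SE = s_d/\sqrt{n} = 4.082/\sqrt{6} \approx 1.667$, $t = \bar{d}/SE = -13.33/1.667 \approx -8.00$.

Left-tailed, df = 5: $|t| = 8.00$ exceeds 6.869, the largest critical value in the row (one-tail 0.0005), so $p\text{-value} < 0.0005$. Reject $H_0$.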

There is strong evidence that reaction time decreases after the conditioning drill.

A dietitian records caloric intake (kcal) for 8 patients before and after a dietary intervention. Test $H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$ at $\alpha = 0.05$.

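| Patient | Before (kcal) | After (kcal) |
|---|---|---|
| 1 | 2400 | 2200 |
| 2 | 2100 | 2050 |
| 3 | 2800 | 2500 |
| 4 | 2300 | 2250 |
| 5 | 2600 | 2350 |
| 6 | 1900 | 1950 |
| 7 | 2200 | 2100 |
| 8 | 2500 | 2300 |

Show Solution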

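Differences $d_i = \text{After} - \text{Before}$: −200, −50, −300, −50, −250, +50, −100, −200.

$\bar{d} = -1100/8 = -137.5$ kcal.

$s_d$: compute deviations from $\bar{d}$, square, sum, divide by 7, take square root.

Deviations: −62.5, +87.5, −162.5, +87.5, −112.5, +187.5, +37.5, −62.5.

Sum of squared deviations: $3906.25 + 7656.25 + 26406.25 + 7656.25 + 12656.25 + 35156.25 + 1406.25 + 3906.25 = 98750$.

$s_d^2 = 98750/7 \approx 14107.1$. $s_d \approx 118.8$ kcal.

$SE = s_d/\sqrt{n} = 118.8/\sqrt{8} \approx 42.0$, $t = \bar{d}/SE = -137.5/42.0 \approx -3.27$.

Two-tailed, df = 7: $t^* = 2.365$ (two-tail 0.05), and $|t| = 3.27 > 2.365$. Bracketing more tightly: $t^* = 2.998$ at two-tail 0.02 and $t^* = 3.499$ at two-tail 0.01. Since $2.998 < 3.27 < 3.499$, $0.01 < p\text{-value} < 0.02$. Reject $H_0$.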

There is sufficient evidence that the dietary intervention changes mean caloric intake.
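For the dietitian data, the statistic and the table bracket can be obtained together. The sketch below is one way to do it, assuming the df = 7 row from the t-table at the end of the lesson; it uses `bisect_left` to count how many critical values the statistic exceeds, which locates the two neighbouring columns. The variable names are illustrative only.

```python
from bisect import bisect_left
from math import sqrt
from statistics import mean, stdev

# Two-tail areas (column headers) and the df = 7 row of the lesson's t-table.
TWO_TAIL_AREAS = [0.20, 0.10, 0.05, 0.02, 0.01, 0.001]
T_ROW_DF7 = [1.415, 1.895, 2.365, 2.998, 3.499, 5.408]

before = [2400, 2100, 2800, 2300, 2600, 1900, 2200, 2500]
after = [2200, 2050, 2500, 2250, 2350, 1950, 2100, 2300]

diffs = [a - b for a, b in zip(after, before)]
t_stat = mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

# i = number of critical values in the row that are smaller than |t|.
i = bisect_left(T_ROW_DF7, abs(t_stat))
p_upper = TWO_TAIL_AREAS[i - 1] if i > 0 else 1.0
p_lower = TWO_TAIL_AREAS[i] if i < len(T_ROW_DF7) else 0.0

print(f"t = {t_stat:.2f}; {p_lower} < p-value < {p_upper}")
```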


Problem 2 — Power of the t-Test

A factory is testing whether bottle fill mean equals $\mu_0 = 500$ mL. They use $\alpha = 0.05$ (two-tailed), $n = 16$, and $s = 20$ mL. Suppose the true mean is actually $\mu = 505$ mL.

(a) What is the approximate power of this test? Use the non-centrality approach: compute the non-centrality parameter $\delta = (\mu - \mu_0)/(s/\sqrt{n})$ and approximate power as $P(Z > t^* - \delta)$ using the standard normal (a rough approximation).

Show Solution

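$\delta = (505 - 500)/(20/\sqrt{16}) = 5/5 = 1$.

Critical value: $t^* = 2.131$ (df = 15, two-tailed $\alpha = 0.05$).

Non-centrality approximation: the test rejects when $|t| > 2.131$. Under the true distribution (mean shifted by $\delta = 1$), the rejection probability is approximately $P(Z > 2.131 - 1) = P(Z > 1.131)$.

(We also need to account for the left tail, but it’s negligible here.)

Approximate power $\approx 0.13$.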
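A minimal sketch of this rough power calculation follows. The inputs are inferred from the solution rather than quoted from it: df = 15 implies $n = 16$, and a 5 mL shift equal to 1 SE implies $s = 20$ mL. The function name `approx_power` is hypothetical, and like the hand calculation it ignores the left tail and uses the normal curve in place of the non-central t.

```python
from math import sqrt
from statistics import NormalDist


def approx_power(mu0, mu_true, s, n, t_crit):
    """Rough upper-tail power: P(Z > t_crit - delta), delta = shift in SE units."""
    delta = (mu_true - mu0) / (s / sqrt(n))
    power = 1 - NormalDist().cdf(t_crit - delta)
    return delta, power


# t_crit = 2.131 is the df = 15, two-tail 0.05 entry of the t-table.
delta, power = approx_power(mu0=500, mu_true=505, s=20, n=16, t_crit=2.131)
print(f"delta = {delta:.2f}, approximate power = {power:.3f}")
```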

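(b) This is very low power: only a 13% chance of detecting the true shift from 500 to 505 mL. What would be needed to achieve 80% power?

With $n = 16$ and a shift of $\delta = 1$ SE, power is approximately 13%. To reach 80%, the shift must be approximately 2.8 SE ($z_{0.025} + z_{0.20} \approx 1.96 + 0.84$), meaning $n$ must increase substantially. Using the standard power formula under the normal approximation: $n \approx \left((z_{\alpha/2} + z_{\beta}) \cdot s / (\mu - \mu_0)\right)^2$, where $(\mu - \mu_0)/s$ is the standardized shift. Here the shift is only 5 mL = 1 SE, which is small. A rough estimate: $(2.8 \times 20/5)^2 = 11.2^2 \approx 125.4$, so this requires $n \approx 126$ to detect a 5-mL shift with 80% power using $\alpha = 0.05$.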
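The sample-size estimate can be sketched with the usual normal-approximation formula $n \approx \left((z_{\alpha/2} + z_{\beta}) \cdot s / \Delta\right)^2$. This is an assumption-laden illustration: $s = 20$ mL is inferred from "5 mL = 1 SE" at $n = 16$, the function name `n_for_power` is made up, and the formula ignores the extra uncertainty from using t instead of z, so the true requirement is slightly larger.

```python
from math import ceil
from statistics import NormalDist


def n_for_power(shift, s, alpha=0.05, power=0.80):
    """Approximate n for a two-tailed test to detect `shift` with given power."""
    z = NormalDist()
    # Required distance between null and true means, in SE units (about 2.8).
    target = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil((target * s / shift) ** 2)


print(n_for_power(shift=5, s=20))
```

Halving the detectable shift roughly quadruples the required sample size, because $n$ grows with the square of $s/\Delta$.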

Section 10: Solutions Reference

Complete, step-by-step solutions for all problems in Sections 5–9 are available on the solutions page. Solutions include full five-step write-ups, t-table lookups shown explicitly, and interpretation guidance.


If you’re stuck: Re-read the relevant Core Concept in Section 3. For t-test problems, check whether you used vs. in the denominator — that is the single most common error in this lesson. For t-table lookups, confirm you used the correct df row and the correct tail column (one-tail vs. two-tail). The solutions page shows the reasoning behind every step, not just the final answer.

Quick-Reference Formulas

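t Test Statistic (mean, small sample / $\sigma$ unknown): $t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $df = n - 1$

z Test Statistic (proportion): $z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$

Proportion Conditions (check before testing): random sample, with $np_0$ and $n(1 - p_0)$ both large enough for the normal approximation

Decision Rule (both tests): reject $H_0$ when the p-value falls below $\alpha$; otherwise fail to reject $H_0$

When to Use the t-Test: $\sigma$ is unknown (only $s$ is available) and the population is approximately normal

Bounding p from the t-table: in the df row, find the adjacent critical values that bracket $|t|$; the p-value lies between their tail areas

Key distinction (proportion test denominator): the test statistic uses $p_0$ in the SE, $\sqrt{p_0(1 - p_0)/n}$; the confidence interval uses $\hat{p}$, $\sqrt{\hat{p}(1 - \hat{p})/n}$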

Student's t-Distribution Table

Critical values (t*) for given degrees of freedom (df) and tail area.

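| df | 80% | 90% | 95% | 98% | 99% | 99.9% |
|---|---|---|---|---|---|---|
| One-tail area | 0.10 | 0.05 | 0.025 | 0.01 | 0.005 | 0.0005 |
| Two-tail area | 0.20 | 0.10 | 0.05 | 0.02 | 0.01 | 0.001 |
| 1 | 3.078 | 6.314 | 12.706 | 31.821 | 63.657 | 636.619 |
| 2 | 1.886 | 2.920 | 4.303 | 6.965 | 9.925 | 31.599 |
| 3 | 1.638 | 2.353 | 3.182 | 4.541 | 5.841 | 12.924 |
| 4 | 1.533 | 2.132 | 2.776 | 3.747 | 4.604 | 8.610 |
| 5 | 1.476 | 2.015 | 2.571 | 3.365 | 4.032 | 6.869 |
| 6 | 1.440 | 1.943 | 2.447 | 3.143 | 3.707 | 5.959 |
| 7 | 1.415 | 1.895 | 2.365 | 2.998 | 3.499 | 5.408 |
| 8 | 1.397 | 1.860 | 2.306 | 2.896 | 3.355 | 5.041 |
| 9 | 1.383 | 1.833 | 2.262 | 2.821 | 3.250 | 4.781 |
| 10 | 1.372 | 1.812 | 2.228 | 2.764 | 3.169 | 4.587 |
| 11 | 1.363 | 1.796 | 2.201 | 2.718 | 3.106 | 4.437 |
| 12 | 1.356 | 1.782 | 2.179 | 2.681 | 3.055 | 4.318 |
| 13 | 1.350 | 1.771 | 2.160 | 2.650 | 3.012 | 4.221 |
| 14 | 1.345 | 1.761 | 2.145 | 2.624 | 2.977 | 4.140 |
| 15 | 1.341 | 1.753 | 2.131 | 2.602 | 2.947 | 4.073 |
| 16 | 1.337 | 1.746 | 2.120 | 2.583 | 2.921 | 4.015 |
| 17 | 1.333 | 1.740 | 2.110 | 2.567 | 2.898 | 3.965 |
| 18 | 1.330 | 1.734 | 2.101 | 2.552 | 2.878 | 3.922 |
| 19 | 1.328 | 1.729 | 2.093 | 2.539 | 2.861 | 3.883 |
| 20 | 1.325 | 1.725 | 2.086 | 2.528 | 2.845 | 3.850 |
| 21 | 1.323 | 1.721 | 2.080 | 2.518 | 2.831 | 3.819 |
| 22 | 1.321 | 1.717 | 2.074 | 2.508 | 2.819 | 3.792 |
| 23 | 1.319 | 1.714 | 2.069 | 2.500 | 2.807 | 3.768 |
| 24 | 1.318 | 1.711 | 2.064 | 2.492 | 2.797 | 3.745 |
| 25 | 1.316 | 1.708 | 2.060 | 2.485 | 2.787 | 3.725 |
| 26 | 1.315 | 1.706 | 2.056 | 2.479 | 2.779 | 3.707 |
| 27 | 1.314 | 1.703 | 2.052 | 2.473 | 2.771 | 3.690 |
| 28 | 1.313 | 1.701 | 2.048 | 2.467 | 2.763 | 3.674 |
| 29 | 1.311 | 1.699 | 2.045 | 2.462 | 2.756 | 3.659 |
| 30 | 1.310 | 1.697 | 2.042 | 2.457 | 2.750 | 3.646 |
| 40 | 1.303 | 1.684 | 2.021 | 2.423 | 2.704 | 3.551 |
| 50 | 1.299 | 1.676 | 2.009 | 2.403 | 2.678 | 3.496 |
| 60 | 1.296 | 1.671 | 2.000 | 2.390 | 2.660 | 3.460 |
| 80 | 1.292 | 1.664 | 1.990 | 2.374 | 2.639 | 3.416 |
| 100 | 1.290 | 1.660 | 1.984 | 2.364 | 2.626 | 3.390 |
| 1000 | 1.282 | 1.646 | 1.962 | 2.330 | 2.581 | 3.300 |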