Hypothesis Testing for Small Sample Mean and Proportion

A factory claims its machine fills bottles to a mean of 500 mL. A quality technician is suspicious, but she can only collect 12 bottles for testing (the test destroys each bottle). With only $n = 12$ and no known population standard deviation $\sigma$, the z-test from inf-5 is off the table. She needs the right tool for small-sample testing: the t-test.

Meanwhile, a consumer watchdog group surveyed 80 households and found 18 (22.5%) reporting food insecurity. A government agency claims the true rate is only 15%. Is the group’s sample proportion surprising enough to challenge the agency’s claim? That requires the proportion test.

Both questions use the same five steps you mastered in inf-5. The only thing that changes is the formula for the test statistic — and the distribution you read it against. Everything else carries over exactly.

By the end of this lesson, you will be able to:

Identify when to use the t-test vs. the z-test for a population mean.
Compute the t test statistic and bound the p-value using the t-table.
State and check conditions for the proportion test: $np_0 \ge 10$ and $n(1-p_0) \ge 10$.
Compute the z test statistic and find the p-value.
Distinguish statistical significance from practical significance, and explain why “fail to reject” never means “accept $H_0$.”

What you need coming in — and why it matters today:

Five-step hypothesis test framework (inf-5): State H₀ and Hₐ, check conditions, compute the test statistic, find the p-value, state the conclusion. Every problem in this lesson uses this structure verbatim.
p-value and decision rule (inf-5): The p-value is the probability, assuming $H_0$ is true, of a test statistic at least as extreme as the one observed. Reject $H_0$ if $p < \alpha$; fail to reject $H_0$ if $p \ge \alpha$. These rules are identical in inf-6.
t-distribution and degrees of freedom (inf-3): When $\sigma$ is unknown, we use the t-distribution with $df = n - 1$. The t-table gives critical values, not exact p-values, so we bound the p-value between two levels.
Sample proportion and SE formula (inf-4): The point estimate is $\hat{p} = x/n$ (sample successes divided by sample size). The SE for a proportion test uses the null value $p_0$, not $\hat{p}$.
“Fail to reject” language (inf-5): Never “accept $H_0$.” The correct forms are “reject $H_0$” (when $p < \alpha$) and “fail to reject $H_0$” (when $p \ge \alpha$).

Quick check — can you recall these?

What are the degrees of freedom for a one-sample $t$-test?

I can state the five steps of a hypothesis test in order.
I know the decision rule: reject $H_0$ if p-value < $\alpha$.
I know how to find df for a t-procedure: df = n − 1.
I can read a critical t-value from the t-table for a given df and confidence level.

Success Factor:

What changes in this lesson: In inf-5, you always knew $\sigma$, so you computed a z test statistic and read an exact p-value from the z-table. In this lesson, $\sigma$ is unknown. We use $s$ instead, which forces us to use the t-distribution. The t-table gives bounds (e.g., 0.02 < p < 0.05) rather than exact p-values, and that is sufficient to make the reject/fail-to-reject decision. For proportion tests, the z-distribution still applies (because the proportion test statistic is approximately normal when conditions are met), but we use $p_0$, not $\hat{p}$, in the denominator.

Retrieval Warm-up — from earlier lessons

A researcher constructs a one-sample t-interval from observations with unknown $\sigma$. She looks up the critical value $t^*$ from the t-table. Which row should she use?

A z-test yields a p-value of 0.03. The researcher set $\alpha = 0.05$ before seeing the data. She says: “I reject $H_0$, and since p = 0.03 < 0.05, the probability that $H_0$ is true is only 3%.” Identify the error.

How this section is organized: Seven concepts cover both test types in order. C1–C3 handle the t-test for a mean. C4–C5 introduce the proportion test. C6 reviews one-vs.-two-tailed choice (same rules as inf-5). C7 connects both tests to a broader idea you’ll use in reg-3 and beyond.

C1–C3: The t-test — when to use it, the formula, how to bound p-values from the t-table
C4–C5: The proportion test — conditions, formula, and the p₀ vs. p̂ distinction
C6: Choosing the tail — identical rules as inf-5
C7: Bridging idea — practical vs. statistical significance

C1 — When to Use the t-Test

In inf-5, you used the z-test: $z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$. That formula requires knowing $\sigma$. When $\sigma$ is unknown, we estimate it with $s$ and switch to the t-distribution. The t-distribution is wider than the normal: it accounts for the extra uncertainty introduced by estimating $\sigma$ from a small sample.

When to Use the t-Test for a Mean

Use the one-sample t-test when:

$\sigma$ is unknown (you have $s$, the sample standard deviation, instead)
The population is approximately normal (required when $n$ is small; less critical for larger $n$)
The sample is random (independence of observations)

When $\sigma$ is known, use the z-test from inf-5 regardless of sample size.

Decision rule in plain English: If the problem gives you $\sigma$ → z-test. If the problem gives you $s$ and $\sigma$ is unknown → t-test. Sample size (small vs. large) does not override this; what matters is whether $\sigma$ is known.

What are you testing? Read the research question: is the parameter a mean or a proportion?

Mean → Is σ known? This is the deciding factor, not the sample size.
σ known → z-test for a mean: $ z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} $ (inf-5, any n)
σ unknown → t-test for a mean: $ t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} $ with $ df = n - 1 $ (need a roughly normal population for small n)

Proportion → Check conditions: $ np_0 \ge 10 $ and $ n(1-p_0) \ge 10 $ (use $ p_0 $, the claim, not $ \hat{p} $).
Conditions met → z-test for a proportion: $ z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}} $ (denominator uses $ p_0 $; z, not t, even for small n)
Conditions not met → the z-approximation is not valid; use an exact binomial test or increase n so that $ np_0 \ge 10 $.

✗ Common mistake: picking z over t because “n is big enough.” Sample size never decides t vs z — only whether σ is known does.

Figure: Choosing the right test. Answer the two questions and the path lights up; the whole map stays visible so you can see every branch. For a mean, the only question that matters is whether σ is known — sample size never switches t to z, or a proportion test to t.

A very common mistake: using the z-test when only $s$ is given, reasoning “n is large enough.” The distinction is not about sample size; it is about whether $\sigma$ is known. If you see $s$ in the problem and no mention of a known $\sigma$, use the t-distribution. With large $n$, the t-distribution converges to the standard normal, so the numerical difference becomes small, but the principle remains: $s$ known, not $\sigma$ → t.
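As a rough illustration, the decision map above can be written as a small Python function. The function name `choose_test` and its argument names are hypothetical, chosen here only for illustration; the sketch assumes the success-failure threshold of 10 used in this lesson.

```python
def choose_test(parameter, sigma_known=False, n=None, p0=None):
    """Return the name of the test suggested by the lesson's decision map.

    parameter   -- "mean" or "proportion" (hypothetical labels)
    sigma_known -- True only if the population SD is actually known
    n, p0       -- sample size and null proportion (proportion branch only)
    """
    if parameter == "mean":
        # Sample size plays no role here: only whether sigma is known.
        if sigma_known:
            return "z-test for a mean"
        return "t-test for a mean (df = n - 1)"
    if parameter == "proportion":
        # Conditions are checked with p0 (the claim), not the sample proportion.
        if n * p0 >= 10 and n * (1 - p0) >= 10:
            return "z-test for a proportion"
        return "exact binomial test (or collect a larger sample)"
    raise ValueError("parameter must be 'mean' or 'proportion'")


print(choose_test("mean", sigma_known=False, n=12))
print(choose_test("proportion", n=80, p0=0.15))
print(choose_test("proportion", n=20, p0=0.25))
```

Note that the mean branch never inspects `n`, which mirrors the point that sample size does not switch t to z.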

Figure: The t-distribution (solid, heavier line) vs. the standard normal (dashed reference). With few degrees of freedom the t-curve has a shorter peak and fatter tails — the gold area is the t tail probability beyond ±1.96, which forces t* to be larger than z* = 1.96. Drag df upward and watch the two curves merge: that is why t → z as n grows.

C2 — The t Test Statistic

The formula is almost identical to the z-test statistic. Replace $\sigma$ with $s$, and replace $z$ with $t$.

t Test Statistic for a Population Mean

$t = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}}$, with $df = n - 1$

where $\mu_0$ is the null hypothesis value of the mean, $s$ is the sample standard deviation, and $n$ is the sample size.

A large $|t|$ means the data are far from what $H_0$ predicts, which is strong evidence against $H_0$. The degrees of freedom $df = n - 1$ determine which t-distribution to use when reading the t-table.

Mini-example: Suppose , , , .

. We look along the df = 8 row of the t-table to bound the p-value.

The most common arithmetic error: using $df = n$ instead of $df = n - 1$. For $n = 9$, $df = 8$, not 9. Using the wrong df row gives the wrong critical value and can change your decision. Always subtract 1.
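A minimal sketch of the t statistic computed from summary statistics. The function name `t_statistic` and the numbers passed to it are hypothetical, picked only so the arithmetic is easy to follow by hand; they are not taken from any example in this lesson.

```python
import math


def t_statistic(x_bar, mu_0, s, n):
    """Return (t, df) for a one-sample t-test from summary statistics."""
    standard_error = s / math.sqrt(n)   # estimated SE of the sample mean
    t = (x_bar - mu_0) / standard_error
    df = n - 1                          # always n - 1, never n
    return t, df


# Hypothetical inputs: sample mean 498.5, claimed mean 500, s = 2.0, n = 16.
t, df = t_statistic(x_bar=498.5, mu_0=500.0, s=2.0, n=16)
print(t, df)  # -3.0 15
```

Here the standard error is 2.0 / 4 = 0.5, so the sample mean sits three standard errors below the claimed value, and the df = 15 row of the t-table is the one to read.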

C3 — Reading p-Values from the t-Table

The z-table gives exact p-values. The t-table works differently: it gives critical values for standard $\alpha$ levels. To find the p-value for a computed t, you locate your df row and find which two critical values your $|t|$ falls between. This gives you a p-value bound: “p is between 0.02 and 0.05.”

Bounding the p-Value from the t-Table

For a computed $|t|$ with known $df$:

Find the $df$ row in the t-table.
Locate the two critical values that bracket $|t|$: the one just below and the one just above.
Read the corresponding $\alpha$ (one-tail or two-tail, depending on your $H_a$) for each critical value.
The p-value lies between those two values.

Example: , , two-tailed test. From the t-table, df = 11 row: (two-tail ) and (two-tail ). Since , we have .
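The bracketing procedure can be sketched in Python as follows. The row below copies the df = 11 critical values from the t-table in this lesson; the function name `bound_p_value` and the value |t| = 2.5 are hypothetical and used only for illustration. For a one-tailed test the sketch assumes t falls in the direction named by the alternative.

```python
# (one-tail area, critical value) pairs for the df = 11 row of the t-table.
DF11_ROW = [(0.10, 1.363), (0.05, 1.796), (0.025, 2.201),
            (0.01, 2.718), (0.005, 3.106), (0.0005, 4.437)]


def bound_p_value(abs_t, row, two_tailed=True):
    """Return (lower, upper) bounds on the p-value from one t-table row."""
    factor = 2 if two_tailed else 1      # two-tail area = 2 * one-tail area
    lower, upper = 0.0, 1.0              # trivial bounds if |t| is off the row
    for one_tail_area, critical in row:  # critical values increase along the row
        if abs_t >= critical:
            upper = factor * one_tail_area   # p is at most this area
        else:
            lower = factor * one_tail_area   # p is larger than this area
            break
    return lower, upper


print(bound_p_value(2.5, DF11_ROW, two_tailed=True))   # (0.02, 0.05)
print(bound_p_value(2.5, DF11_ROW, two_tailed=False))  # (0.01, 0.025)
```

The same |t| gives a bound half as large for a one-tailed test, which is why the tail must be fixed before reading the table.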

Student's t-Distribution Table

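Critical values (t*) for given degrees of freedom (df) and tail area. In the header row of tail areas, (1) marks a one-tail area and (2) marks the matching two-tail area.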

df	Confidence
	80%	90%	95%	98%	99%	99.9%
	0.10 (1) 0.20 (2)	0.05 (1) 0.10 (2)	0.025 (1) 0.05 (2)	0.01 (1) 0.02 (2)	0.005 (1) 0.01 (2)	0.0005 (1) 0.001 (2)
1	3.078	6.314	12.706	31.821	63.657	636.619
2	1.886	2.920	4.303	6.965	9.925	31.599
3	1.638	2.353	3.182	4.541	5.841	12.924
4	1.533	2.132	2.776	3.747	4.604	8.610
5	1.476	2.015	2.571	3.365	4.032	6.869
6	1.440	1.943	2.447	3.143	3.707	5.959
7	1.415	1.895	2.365	2.998	3.499	5.408
8	1.397	1.860	2.306	2.896	3.355	5.041
9	1.383	1.833	2.262	2.821	3.250	4.781
10	1.372	1.812	2.228	2.764	3.169	4.587
11	1.363	1.796	2.201	2.718	3.106	4.437
12	1.356	1.782	2.179	2.681	3.055	4.318
13	1.350	1.771	2.160	2.650	3.012	4.221
14	1.345	1.761	2.145	2.624	2.977	4.140
15	1.341	1.753	2.131	2.602	2.947	4.073
16	1.337	1.746	2.120	2.583	2.921	4.015
17	1.333	1.740	2.110	2.567	2.898	3.965
18	1.330	1.734	2.101	2.552	2.878	3.922
19	1.328	1.729	2.093	2.539	2.861	3.883
20	1.325	1.725	2.086	2.528	2.845	3.850
21	1.323	1.721	2.080	2.518	2.831	3.819
22	1.321	1.717	2.074	2.508	2.819	3.792
23	1.319	1.714	2.069	2.500	2.807	3.768
24	1.318	1.711	2.064	2.492	2.797	3.745
25	1.316	1.708	2.060	2.485	2.787	3.725
26	1.315	1.706	2.056	2.479	2.779	3.707
27	1.314	1.703	2.052	2.473	2.771	3.690
28	1.313	1.701	2.048	2.467	2.763	3.674
29	1.311	1.699	2.045	2.462	2.756	3.659
30	1.310	1.697	2.042	2.457	2.750	3.646
40	1.303	1.684	2.021	2.423	2.704	3.551
50	1.299	1.676	2.009	2.403	2.678	3.496
60	1.296	1.671	2.000	2.390	2.660	3.460
80	1.292	1.664	1.990	2.374	2.639	3.416
100	1.290	1.660	1.984	2.364	2.626	3.390
∞	1.282	1.645	1.960	2.326	2.576	3.291

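Why bounds are enough: If the upper bound on p is below $\alpha$, then $p < \alpha$ and we reject $H_0$. If the lower bound on p is above $\alpha$, then $p > \alpha$ and we fail to reject $H_0$. The bound tells us which side of $\alpha$ the p-value falls on. That is all we need for the decision.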

Figure: The t-distribution (solid) for the chosen degrees of freedom, with the standard normal (dashed) shown for reference. The shaded region is the p-value area implied by the t-value and tail you choose.

Figure: Reading a p-value bound from the t-table. Choose your degrees of freedom and slide |t|. The two critical values that bracket |t| light up (lower and upper), and the p-value bound is read off their α columns. The t-table never gives an exact p — a bound is all you need to compare against α.

For a two-tailed test, you must use the two-tail column of the t-table (or equivalently, double the one-tail area). If your $H_a$ uses ≠, the p-value is $2 \cdot P(T \ge |t|)$. Reading the one-tail column without multiplying by 2 will give you a p-value that is half the correct size, and can lead you to reject $H_0$ when you should not.

C4 — Conditions for the Proportion Test

When the research question is about a population proportion $p$, we use a z-test (not t), even for relatively small samples. This is possible because the sampling distribution of $\hat{p}$ is approximately normal when the sample is large enough relative to $p_0$.

Conditions for the One-Sample Proportion Test

Before conducting a proportion test with $H_0: p = p_0$, verify:

$np_0 \ge 10$: at least 10 expected successes under $H_0$
$n(1-p_0) \ge 10$: at least 10 expected failures under $H_0$
Random sample: observations are independent

Note: Use $p_0$ (the null value), not $\hat{p}$ (the sample estimate), when checking conditions.

Check conditions using $p_0$, not $\hat{p}$. The conditions assess whether the null hypothesis model produces enough expected counts, which requires $p_0$. If you use $\hat{p}$ instead, you may incorrectly declare conditions met (or unmet) when the opposite is true.
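A short sketch of the condition check, using the two scenarios whose numbers are stated in this lesson (the watchdog survey with n = 80 and a claimed rate of 0.15, and the hospital audit with n = 20 and a claimed rate of 0.25). The function name `check_proportion_conditions` is hypothetical.

```python
def check_proportion_conditions(n, p0):
    """Return (expected successes, expected failures, conditions met?)."""
    expected_successes = n * p0        # computed under H0, so p0 is used
    expected_failures = n * (1 - p0)
    met = expected_successes >= 10 and expected_failures >= 10
    return expected_successes, expected_failures, met


# Watchdog survey from the lesson introduction: n = 80, p0 = 0.15.
print(check_proportion_conditions(80, 0.15))
# Hospital audit from Problem 5: n = 20, p0 = 0.25.
print(check_proportion_conditions(20, 0.25))
```

The first call gives about 12 expected successes and 68 expected failures, so the z-test is reasonable; the second gives only 5 expected successes, so it is not.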

C5 — The Proportion Test Statistic

The formula has the same shape as the z test statistic from inf-5, but with $\hat{p} - p_0$ in the numerator and $\sqrt{p_0(1-p_0)/n}$ in the denominator.

z Test Statistic for a Population Proportion

$z = \dfrac{\hat{p} - p_0}{\sqrt{p_0(1-p_0)/n}}$

where $\hat{p}$ is the sample proportion, $p_0$ is the null hypothesis value, and $n$ is the sample size.

The p-value for a two-tailed test is $2 \cdot P(Z \ge |z|)$, read from the standard normal table.

Figure: Anatomy of the proportion test statistic. The numerator (purple) is the observed gap between the sample proportion p̂ and the claimed value p₀. The denominator (gold) is the standard error under H₀ — it always uses p₀, never p̂. Swapping them is the single most common error in this lesson.

Why $p_0$ in the denominator: We are testing whether $H_0: p = p_0$ is true. To compute how surprising our data are under $H_0$, we assume $H_0$ is true and ask: “What is the standard error of $\hat{p}$ if the true proportion really were $p_0$?” The SE under $H_0$ is $\sqrt{p_0(1-p_0)/n}$, using $p_0$ because that is what the true proportion would be if $H_0$ were true. Using $\hat{p}$ instead would mean assuming our sample estimate is exactly right, which is circular reasoning.

This is the most common error in proportion tests: plugging $\hat{p}$ into the denominator instead of $p_0$. The denominator represents the SE under the null hypothesis. The null hypothesis says $p = p_0$, so use $p_0$. Remember: the numerator measures the gap between what we observed ($\hat{p}$) and what $H_0$ claims ($p_0$); the denominator measures the variability we’d expect if $H_0$ were true.
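To make the formula concrete, here is a sketch applied to the watchdog survey from the introduction (18 of 80 households, claimed rate 15%). Treating that question as two-tailed is an assumption made here, since the introduction does not state the alternative. The function names are hypothetical; the normal tail area comes from the standard library's `math.erfc` rather than a z-table.

```python
import math


def normal_upper_tail(z):
    """P(Z >= z) for a standard normal Z, via the complementary error function."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def proportion_z_test(successes, n, p0):
    """Return (p_hat, z, two-tailed p-value) for H0: p = p0."""
    p_hat = successes / n
    se_null = math.sqrt(p0 * (1 - p0) / n)   # SE under H0: uses p0, not p_hat
    z = (p_hat - p0) / se_null
    p_two_tailed = 2 * normal_upper_tail(abs(z))
    return p_hat, z, p_two_tailed


p_hat, z, p_value = proportion_z_test(successes=18, n=80, p0=0.15)
print(round(p_hat, 3), round(z, 2), round(p_value, 3))
```

Under these assumptions z comes out near 1.88 and the two-tailed p-value near 0.06, so the survey would fall just short of significance at the 5% level.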

Figure: The one-sample z test for a proportion. First check the success–failure conditions (np₀ ≥ 10 and n(1−p₀) ≥ 10) — if either fails, the normal approximation is unreliable. Then read the statistic z = (p̂ − p₀) / √(p₀(1−p₀)/n): the standard error is computed under H₀, so it always uses the claim p₀, never the sample p̂. The shaded tail(s) on the standard-normal curve give the p-value for the chosen alternative.

C6 — One-Tailed vs. Two-Tailed Tests

The rules for choosing the tail are identical to inf-5. Set $H_a$ from the research question before collecting data.

Choosing the Tail

Two-tailed ($H_a: \mu \ne \mu_0$ or $H_a: p \ne p_0$): Use when the research question asks whether the parameter differs from $\mu_0$ (or $p_0$) in any direction. p-value = $2 \cdot P(T \ge |t|)$ or $2 \cdot P(Z \ge |z|)$.

Left-tailed ($H_a: \mu < \mu_0$ or $H_a: p < p_0$): Use when the claim is specifically that the parameter is below the null value. p-value = $P(T \le t)$ or $P(Z \le z)$.

Right-tailed ($H_a: \mu > \mu_0$ or $H_a: p > p_0$): Use when the claim is specifically that the parameter is above the null value. p-value = $P(T \ge t)$ or $P(Z \ge z)$.

Figure: One tail or two? Choose the alternative hypothesis from the research question before seeing the data, then read the shaded p-area. The same observed t gives a different p-value depending on the tail — which is exactly why picking the tail after peeking at the data (data snooping) inflates the true error rate toward 2α.

Choose the tail from the research question, not from the data. Selecting a one-tailed test after seeing the direction of the sample mean or sample proportion is data snooping. It inflates the true Type I error rate to roughly $2\alpha$. If the question does not give a directional prior claim, use a two-tailed test.
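The three tail rules can be sketched for a z statistic as follows. The function name `p_value_from_z`, the tail labels, and the value z = 1.5 are hypothetical choices for illustration; the same logic applies to t, but exact t areas need software beyond the standard library, so only the z case is shown.

```python
import math


def p_value_from_z(z, tail):
    """Return the p-value for a z statistic given the tail of H_a.

    tail -- "left", "right", or "two" (hypothetical labels)
    """
    upper = 0.5 * math.erfc(z / math.sqrt(2))   # P(Z >= z)
    if tail == "right":
        return upper
    if tail == "left":
        return 1 - upper                        # P(Z <= z)
    if tail == "two":
        return math.erfc(abs(z) / math.sqrt(2))  # 2 * P(Z >= |z|)
    raise ValueError("tail must be 'left', 'right', or 'two'")


for tail in ("right", "left", "two"):
    print(tail, round(p_value_from_z(1.5, tail), 4))
```

The same z gives three different p-values, and the two-tailed value is exactly double the right-tailed one, which is the arithmetic behind the data-snooping warning above.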

C7 — Practical vs. Statistical Significance

A statistically significant result (p < α) tells you the effect is unlikely to be due to chance. It does not tell you the effect is large or meaningful.

With a very large sample, even a tiny difference from $\mu_0$ can produce $p < \alpha$. For example, a factory producing bottles at a mean of 500.1 mL instead of 500 mL might have a very small p-value when $n$ is very large, but 0.1 mL is practically irrelevant. Always report the effect size (e.g., the actual difference $\bar{x} - \mu_0$ or $\hat{p} - p_0$) alongside the p-value, and ask: “Does this difference matter in the real world?” Statistical significance alone does not answer that question.
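A small numerical sketch of this effect. It assumes, purely for illustration, a known population standard deviation of 2 mL (a value not given in this lesson) so that a two-tailed z-test can be used, and it holds the observed difference fixed at 0.1 mL while the sample size grows.

```python
import math


def two_tailed_p(diff, sigma, n):
    """Two-tailed z-test p-value for an observed difference x_bar - mu_0."""
    z = diff / (sigma / math.sqrt(n))
    return math.erfc(abs(z) / math.sqrt(2))   # equals 2 * P(Z >= |z|)


ASSUMED_SIGMA = 2.0   # hypothetical population SD in mL
DIFFERENCE = 0.1      # observed gap of 0.1 mL, held fixed

for n in (100, 1_000, 10_000, 100_000):
    print(n, two_tailed_p(DIFFERENCE, ASSUMED_SIGMA, n))
```

The effect size is the same 0.1 mL in every row; only the standard error shrinks. Under this assumed standard deviation the p-value is well above 0.05 at n = 100 and far below it at n = 10,000.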

Figure: A trivial 0.1 mL difference, tested at larger and larger n. The effect size never changes — 0.1 mL is still negligible against a 5 mL "matters" threshold. Yet the p-value slides below α = 0.05 once n is large enough. A tiny, meaningless effect becomes "statistically significant" purely because the sample is huge. Always report the effect size alongside the p-value.

Example 1 — Fully Worked: Caloric Intake (Two-Tailed t-Test)

A nutritionist claims adults in a certain region consume a mean of $\mu_0 = 2{,}000$ kcal/day. A random sample of $n = 16$ adults yields kcal and kcal. Test at $\alpha = 0.05$.

Step 1: State $H_0$ and $H_a$

$H_0: \mu = 2{,}000$ kcal; $H_a: \mu \ne 2{,}000$ kcal (two-tailed: the researcher is testing whether the mean differs in any direction from the claim)

Step 2: Check Conditions

$n = 16$ and $\sigma$ is unknown → I must use the t-distribution. Caloric intake is approximately normally distributed for the target population ✓. Random sample ✓. Conditions met: use t with $df = 15$.

Step 3: Compute the Test Statistic

I notice kcal.

I choose to work with the positive value because I’ll use $|t|$ for the two-tailed p-value.

Step 4: Find the p-value (bound from t-table)

Looking along the df = 15 row of the t-table:

$t^* = 1.753$ corresponds to two-tail $\alpha = 0.10$
$t^* = 2.131$ corresponds to two-tail $\alpha = 0.05$

Since $1.753 < |t| < 2.131$, the two-tailed p-value is between 0.05 and 0.10: $0.05 < p < 0.10$.

Step 5: Conclusion

$p > 0.05 = \alpha$. We fail to reject $H_0$.

There is insufficient evidence at the 5% significance level to conclude that the mean caloric intake in this region differs from 2,000 kcal/day.

Example 2 — Prediction Checkpoint: Tire Durability (One-Tailed t-Test)

A tire manufacturer claims its tires last $\mu_0 = 50{,}000$ km. A sample of $n = 9$ tires gives km and km. Test at $\alpha = 0.01$.

Steps 1–3:

$H_0: \mu = 50{,}000$ km; $H_a: \mu < 50{,}000$ km (left-tailed: we’re testing whether tires fail to meet the lower bound). Conditions: $n = 9$, $\sigma$ unknown, approximately normal population assumed ✓. Use t with $df = 8$.

km.

Pause here. Before reading the p-value and decision:

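You have the computed $t$ with $df = 8$. Looking at the df = 8 row of the t-table: $t^* = 1.397$ (one-tail 0.10), $t^* = 1.860$ (one-tail 0.05), and $t^* = 2.306$ (one-tail 0.025). Where does $|t|$ fall?
Do you expect to reject $H_0$ at $\alpha = 0.01$? Write down your prediction before continuing.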

Show Solution (Steps 4–5)

Step 4: Left-tailed test. , . From the t-table, df = 8 row:

→ one-tail
→ one-tail

Since : .

Step 5: $p > 0.01 = \alpha$. We fail to reject $H_0$.

There is insufficient evidence at the 1% significance level to conclude that the mean tire life is less than 50,000 km. (Note: we would reject at $\alpha = 0.05$, but not at the stricter $\alpha = 0.01$.)

Example 3 — Details/Summary: Recycling Rate (Two-Tailed Proportion Test)

A city claims 30% of households recycle regularly. A consumer survey of households finds 54 that recycle (). Test at .

Show Full Solution

Step 1: $H_0: p = 0.30$; $H_a: p \ne 0.30$ (two-tailed).

Step 2: Check Conditions

✓

Random sample ✓. Conditions met — use the z test for proportions.

Step 3: Compute and the Test Statistic

Step 4: Two-tailed: .

(from z-table, rounding to 0.93).

Step 5: $p > 0.05 = \alpha$. We fail to reject $H_0$.

There is insufficient evidence at the 5% level to conclude that the proportion of households that recycle regularly differs from 30%.

Example 4 — Find the Error: Proportion Test with Two Mistakes

A student tests with , (), . Here is the student’s work.

Student’s analysis:

;

: “I accept . The true rate is 0.40.”

Show Full Error Analysis

Error 1: Wrong SE. The student used $\hat{p}$ in the denominator instead of $p_0$. When testing $H_0: p = 0.40$, the SE under $H_0$ uses $p_0 = 0.40$.

Correct SE:

Correct z:

. The decision is the same (fail to reject), but the formula is wrong and can produce different conclusions in other problems.

Error 2: “Accept H₀”. We never accept $H_0$. The correct statement is: “We fail to reject $H_0$. There is insufficient evidence at the 5% level to conclude the true proportion differs from 0.40.” Failing to reject does not prove $H_0$ is true; it only means the data are not surprising enough under $H_0$.

Problem 1 — t-Test Decisions (Three Scenarios)

A sleep researcher wants to know if college students sleep less than the recommended 8 hours. She samples students and finds hours and hours. Population $\sigma$ is unknown.

Part A: Which test should she use?

Part B: State the correct $H_0$ and $H_a$.

Part C: Compute and .

A pharmacist claims a generic drug has the same mean effect time as the brand-name version ( min). A random sample of patients using the generic drug shows min and min.

Part A: Which test applies?

Part B: What are $H_0$ and $H_a$ for a two-tailed test?

Part C: Compute and .

A water quality inspector claims that mean lead content in a city’s water supply is ppb. She samples locations and finds ppb and ppb. She wants to test whether the true mean exceeds the claim.

Part A: Which test applies?

Part B: State $H_0$ and $H_a$.

Part C: What is ?

Problem 2 — Proportion Test Conditions and z Statistic

A public health report claims 20% of adults smoke ($p_0 = 0.20$). A researcher surveys adults and finds 36 smokers. Test at $\alpha = 0.05$ (two-tailed).

Part A: Are conditions met?

Part B: Compute $\hat{p}$ and the test statistic $z$.

Part C: Is p < 0.05? What is the conclusion?

A university claims that 60% of its graduates find jobs in their field within 6 months (). A journalism investigation surveys graduates and finds 42 employed in their field (). Test at (two-tailed).

Part A: Check conditions.

Part B: Compute the test statistic.

Part C: Conclusion at .

A city claims 35% of commuters use public transit (). A transit authority samples commuters and finds 82 using public transit. Test at (two-tailed).

Part A: Check conditions.

Part B: Compute the test statistic.

Part C: Conclusion at .

Problem 3 — Choosing t vs. z vs. Proportion Test

For each scenario, identify the correct test and state $H_0$ and $H_a$.

Scenario A: A factory manager claims the mean weight of a product is g. A QC engineer samples 10 units and finds g and g. She wants to know if the mean is below the claim.

Scenario B: A government report states that 40% of households own a pet ($p_0 = 0.40$). A researcher samples 120 households and finds 54 own a pet. She wants to test whether the true proportion differs from 40%.

Problem 1 — t-Test Generator

Problem 2 — One-Tailed Proportion Test

A pharmaceutical company claims that its new medication reduces blood pressure in more than 50% of patients (). In a trial of patients, 58 showed reduced blood pressure. Test at .

Conduct the full five-step test and show your solution below.

Show Solution

Step 1: $H_0: p = 0.50$; $H_a: p > 0.50$ (right-tailed).

Step 2: ✓; ✓; random sample ✓.

Step 3: .

Step 4: Right-tailed: .

Step 5: $p > 0.05 = \alpha$. We fail to reject $H_0$.

There is insufficient evidence at the 5% level to conclude that the medication reduces blood pressure in more than 50% of patients.

A consumer advocacy group claims that fewer than 30% of customers are satisfied with a company’s service (). They survey customers and find 30 satisfied. Test at .

Show Solution

Step 1: $H_0: p = 0.30$; $H_a: p < 0.30$ (left-tailed).

Step 2: ✓; ✓; random sample ✓.

Step 3: .

Step 4: Left-tailed: .

Step 5: $p > 0.05 = \alpha$. We fail to reject $H_0$.

There is insufficient evidence at the 5% level to conclude that fewer than 30% of customers are satisfied.

A researcher hypothesizes that more than 70% of teenagers use social media daily (). A survey of teenagers finds 117 using social media daily. Test at .

Show Solution

Step 1: $H_0: p = 0.70$; $H_a: p > 0.70$ (right-tailed).

Step 2: ✓; ✓; random sample ✓.

Step 3: .

Step 4: Right-tailed: .

Step 5: $p > 0.01 = \alpha$. We fail to reject $H_0$ at $\alpha = 0.01$.

There is insufficient evidence at the 1% level to conclude that more than 70% of teenagers use social media daily. (Note: we would reject at $\alpha = 0.05$.)

Problem 3 — Proportion Test Generator

Problem 4 — Find the Error

Problem 5 — Multi-Step Synthesis: Hospital Quality Audit

A regional hospital claims two things: (1) the mean post-operative stay is $\mu_0 = 3.5$ days; (2) exactly 25% of patients are readmitted within 30 days ($p_0 = 0.25$). An independent auditor collects the following data from a random sample of 20 recent patients:

Mean post-operative stay: days, days
Readmissions: 7 out of 20 patients ($\hat{p} = 0.35$)

Test both claims at $\alpha = 0.05$.

(a) Test the mean claim. State $H_0$ and $H_a$, check conditions, compute $t$, bound the p-value, and state the conclusion.

Show Solution (a)

Step 1: $H_0: \mu = 3.5$; $H_a: \mu \ne 3.5$ (two-tailed: the auditor is checking whether the mean differs in any direction).

Step 2: $n = 20$, $\sigma$ unknown → t-test. Assume approximately normal post-operative stays ✓. $df = 19$.

Step 3: days.

Step 4: , two-tailed. From the t-table, df = 19 row:

→ two-tail
→ two-tail

Since : .

Step 5: $p < 0.05 = \alpha$. We reject $H_0$.

There is sufficient evidence at the 5% level to conclude that the mean post-operative stay differs from 3.5 days.

(b) Now turn to the readmission proportion claim. State $H_0$ and $H_a$, then check the conditions for the proportion z-test before doing anything else. What do you find?

Show Solution (b)

Step 1: $H_0: p = 0.25$; $H_a: p \ne 0.25$ (two-tailed).

Step 2: Check conditions. The proportion z-test requires $np_0 \ge 10$ and $n(1-p_0) \ge 10$.

$np_0 = 20(0.25) = 5 < 10$ ✗; $n(1-p_0) = 20(0.75) = 15 \ge 10$ ✓

The expected-successes condition fails ($np_0 = 5 < 10$). Both conditions must hold, so the normal approximation behind the z-test is not trustworthy with this sample.

Conclusion: stop here. We do not compute $z$ or report a p-value, because any p-value from the normal approximation would be unreliable. A failed condition halts the procedure, the same discipline you use for the chi-square test (all expected counts $\ge 5$) in reg-4.

What the auditor should do instead: collect a larger sample. To satisfy $np_0 \ge 10$ at $p_0 = 0.25$, she needs $n \ge 40$ patients (and $n = 40$ also keeps $n(1-p_0) = 30 \ge 10$). Only then can the readmission claim be tested.
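The minimum sample size follows directly from the two conditions. A minimal sketch, with a hypothetical function name `min_sample_size` and the lesson's threshold of 10 as the default:

```python
import math


def min_sample_size(p0, threshold=10):
    """Smallest n with n*p0 >= threshold and n*(1 - p0) >= threshold."""
    smaller_side = min(p0, 1 - p0)   # the rarer outcome is the binding one
    # The tiny subtraction guards against floating-point noise such as
    # 10 / (1 - 0.8) evaluating to slightly more than 50.
    return math.ceil(threshold / smaller_side - 1e-9)


print(min_sample_size(0.25))  # 40
print(min_sample_size(0.15))  # 67
```

For the readmission claim this reproduces the 40 patients stated above; for a claimed rate of 0.15 the same rule would ask for at least 67.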

(c) The auditor reports: “The hospital’s stay claim is rejected, but I cannot draw a valid conclusion about the readmission claim from this sample.” Explain what the stay result does and does not prove, and why the readmission claim could not be tested.

Show Solution (c)

What the stay result shows: There is statistical evidence that the mean post-operative stay differs from 3.5 days. The t-test rejected $H_0$ at $\alpha = 0.05$, meaning the data would be unlikely if the true mean were 3.5 days.

What it does not show: Rejecting the stay claim does not prove the hospital is negligent; it only shows a statistically detectable difference from 3.5 days. The actual effect is $\bar{x} - \mu_0 = 0.6$ days; whether 0.6 days more is practically significant depends on clinical context.

Why the readmission claim could not be tested: The condition $np_0 \ge 10$ failed ($np_0 = 5$), so the proportion z-test is not valid here. This is not the same as “fail to reject.” There is no valid test result at all: neither evidence for nor against the 25% claim. The sample of 20 is simply too small to assess a proportion near 0.25 using the normal approximation. A larger sample (at least $n = 40$) is required.

The key point: a rejected mean test alongside an untestable proportion claim is not a contradiction — each claim carries its own conditions and its own uncertainty, and one of them was not even met.

(d) A hospital administrator argues: “Since the auditor found no evidence against our 25% readmission rate, we should accept that it is accurate.” Identify the errors in this reasoning.

Show Solution (d)

There are two errors stacked on top of each other:

First, there was no valid test. The $np_0 \ge 10$ condition failed, so the proportion z-test was never legitimately run. “No evidence against” overstates the situation: the auditor produced no test result at all, not a result favorable to the hospital. Absence of a test is not evidence of accuracy.

Second, even a valid “fail to reject” would not prove $H_0$. Failing to reject $H_0$ never means “accept $H_0$” or “the null is proven true”; it means only that the data are not surprising enough under $H_0$ to reject it. With such a small sample, the procedure would have very low power even if it could be run: a true rate of 0.35 or 0.40 could easily go undetected. Confusing “insufficient evidence” (or here, no valid evidence) with “affirmative proof” is the classic error.

The correct statement: “With $n = 20$ the conditions for the proportion test are not met, so we cannot draw any conclusion about the 25% readmission rate. A larger sample (at least $n = 40$) is needed to test this claim.”

Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — t-Confidence Interval (inf-3)

A sports scientist measures the resting heart rate of elite cyclists: bpm, bpm. Population SD unknown.

(a) Construct a 95% CI using the t-distribution. Use . (b) A colleague says: “Since the interval doesn’t include 50 bpm, we can conclude elite cyclists have a lower resting heart rate than the general population (which averages 70 bpm).” Is this reasoning sound?

Show Solution

(a)

(b) The reasoning is directionally sound. The CI (41.1, 47.3) lies entirely below 50 bpm, so the data are inconsistent with $\mu = 50$ bpm at 95% confidence. The CI also lies far below the general population mean of 70 bpm, providing strong evidence that elite cyclists have substantially lower resting heart rates. The conclusion is well-supported; the colleague should use frequentist language: “We are 95% confident the true mean resting heart rate of elite cyclists is between 41.1 and 47.3 bpm, well below the 70 bpm general population mean.”

Review Problem 2 — z Hypothesis Test, Full Five Steps (inf-5)

A manufacturer claims its energy bars contain a mean of kcal. A consumer group tests bars randomly and finds kcal with kcal known from extensive prior testing. Test the claim at (two-tailed). Use .

Show Solution

Step 1 — Hypotheses:

$H_0: \mu = 240$ kcal; $H_a: \mu \ne 240$ kcal (two-tailed: checking whether the label is wrong in either direction).

Step 2 — Conditions:

Random sample ✓; ✓; known ✓. z-test applies.

Step 3 — Test statistic:

Step 4 — p-value:

Two-tailed: .

Step 5 — Decision and conclusion:

$p < \alpha$. Reject $H_0$.

There is strong statistical evidence that the true mean caloric content differs from the labelled 240 kcal. The observed mean of 251 kcal is about 3.16 standard errors above the claimed value — extremely unlikely if the label were accurate.

Review Problem 3 — Is the t-Procedure Trustworthy? (Conditions Judgment)

Every worked example in this lesson disposed of the normality condition with a quick “approximately normal ✓.” This problem asks you to actually interrogate that condition instead of rubber-stamping it.

A health economist has billing amounts for emergency-room visits at one hospital and wants to test whether the mean charge differs from a regional benchmark of $1,200. Hospital billing amounts are well known to be strongly right-skewed — a few very expensive visits stretch the upper tail. is unknown.

(a) Is a one-sample t-test trustworthy here?

(b) What are her options?

Show Discussion

The t-procedure is robust to mild non-normality, and that robustness improves as $n$ grows (the CLT pulls the sampling distribution of $\bar{x}$ toward normal). But a sample this small from a strongly skewed population is outside that comfort zone. Reasonable options:

Collect a substantially larger sample. As $n$ grows, $\bar{x}$ becomes approximately normal even for skewed data, and the t-procedure becomes trustworthy.
Transform the data to reduce skew — for right-skewed money/count data a log transform is common — then run the t-test on the transformed scale (and interpret results on that scale).
Use a distribution-free method that does not assume normality (covered in later study).

What she should not do: report the t-test p-value as if it were reliable. Stating the conclusion without flagging the violated condition would overstate the strength of the evidence.
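To see why sample size matters under strong skew, here is a small simulation sketch. It assumes, purely for illustration, that charges follow a lognormal distribution (the problem does not specify one), and the sample sizes 10 and 100 are hypothetical. It estimates how skewed the sampling distribution of $\bar{x}$ still is at each size.

```python
import random
from statistics import fmean, pstdev

def skewness(values):
    """Skewness as the average cubed z-score."""
    m, sd = fmean(values), pstdev(values)
    return fmean(((v - m) / sd) ** 3 for v in values)

def sample_mean_skew(n, reps=2000, seed=0):
    """Skewness of `reps` simulated sample means, each from n lognormal draws."""
    rng = random.Random(seed)
    means = [fmean(rng.lognormvariate(7.0, 1.0) for _ in range(n))
             for _ in range(reps)]
    return skewness(means)

skew_small = sample_mean_skew(10)    # hypothetical small sample
skew_large = sample_mean_skew(100)   # hypothetical larger sample
print(f"skewness of x-bar, n = 10:  {skew_small:.2f}")
print(f"skewness of x-bar, n = 100: {skew_large:.2f}")
```

Under these assumptions the sample means are still visibly right-skewed at the small size and much closer to symmetric at the larger one, which is the reasoning behind the first option above.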

Question 1 — Feynman Test

In your own words: why do we use $p_0$ (not $\hat{p}$) in the denominator of the proportion test statistic? Write as if explaining to a classmate who just made this error on a test. Aim for 200–500 characters.


Model Answer

The denominator of the test statistic is the standard error of $\hat{p}$ under $H_0$, that is, the standard deviation we’d expect for the sample proportion if the null hypothesis were exactly right.

$H_0$ says the true proportion is $p_0$. If $H_0$ is true, then $\hat{p}$ varies with standard deviation $\sqrt{p_0(1 - p_0)/n}$. This is the SE we should use to measure how surprising $\hat{p}$ is under $H_0$.

If instead we used $\hat{p}$ in the denominator, we’d be computing the SE as if the true proportion equaled our sample estimate, which is circular reasoning. We’d be assuming our sample is exactly right in order to determine whether our sample is surprising. That defeats the purpose of the test.

The rule: denominator uses $p_0$ because we assume $H_0$ is true to calculate the p-value.
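The difference between the two denominators is easy to see numerically. The sketch below reuses the watchdog survey from the lesson introduction (18 of 80 households, null value 15%); the variable names are illustrative.

```python
from math import sqrt

n, successes, p0 = 80, 18, 0.15   # watchdog survey from the lesson introduction
p_hat = successes / n             # 0.225

se_null = sqrt(p0 * (1 - p0) / n)          # SE assuming H0 is true (correct for the test)
se_sample = sqrt(p_hat * (1 - p_hat) / n)  # SE built from the sample (the classmate's error)

z_correct = (p_hat - p0) / se_null
z_wrong = (p_hat - p0) / se_sample
print(f"SE under H0   = {se_null:.4f}, z = {z_correct:.2f}")
print(f"SE from p-hat = {se_sample:.4f}, z = {z_wrong:.2f}")
```

The two standard errors differ, so the two z values differ as well. Only the first one measures how surprising the sample is if $H_0$ holds.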

Question 2 — Apply: Placement Test

A school claims that 40% of students pass the advanced mathematics placement test ($p_0 = 0.40$). A sample of 120 students shows 54 passed. Test at $\alpha = 0.05$ (two-tailed).

Part A: Which are the correct $H_0$ and $H_a$?

Part B: Are conditions met?

Part C: What is the test statistic?

Part D: What is the conclusion?

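Full Solution (Part D)

With $\hat{p} = 54/120 = 0.45$ and $z = (0.45 - 0.40)/\sqrt{0.40(0.60)/120} \approx 1.12$, the two-tailed $p\text{-value} \approx 0.26 > \alpha = 0.05$. We fail to reject $H_0$.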

There is insufficient evidence at the 5% level to conclude that the true passage rate for the advanced mathematics placement test differs from 40%.

Question 3 — Error Analysis

Flawed statistical report:

A researcher runs a t-test on $n = 10$ observations, gets a test statistic $t$, and reports “p < 0.05” for a two-tailed test. From the t-table at $\alpha = 0.05$ (two-tail = 0.05) with $df = 10$: $t^* = 2.228$. Since $t > 2.228$, the researcher concludes “p < 0.05, so reject $H_0$.”

Identify the error.

Full Analysis

The error: The researcher used $df = n = 10$ instead of $df = n - 1 = 9$. Degrees of freedom for a one-sample t-test are always $n - 1$.

Corrected analysis: From the t-table, $df = 9$, two-tail $0.05$: $t^* = 2.262$. Since $t > 2.262$, we still reject $H_0$ at $\alpha = 0.05$. The conclusion does not change in this case.

Is the error always harmless? No. If $t$ were between 2.228 and 2.262, the wrong df (10) would say “reject” while the correct df (9) says “fail to reject.” The off-by-one df error is consequential near the boundary.

The habit to build: Always compute df = n − 1 first and write it down before looking up the table.
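The habit can be built into a lookup so that df is never taken straight from $n$. This is a minimal sketch: the dictionary holds a few two-tail 0.05 critical values copied from the lesson's t-table, the function name is illustrative, and the statistic 2.25 is a hypothetical borderline value.

```python
# Two-tail 0.05 critical values copied from the lesson's t-table (df 8 to 11).
T_CRIT_05 = {8: 2.306, 9: 2.262, 10: 2.228, 11: 2.201}

def reject_two_tailed(t_stat, n):
    """Decide a two-tailed one-sample t-test at alpha = 0.05 using df = n - 1."""
    df = n - 1                      # compute df first, before touching the table
    return abs(t_stat) > T_CRIT_05[df]

n = 10
t_borderline = 2.25                 # hypothetical value between 2.228 and 2.262
correct = reject_two_tailed(t_borderline, n)   # looks up df = 9
wrong = abs(t_borderline) > T_CRIT_05[n]       # the flawed lookup, df = 10
print(f"correct df: reject = {correct}; wrong df: reject = {wrong}")
```

With a borderline statistic the two lookups disagree, which is exactly the situation described above.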

Self-Assessment

How confident do you feel about hypothesis testing for small sample means and proportions?

Rate yourself on a scale from “Still confused” to “Ready for the Boss Fight.”

Choose your path. Both require full five-step reasoning and integration across the lesson’s concepts.

🧪 Path A: The Health Researcher

A hospital claims mean recovery time is 4.5 days. A skeptical researcher collects data and must decide whether the evidence supports the hospital’s claim — and defend the statistical reasoning to a medical board.

📊 Path B: The Policy Analyst

A government report states 35% of college students work more than 20 hours per week. A student advocacy group challenges this figure with survey data — and must communicate their findings responsibly.

Path A: The Health Researcher

A hospital claims the mean recovery time after a specific surgery is $\mu_0 = 4.5$ days. A skeptical researcher collects data on 18 patients ($n = 18$) and finds a sample mean of $\bar{x} = 5.1$ days with sample standard deviation $s$ (in days). The hospital states that patients are selected because they are good surgical candidates, so the population of recovery times is approximately normal. Test at $\alpha = 0.05$.

Task 1. State and . Justify whether a one-tailed or two-tailed test is appropriate. Then justify the choice of t-test vs. z-test.

Guidance for Task 1

$H_0: \mu = 4.5$ days; $H_a: \mu \neq 4.5$ days (two-tailed).

Why two-tailed? The researcher is a skeptic: she wants to know if the true mean differs from the hospital’s claim in either direction. If she had a prior directional suspicion (e.g., “I think the hospital is underreporting”), a one-tailed test would be justified. Without such a prior, two-tailed is the appropriate choice.

Why t-test? $\sigma$ is unknown (only the sample standard deviation $s$ is given), and $n = 18$ is small. The approximate normality condition is met (stated). Use t with $df = n - 1 = 17$.

Task 2. Compute the test statistic and bound the p-value using the t-table. Show all work.

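Guidance for Task 2

$SE = s/\sqrt{n} = s/\sqrt{18}$ days.

$t = \dfrac{\bar{x} - \mu_0}{SE} = \dfrac{5.1 - 4.5}{s/\sqrt{18}}$. From the t-table, df = 17 row:

$t^* = 2.110$ → two-tail $0.05$
$t^* = 2.567$ → two-tail $0.02$

Since $2.110 < t < 2.567$: $0.02 < p < 0.05$.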

Task 3. State a conclusion in context. Then explain what a Type II error would mean in this medical context.

Guidance for Task 3

Because $t > t^* = 2.110$ (df = 17), we have $p < 0.05 = \alpha$. We reject $H_0$.

Conclusion: There is sufficient evidence at the 5% significance level to conclude that the mean recovery time differs from 4.5 days. The sample mean (5.1 days) lies above the claimed 4.5 days, but the two-tailed test itself establishes only that the mean differs — not a direction.

Type II error in this context: A Type II error would mean failing to reject $H_0$ when the true mean recovery time actually exceeds (or differs from) 4.5 days. Practically: the hospital’s overly optimistic claim goes unchallenged. Patients may be given inaccurate discharge expectations, and policymakers may allocate too few resources to post-operative care. The consequence is real harm from a false reassurance.

Task 4. A colleague suggests: “Just use z — the sample of 18 is close enough to 30.” Explain the flaw in this reasoning and what effect it would have on the conclusion.

Guidance for Task 4

The flaw: the criterion for using z is not sample size but whether $\sigma$ is known. Here $\sigma$ is unknown (only $s$ is given), so the t-distribution is required regardless of sample size.

Using z with df effectively treated as infinite ($z^* = 1.96$ at $\alpha = 0.05$ two-tailed) vs. using t ($t^* = 2.110$ at df = 17, two-tailed):

z would give a slightly smaller critical value (1.96 vs. 2.110), making it easier to reject $H_0$.
In this case: $t > 2.110$ (t critical), so we barely reject with t. We would also reject with z (since $t > 1.96$), so the conclusion is the same here, but using z is conceptually wrong.

The error matters more in borderline cases: if $t$ fell between 1.96 and 2.110, we would fail to reject with the t-table (since $t < 2.110$) but would reject with z (since $t > 1.96$). Using the wrong distribution in that case would change the conclusion.
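The gap between the t and z critical values, and the borderline case just described, can be sketched in a few lines. The critical values are copied from the lesson's t-table (two-tail 0.05 column); the function name and the statistic 2.05 are hypothetical choices for illustration.

```python
# Two-tail 0.05 critical values copied from the lesson's t-table.
T_CRIT_05 = {5: 2.571, 10: 2.228, 17: 2.110, 30: 2.042, 60: 2.000, 100: 1.984}
Z_CRIT_05 = 1.96

def decisions(t_stat, df):
    """Return (reject using t, reject using z) for a two-tailed test at alpha = 0.05."""
    return abs(t_stat) > T_CRIT_05[df], abs(t_stat) > Z_CRIT_05

for df, t_crit in T_CRIT_05.items():
    print(f"df = {df:>3}: t* = {t_crit:.3f}, excess over z* = {t_crit - Z_CRIT_05:.3f}")

# 2.05 is a hypothetical statistic between 1.96 and 2.110.
borderline = decisions(2.05, 17)
print(f"t = 2.05, df = 17: reject with t = {borderline[0]}, reject with z = {borderline[1]}")
```

The excess shrinks as df grows but is not zero at df = 17, so substituting z makes rejection easier than the data justify.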

Reflection: Write a two-sentence conclusion for the hospital board that clearly states what the data can and cannot prove. Use language appropriate for a non-statistical audience.


Path B: The Policy Analyst

A government report states that 35% of college students work more than 20 hours per week ($p_0 = 0.35$). A student advocacy group surveys 150 students at a large university ($n = 150$) and finds 63 who work more than 20 hours per week. Test at $\alpha = 0.05$ (two-tailed).

Task 1. State and . Check conditions. Explain your choice of a two-tailed test.

Guidance for Task 1

$H_0: p = 0.35$; $H_a: p \neq 0.35$ (two-tailed).

Why two-tailed? The advocacy group wants to know whether the true proportion at their university differs from the government’s figure in either direction. A one-tailed test would require a prior directional suspicion (e.g., “we believe the rate is higher”). Without that prior, two-tailed is correct.

Conditions: $np_0 = 150(0.35) = 52.5$ ✓; $n(1 - p_0) = 150(0.65) = 97.5$ ✓; random sample ✓. Use the z-test for proportions.

Task 2. Compute $\hat{p}$ and the test statistic. Find the p-value using the z-table.

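Guidance for Task 2

$\hat{p} = 63/150 = 0.42$.

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}} = \frac{0.42 - 0.35}{\sqrt{0.35(0.65)/150}} = \frac{0.07}{0.0389} \approx 1.80$$

Two-tailed: $p\text{-value} = 2 \cdot P(Z > 1.80) = 2(0.0359) = 0.0718$.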
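The same calculation can be sketched in Python with the standard library. The function name `one_proportion_z_test` is illustrative; the inputs are the survey counts from the problem statement. Because the code keeps the unrounded z, its p-value differs slightly from a z-table lookup at 1.80.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, p0):
    """Two-tailed one-sample z-test for a proportion; the SE uses the null value p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return p_hat, z, p_value

p_hat, z, p_value = one_proportion_z_test(successes=63, n=150, p0=0.35)
print(f"p-hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
```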

Task 3. State the decision at $\alpha = 0.05$ and write the conclusion in context.

Guidance for Task 3

$p\text{-value} \approx 0.072 > \alpha = 0.05$. We fail to reject $H_0$.

There is insufficient evidence at the 5% level to conclude that the proportion of college students working more than 20 hours per week at this university differs from 35%.

Note how close this is: with $p \approx 0.072$, the result is just barely on the “fail to reject” side of $\alpha = 0.05$. A modestly larger sample could push it across, which is exactly why Task 4 matters.

Task 4. The advocacy group wants to claim the real rate is “dramatically higher” than 35%. A statistician cautions against this phrasing. Explain using the concepts of practical vs. statistical significance.

Guidance for Task 4

Statistical significance: The test failed to reject $H_0$ at $\alpha = 0.05$. The data are not statistically surprising under $H_0$. Making a strong claim about “dramatically higher” is not supported by the test result.

Practical significance: Even if the test had rejected $H_0$, statistical significance does not prove the difference is large or important. The sample proportion is $\hat{p} = 0.42$ vs. $p_0 = 0.35$, a difference of 0.07 (7 percentage points). Whether 7 percentage points is “dramatic” is a policy judgment, not a statistical one.

The caution to the group: With $p \approx 0.072 > 0.05$, the data do not even provide sufficient evidence of a difference from 35%. Claiming “dramatically higher” overstates what the evidence supports. The correct statement: “Our sample shows 42%, which is 7 percentage points above the government’s 35%, but we do not have sufficient statistical evidence at the 5% level to conclude the true rate at our university differs from the national figure. A larger sample would be needed to draw firmer conclusions.”
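One way to separate the two kinds of significance is to hold the observed gap fixed and vary only the sample size. In the sketch below the sample proportion stays at 0.42 throughout; the sizes 200 and 300 are hypothetical, not data from the survey.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p(p_hat, n, p0):
    """Two-tailed p-value of the one-proportion z-test (SE from the null value)."""
    z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)
    return 2 * (1 - NormalDist().cdf(abs(z)))

p0, p_hat = 0.35, 0.42          # the same 7-point gap at every sample size
# n = 150 is the actual survey; 200 and 300 are hypothetical larger samples.
p_values = {n: two_tailed_p(p_hat, n, p0) for n in (150, 200, 300)}
for n, p in p_values.items():
    print(f"n = {n}: p-value = {p:.3f}")
```

The p-value falls as $n$ grows even though the gap never changes. Statistical significance reflects how much evidence there is; whether a 7-point gap counts as “dramatic” is a separate question.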

Reflection: Draft a headline for a press release that is both accurate and responsible given your statistical conclusion. Then explain in 1–2 sentences why your headline is more appropriate than “Study Shows College Students Work Far More Than Government Claims.”


Ready for more? These go beyond the lesson objectives.

Problem 1 — Paired Data: t-Test on Differences

A researcher tests whether a training program improves typing speed. Seven participants are measured before and after training. Compute the difference $d = \text{After} - \text{Before}$ for each participant, then run a one-sample t-test on $H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$.

Participant	Before	After
1	52	58
2	45	50
3	60	65
4	38	42
5	55	62
6	48	51
7	50	55

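(a) Compute $d_i$, $\bar{d}$, and $s_d$.

Solution

Differences $d_i$: 6, 5, 5, 4, 7, 3, 5.

$\bar{d} = 35/7 = 5$ wpm.

$s_d$: deviations from $\bar{d}$: 1, 0, 0, −1, 2, −2, 0.

$\sum (d_i - \bar{d})^2 = 10$. $s_d = \sqrt{10/6} \approx 1.291$ wpm.

(b) $SE = s_d/\sqrt{n} = 1.291/\sqrt{7} \approx 0.488$, $t = \bar{d}/SE = 5/0.488 \approx 10.25$.

From t-table, df = 6: $t^* = 5.959$ at two-tail $0.001$. Since $10.25 > 5.959$, $p < 0.001$. Reject $H_0$.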

There is very strong evidence that the training program improves typing speed.
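The paired calculation above can be reproduced with a short script. This is a sketch using the table's data and Python's standard library; `stdev` divides by $n - 1$, matching the hand calculation.

```python
from math import sqrt
from statistics import mean, stdev

before = [52, 45, 60, 38, 55, 48, 50]
after = [58, 50, 65, 42, 62, 51, 55]

diffs = [a - b for a, b in zip(after, before)]   # d = After - Before
d_bar = mean(diffs)
s_d = stdev(diffs)                  # sample standard deviation (divides by n - 1)
se = s_d / sqrt(len(diffs))
t_stat = d_bar / se                 # test statistic under H0: mean difference = 0
df = len(diffs) - 1
print(f"d-bar = {d_bar}, s_d = {s_d:.3f}, SE = {se:.3f}, t = {t_stat:.2f}, df = {df}")
```

Once the differences are formed, the rest is an ordinary one-sample t-test; the before and after columns are never used separately again.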

A coach measures reaction time (milliseconds) for 6 athletes before and after a conditioning drill. Compute $d = \text{After} - \text{Before}$ and test $H_0: \mu_d = 0$ vs. $H_a: \mu_d < 0$ (reaction time decreases) at $\alpha = 0.05$.

Athlete	Before	After
1	280	265
2	310	295
3	295	290
4	320	305
5	275	260
6	300	285

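Solution

Differences $d_i$: −15, −15, −5, −15, −15, −15.

$\bar{d} = -80/6 \approx -13.33$ ms.

$s_d$: deviations from $\bar{d}$: −1.67, −1.67, 8.33, −1.67, −1.67, −1.67.

$\sum (d_i - \bar{d})^2 \approx 83.33$. $s_d = \sqrt{83.33/5} \approx 4.08$ ms.

$SE = 4.08/\sqrt{6} \approx 1.67$, $t = -13.33/1.67 \approx -8.0$.

Left-tailed: $t^* = 2.015$ (one-tail $0.05$, df = 5). $t \approx -8.0 < -2.015$ (it is even below $-6.869$, the one-tail $0.0005$ value, so $p < 0.0005$). Reject $H_0$.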

There is strong evidence that reaction time decreases after the conditioning drill.

A dietitian records caloric intake (kcal) for 8 patients before and after a dietary intervention. Test $H_0: \mu_d = 0$ vs. $H_a: \mu_d \neq 0$ at $\alpha = 0.05$.

Patient	Before	After
1	2400	2200
2	2100	2050
3	2800	2500
4	2300	2250
5	2600	2350
6	1900	1950
7	2200	2100
8	2500	2300

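Solution

Differences $d_i = \text{After} - \text{Before}$: −200, −50, −300, −50, −250, +50, −100, −200.

$\bar{d} = -1100/8 = -137.5$ kcal.

$s_d$: compute deviations from $\bar{d}$, square, sum, divide by 7, take square root.

Deviations: −62.5, +87.5, −162.5, +87.5, −112.5, +187.5, +37.5, −62.5.

$\sum (d_i - \bar{d})^2 = 98{,}750$. $s_d = \sqrt{98{,}750/7} \approx 118.8$ kcal.

$SE = 118.8/\sqrt{8} \approx 42.0$, $t = -137.5/42.0 \approx -3.27$.

Two-tailed, df = 7: $t^* = 2.365$ (two-tail 0.05). $|t| = 3.27 > 2.365$. Checking further: $t^* = 2.998$ at two-tail 0.02 and $t^* = 3.499$ at two-tail $0.01$. Since $2.998 < 3.27 < 3.499$, $0.01 < p < 0.02$. Reject $H_0$.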

There is sufficient evidence that the dietary intervention changes mean caloric intake.

Problem 2 — Power of the t-Test

A factory is testing whether bottle fill mean equals $\mu_0 = 500$ mL. They use $\alpha = 0.05$ (two-tailed), $n = 16$, and $s = 20$ mL. Suppose the true mean is actually $\mu_1 = 505$ mL.

Figure: Power and Type II error. The left curve is the sampling distribution if H₀ is true; the right curve is the truth, shifted by the standardised effect δ. The blue area is power (correctly rejecting), the gold area is β (a Type II error — a real shift goes undetected). Raise n to grow δ and watch power climb: with n = 16 and a 5 mL shift, power is only ≈ 13%.

(a) What is the approximate power of this test? Use the non-centrality approach: compute the non-centrality parameter $\delta = \dfrac{\mu_1 - \mu_0}{s/\sqrt{n}}$ and approximate power as $P(Z > t^* - \delta)$ using the standard normal (a rough approximation).

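Solution

Critical value: $t^* = 2.131$ (df = 15, two-tailed $\alpha = 0.05$).

Non-centrality approximation: the test rejects when $|t| > t^*$. Under the true distribution (mean shifted by $\delta$), the rejection probability is approximately $P(Z > t^* - \delta)$.

$$\delta = \frac{505 - 500}{20/\sqrt{16}} = \frac{5}{5} = 1.0, \qquad P(Z > 2.131 - 1.0) = P(Z > 1.131) \approx 0.13$$

(We also need to account for the left tail, but it’s negligible here.)

Approximate power $\approx 0.13$.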
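The rough power approximation can be checked numerically. The sketch below assumes $s = 20$ mL, the value consistent with the ≈ 13% power quoted in the figure caption, and it uses the same normal shortcut as the solution rather than the exact non-central t distribution.

```python
from math import sqrt
from statistics import NormalDist

mu0, mu1 = 500.0, 505.0    # null mean and assumed true mean (mL)
n, s = 16, 20.0            # s = 20 mL is an assumption consistent with the quoted 13% power
t_crit = 2.131             # df = 15, two-tailed alpha = 0.05, from the t-table

delta = (mu1 - mu0) / (s / sqrt(n))            # shift measured in standard errors
power = 1 - NormalDist().cdf(t_crit - delta)   # upper-tail rejection only; left tail ignored
print(f"delta = {delta:.2f}, approximate power = {power:.3f}")
```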

(b) This is very low power — only a 13% chance of detecting the true shift from 500 to 505 mL. What sample size would be needed to achieve 80% power?

The relevant quantity is the standardized effect size $d$, the shift measured in standard deviations, not standard errors:

$$d = \frac{|\mu_1 - \mu_0|}{\sigma} = \frac{5}{20} = 0.25$$

The standard (normal-approximation) sample-size formula for a two-tailed test is

$$n \approx \left(\frac{z_{\alpha/2} + z_{\beta}}{d}\right)^2 = \left(\frac{1.96 + 0.84}{0.25}\right)^2 \approx 125.4$$

So detecting a 5-mL shift (with $\sigma \approx 20$ mL) at 80% power requires roughly $n \approx 126$, about eight times the original sample of 16. The small standardized effect ($d = 0.25$) is what makes the required sample so large. (This normal approximation ignores the extra width of the t-distribution, so the exact requirement is a few observations higher.)
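The sample-size formula is easy to wrap in a helper for trying other effect sizes. This is a sketch of the normal-approximation formula only; the function name `n_for_power` is illustrative, and the effect size assumes $\sigma \approx 20$ mL as above.

```python
from math import ceil
from statistics import NormalDist

def n_for_power(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size for a two-tailed one-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # about 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # about 0.84 for 80% power
    return ceil(((z_alpha + z_beta) / effect_size) ** 2)

d = 5 / 20                  # 5 mL shift over an assumed sigma of 20 mL
n_required = n_for_power(d)
print(f"d = {d}, required n = {n_required}")
```

Because $n$ scales with $1/d^2$, halving the effect size roughly quadruples the required sample.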

Complete, step-by-step solutions for all problems in Sections 5–9 are available on the solutions page. Solutions include full five-step write-ups, t-table lookups shown explicitly, and interpretation guidance.


If you’re stuck: Re-read the relevant Core Concept in Section 3. For t-test problems, check whether you used $s/\sqrt{n}$ vs. $s$ in the denominator, since that is the single most common error in this lesson. For t-table lookups, confirm you used the correct df row and the correct tail column (one-tail vs. two-tail). The solutions page shows the reasoning behind every step, not just the final answer.

Quick-Reference Formulas

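t Test Statistic (mean, small sample / $\sigma$ unknown):

$$t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}, \qquad df = n - 1$$

z Test Statistic (proportion):

$$z = \frac{\hat{p} - p_0}{\sqrt{p_0(1 - p_0)/n}}$$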

Proportion Conditions (check before testing):

Decision Rule (both tests):

When to Use the t-Test:

$\sigma$ unknown (you have $s$)
Population approximately normal (required when $n < 30$)

Bounding p from the t-table:

Find the df row
Locate the two critical values bracketing $|t|$
Read off the two-tail $\alpha$ values → those are the bounds on $p$
Compare to $\alpha$ to decide
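The bounding steps above can be sketched as a small function. The row and column values are copied from the lesson's t-table (df = 7); the function name is illustrative, and $t = -3.27$ is used as an example statistic (roughly what the dietitian data in the challenge problems give).

```python
# Two-tail areas (column headers) and the df = 7 row, copied from the lesson's t-table.
TWO_TAIL_AREAS = [0.20, 0.10, 0.05, 0.02, 0.01, 0.001]
T_ROW_DF7 = [1.415, 1.895, 2.365, 2.998, 3.499, 5.408]

def bound_two_tailed_p(t_stat, row, areas=TWO_TAIL_AREAS):
    """Return (lower, upper) bounds on the two-tailed p-value from one table row."""
    lower, upper = 0.0, 1.0
    for crit, area in zip(row, areas):
        if abs(t_stat) >= crit:
            upper = area          # |t| reaches this critical value, so p is at most area
        else:
            lower = area          # first critical value not reached, so p exceeds area
            break
    return lower, upper

lower, upper = bound_two_tailed_p(-3.27, T_ROW_DF7)
print(f"{lower} < p < {upper}")
```

A statistic beyond the last column returns a lower bound of 0.0, meaning only that $p$ is below the smallest tabulated area.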

Key distinction — proportion test denominator:

Use $p_0$ (null value), not $\hat{p}$ (sample estimate)
Reason: we assume $H_0$ is true to compute the SE

Student's t-Distribution Table

Critical values (t*) for given degrees of freedom (df) and tail area.

df \ Confidence	80%	90%	95%	98%	99%	99.9%
One-tail area	0.10	0.05	0.025	0.01	0.005	0.0005
Two-tail area	0.20	0.10	0.05	0.02	0.01	0.001
1	3.078	6.314	12.706	31.821	63.657	636.619
2	1.886	2.920	4.303	6.965	9.925	31.599
3	1.638	2.353	3.182	4.541	5.841	12.924
4	1.533	2.132	2.776	3.747	4.604	8.610
5	1.476	2.015	2.571	3.365	4.032	6.869
6	1.440	1.943	2.447	3.143	3.707	5.959
7	1.415	1.895	2.365	2.998	3.499	5.408
8	1.397	1.860	2.306	2.896	3.355	5.041
9	1.383	1.833	2.262	2.821	3.250	4.781
10	1.372	1.812	2.228	2.764	3.169	4.587
11	1.363	1.796	2.201	2.718	3.106	4.437
12	1.356	1.782	2.179	2.681	3.055	4.318
13	1.350	1.771	2.160	2.650	3.012	4.221
14	1.345	1.761	2.145	2.624	2.977	4.140
15	1.341	1.753	2.131	2.602	2.947	4.073
16	1.337	1.746	2.120	2.583	2.921	4.015
17	1.333	1.740	2.110	2.567	2.898	3.965
18	1.330	1.734	2.101	2.552	2.878	3.922
19	1.328	1.729	2.093	2.539	2.861	3.883
20	1.325	1.725	2.086	2.528	2.845	3.850
21	1.323	1.721	2.080	2.518	2.831	3.819
22	1.321	1.717	2.074	2.508	2.819	3.792
23	1.319	1.714	2.069	2.500	2.807	3.768
24	1.318	1.711	2.064	2.492	2.797	3.745
25	1.316	1.708	2.060	2.485	2.787	3.725
26	1.315	1.706	2.056	2.479	2.779	3.707
27	1.314	1.703	2.052	2.473	2.771	3.690
28	1.313	1.701	2.048	2.467	2.763	3.674
29	1.311	1.699	2.045	2.462	2.756	3.659
30	1.310	1.697	2.042	2.457	2.750	3.646
40	1.303	1.684	2.021	2.423	2.704	3.551
50	1.299	1.676	2.009	2.403	2.678	3.496
60	1.296	1.671	2.000	2.390	2.660	3.460
80	1.292	1.664	1.990	2.374	2.639	3.416
100	1.290	1.660	1.984	2.364	2.626	3.390
∞	1.282	1.645	1.960	2.326	2.576	3.291
