REG-4: Chi-Square Test of Independence

Module 5 · Regression & Association

Section 1: Introduction

A public health team collects vaccination data from 200 adults and organizes it into a simple table:

Age group | Vaccinated | Not vaccinated | Row total
Under 40 | 45 | 55 | 100
40 and over | 65 | 35 | 100
Col total | 110 | 90 | 200

Older adults appear more likely to be vaccinated — 65% versus 45%. But before recommending any policy, the team needs to answer a harder question: Are older adults genuinely more likely to be vaccinated in the population, or could a difference this large arise from chance in a random sample of 200?

The proportions are different. But are they statistically different? This is exactly the question the chi-square test of independence answers.

The test works by comparing what we observed to what we would expect if the two variables were completely unrelated — if vaccination status and age group were independent in the population. A large discrepancy between observed and expected counts produces a large test statistic; a small discrepancy produces a small one. The p-value then tells us whether that discrepancy is surprising enough to reject independence.

By the end of this lesson, you will be able to:

  • Construct a contingency table and compute expected frequencies under the null hypothesis of independence.
  • Compute the chi-square test statistic and determine degrees of freedom.
  • Check the conditions for the chi-square test and explain what to do when they are violated.
  • Perform the complete five-step test, using the chi-square table to find critical values.
  • Compute Cramér’s V to quantify the strength of association.
  • Distinguish statistical association from causation.

Section 2: Prerequisites

This lesson builds directly on the five-step hypothesis-testing framework from INF-5. Make sure you have these tools ready.

The five-step framework (from INF-5):

Step | What you do
1 | State H₀ and Hₐ
2 | Check conditions
3 | Compute the test statistic
4 | Find the p-value (or compare to critical value)
5 | State the decision and conclusion in context

P-value decision rule: Reject H₀ if $p \le \alpha$. At $\alpha = 0.05$, this means we reject only when a discrepancy at least as large as the observed one would occur no more than 5% of the time if the variables were truly independent.

Fail to reject ≠ accept H₀. A large p-value means “not enough evidence to conclude the variables are dependent” — it does not prove they are independent. We cannot prove the null hypothesis.

Type I and Type II errors:

  • Type I: Rejecting H₀ when it is actually true (false positive). Probability = $\alpha$.
  • Type II: Failing to reject H₀ when it is actually false (false negative). Probability = $\beta$.


Key difference from INF-5: The chi-square test of independence is always right-tailed. Unlike the t-test for a mean (which can be two-tailed), the chi-square statistic is always ≥ 0, and only large positive values count as extreme outcomes. We look only at the right tail of the chi-square distribution.

Retrieval Warm-up — from earlier lessons

In a study of 40 university students, a researcher tests whether mean daily screen time differs from a national benchmark of 6.5 hours. She collects a sample with h and h. She has no prior hypothesis about the direction of difference. Which null and alternative hypotheses are correct?

A researcher uses INF-6 to test whether the proportion of patients who experience side effects differs from 0.20. She obtains from a sample of . She computes the test statistic as . What error has she made?

Section 3: Core Concepts

C1 — Contingency Table Structure

A contingency table (also called a cross-tabulation or two-way table) organizes data on two categorical variables. Rows represent levels of one variable; columns represent levels of the other. Cells contain observed frequencies (O) — the actual counts from the sample.

Contingency Table

A two-way table with $r$ rows and $c$ columns. Each cell contains the observed frequency O for that combination of categories. The row totals, column totals, and grand total form the margins.

Example — 3×2 contingency table structure (r = 3, c = 2):

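Group | Category A | Category B | Row total
Group 1 | $O_{11}$ | $O_{12}$ | $R_1$
Group 2 | $O_{21}$ | $O_{22}$ | $R_2$
Group 3 | $O_{31}$ | $O_{32}$ | $R_3$
Col total | $C_1$ | $C_2$ | $n$

($O_{ij}$ = observed count in row $i$, column $j$; $R_i$ and $C_j$ = row and column totals; $n$ = grand total.)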


C2 — Expected Frequencies

Expected Frequency

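The count we would predict in a cell if the two variables were completely independent in the population. For any cell in row $i$, column $j$:

$$E_{ij} = \frac{R_i \times C_j}{n}$$

where $R_i$ is the row-$i$ total, $C_j$ is the column-$j$ total, and $n$ is the grand total.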

Common error: Some students compute $E = \frac{\text{row total} + \text{column total}}{n}$, adding instead of multiplying. This is wrong. The formula uses multiplication, not addition, because independence means the joint probability equals the product of the marginal probabilities.

Common error: Some students write $E = \frac{\text{row total} \times \text{column total}}{n^2}$, dividing by $n$ twice. The correct formula divides by $n$ once.

Verifying expected frequencies: The row totals and column totals of the expected frequency table must equal the row and column totals of the observed table. This is a useful self-check.
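The expected-frequency formula and the margin self-check translate directly into a few lines of Python. This is a minimal sketch, assuming the table is stored as a list of rows of observed counts; the function name `expected_frequencies` is our own illustration, not a library API.

```python
import math


def expected_frequencies(observed):
    """Expected count E_ij = (row i total) * (column j total) / n for every cell."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)  # grand total
    return [[r * c / n for c in col_totals] for r in row_totals]


# Vaccination data from the Introduction
# (rows: Under 40, 40 and over; columns: Vaccinated, Not vaccinated)
observed = [[45, 55],
            [65, 35]]
expected = expected_frequencies(observed)
print(expected)  # [[55.0, 45.0], [55.0, 45.0]]

# Self-check: the expected table must reproduce the observed margins
exp_rows = [sum(row) for row in expected]
obs_rows = [sum(row) for row in observed]
exp_cols = [sum(col) for col in zip(*expected)]
obs_cols = [sum(col) for col in zip(*observed)]
assert all(math.isclose(a, b) for a, b in zip(exp_rows, obs_rows))
assert all(math.isclose(a, b) for a, b in zip(exp_cols, obs_cols))
```

`math.isclose` is used for the margin check because expected counts are generally non-integer floats.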



C3 — Chi-Square Test Statistic

Chi-Square Test Statistic

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

Summed over all cells, $\chi^2$ measures the total discrepancy between observed and expected counts. $\chi^2 \ge 0$ always; $\chi^2 = 0$ only when O = E in every single cell.

Intuition: Each cell contributes $\frac{(O - E)^2}{E}$ to the sum: the squared deviation from what independence predicts, scaled by the expected count (so that cells with large counts do not dominate just because they are large). A large $\chi^2$ means the observed data deviate strongly from what independence would predict.

Common error: Using $\sum \frac{O - E}{E}$ (forgetting to square the numerator) or $\sum \frac{(O - E)^2}{O}$ (dividing by the observed instead of the expected count). The denominator must be $E$, because the expected count is the reference under H₀.

[Visualization: expected vs. observed distributions]

$\chi^2$ reflects the visual difference between the expected and observed distributions: when the two panels look nearly identical, $\chi^2$ is small; when they differ substantially, $\chi^2$ is large.
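As a sketch of the statistic itself, the function below (a hypothetical helper, assuming the same list-of-rows layout of observed counts) recomputes each expected count and adds up the cell contributions $(O - E)^2 / E$. The second call shows the boundary case: when every O equals its E, $\chi^2$ is exactly 0.

```python
def chi_square_statistic(observed):
    """Sum of (O - E)^2 / E over all cells, with E computed under independence."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    total = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            total += (o - e) ** 2 / e  # squared deviation scaled by the EXPECTED count
    return total


# Example 1 data (smoking status vs. exercise frequency)
print(round(chi_square_statistic([[15, 35], [25, 25]]), 3))  # 4.167

# Observed counts identical to the expected counts: no discrepancy at all
print(chi_square_statistic([[20, 30], [20, 30]]))  # 0.0
```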



C4 — Degrees of Freedom

Degrees of Freedom for Chi-Square Test of Independence

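$$df = (r - 1)(c - 1)$$

where $r$ = number of rows and $c$ = number of columns.

Examples: a 2×2 table has df = (1)(1) = 1; a 2×3 or 3×2 table has df = (1)(2) = 2; a 3×3 table has df = (2)(2) = 4.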

Common error: Using $df = n - 1$, borrowed from the t-test for a mean. For the chi-square test of independence, the correct formula is always $df = (r - 1)(c - 1)$: it depends on the table's dimensions, not on the sample size.

Higher df means a larger critical value is needed to reject H₀ — larger tables require more evidence of departure from independence.
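The df formula and its effect on the critical value can be sketched as follows. The dictionary holds the p = 0.05 column of the chi-square table in C7 for df = 1 to 6; the names `degrees_of_freedom` and `CRITICAL_05` are illustrative assumptions, not part of any library.

```python
# chi^2_{0.05, df}: the p = 0.05 column of the chi-square table in C7
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def degrees_of_freedom(r, c):
    """df = (r - 1)(c - 1) for an r x c contingency table; n plays no role."""
    return (r - 1) * (c - 1)


for r, c in [(2, 2), (2, 3), (3, 2), (3, 3)]:
    df = degrees_of_freedom(r, c)
    print(f"{r}x{c} table: df = {df}, critical value at alpha = 0.05: {CRITICAL_05[df]}")
```

The printed critical values grow from 3.841 (df = 1) to 9.488 (df = 4), which is the "larger tables need more evidence" effect described above.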



C5 — Conditions for the Test

Before computing $\chi^2$, you must verify:

  1. Random sample. The data must come from a random sample (or randomized experiment).
  2. All expected frequencies ≥ 5. Compute every expected frequency $E$ and check that none is below 5. If any $E < 5$, the chi-square approximation is unreliable.

Common error: Checking the observed frequencies instead of the expected frequencies. The condition is on $E$, not $O$. A cell can have O = 3 (a small observed count) and still satisfy the condition if $E \ge 5$.

What to do if conditions are violated: If one or more expected frequencies fall below 5, adjacent categories may be combined to create fewer, larger cells. This merges the small-expected-count cell with a neighbouring one, increasing E. Report the combination as part of your analysis.
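A short sketch of the expected-frequency condition, assuming the list-of-rows layout used earlier (the helper name `low_expected_cells` is hypothetical). It deliberately checks E, not O, and reports zero-based (row, column) positions of every offending cell. The two tables are the small-sample tables used later in this lesson.

```python
def low_expected_cells(observed, minimum=5.0):
    """List (row, column, E) for every cell whose EXPECTED count is below `minimum`.

    Rows and columns are numbered from 0.
    """
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    flagged = []
    for i, r in enumerate(row_totals):
        for j, c in enumerate(col_totals):
            e = r * c / n
            if e < minimum:
                flagged.append((i, j, round(e, 3)))
    return flagged


# 2x2 small-sample table: two expected counts fall below 5
print(low_expected_cells([[3, 2], [27, 18]]))        # [(0, 0, 3.0), (0, 1, 2.0)]

# 2x3 table: cell (1, 0) has O = 2 but E = 5.8, so it does NOT violate the condition
print(low_expected_cells([[8, 12, 1], [2, 18, 9]]))  # [(0, 0, 4.2), (0, 2, 4.2)]
```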



C6 — Five-Step Test for Independence

The structure is identical to the framework from INF-5:

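Step | Action
Step 1 | H₀: The two variables are independent in the population. Hₐ: They are not independent.
Step 2 | Check conditions: (1) random sample; (2) all expected frequencies ≥ 5. Compute all E values.
Step 3 | Compute $\chi^2 = \sum \frac{(O - E)^2}{E}$. Find $df = (r - 1)(c - 1)$.
Step 4 | Find the p-value = right-tail area of the $\chi^2_{df}$ distribution beyond the observed statistic. Equivalently, compare $\chi^2$ to the critical value $\chi^2_{\alpha,\,df}$.
Step 5 | If $p \le \alpha$ (or $\chi^2 \ge \chi^2_{\alpha,\,df}$), reject H₀. State the conclusion in context.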

Note: The chi-square test is always right-tailed. We reject H₀ only for large $\chi^2$ values. There is no two-sided or left-sided chi-square test for independence.
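Steps 2 to 5 can be strung together in one function. This is a sketch under stated assumptions: it uses the critical-value approach at $\alpha = 0.05$ only, looks critical values up from the C7 table (so it covers df up to 6), and leaves Step 1 and the random-sampling condition to the analyst, since neither can be checked from the counts. The function name `chi_square_test_05` is hypothetical.

```python
# chi^2_{0.05, df} from the chi-square table in C7 (df = 1 to 6 only)
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488, 5: 11.070, 6: 12.592}


def chi_square_test_05(observed):
    """Steps 2-5 of the test of independence at alpha = 0.05 (critical-value approach)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    # Step 2: the expected-count condition (random sampling must be judged from the design)
    if min(min(row) for row in expected) < 5:
        raise ValueError("an expected frequency is below 5; combine categories or use an exact test")

    # Step 3: test statistic and degrees of freedom
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    df = (len(observed) - 1) * (len(observed[0]) - 1)

    # Step 4: right-tailed comparison with the critical value (KeyError if df > 6)
    critical = CRITICAL_05[df]

    # Step 5: decision only; the conclusion in context is still written by the analyst
    decision = "reject H0" if chi2 >= critical else "fail to reject H0"
    return round(chi2, 3), df, critical, decision


# Vaccination data from the Introduction
print(chi_square_test_05([[45, 55], [65, 35]]))  # (8.081, 1, 3.841, 'reject H0')
```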



C7 — p-Value and Critical Value

Use the Chi-Square Table to find critical values. The table gives $\chi^2_{p,\,df}$, the value that puts area $p$ in the right tail.

Chi-Square Distribution Table

Critical values (χ²) for given degrees of freedom (df) and upper tail probability (p).

df \ p | 0.995 | 0.99 | 0.975 | 0.95 | 0.90 | 0.10 | 0.05 | 0.025 | 0.01 | 0.005
1 | 0.000 | 0.000 | 0.001 | 0.004 | 0.016 | 2.706 | 3.841 | 5.024 | 6.635 | 7.879
2 | 0.010 | 0.020 | 0.051 | 0.103 | 0.211 | 4.605 | 5.991 | 7.378 | 9.210 | 10.597
3 | 0.072 | 0.115 | 0.216 | 0.352 | 0.584 | 6.251 | 7.815 | 9.348 | 11.345 | 12.838
4 | 0.207 | 0.297 | 0.484 | 0.711 | 1.064 | 7.779 | 9.488 | 11.143 | 13.277 | 14.860
5 | 0.412 | 0.554 | 0.831 | 1.145 | 1.610 | 9.236 | 11.070 | 12.833 | 15.086 | 16.750
6 | 0.676 | 0.872 | 1.237 | 1.635 | 2.204 | 10.645 | 12.592 | 14.449 | 16.812 | 18.548
7 | 0.989 | 1.239 | 1.690 | 2.167 | 2.833 | 12.017 | 14.067 | 16.013 | 18.475 | 20.278
8 | 1.344 | 1.646 | 2.180 | 2.733 | 3.490 | 13.362 | 15.507 | 17.535 | 20.090 | 21.955
9 | 1.735 | 2.088 | 2.700 | 3.325 | 4.168 | 14.684 | 16.919 | 19.023 | 21.666 | 23.589
10 | 2.156 | 2.558 | 3.247 | 3.940 | 4.865 | 15.987 | 18.307 | 20.483 | 23.209 | 25.188
11 | 2.603 | 3.053 | 3.816 | 4.575 | 5.578 | 17.275 | 19.675 | 21.920 | 24.725 | 26.757
12 | 3.074 | 3.571 | 4.404 | 5.226 | 6.304 | 18.549 | 21.026 | 23.337 | 26.217 | 28.300
13 | 3.565 | 4.107 | 5.009 | 5.892 | 7.042 | 19.812 | 22.362 | 24.736 | 27.688 | 29.819
14 | 4.075 | 4.660 | 5.629 | 6.571 | 7.790 | 21.064 | 23.685 | 26.119 | 29.141 | 31.319
15 | 4.601 | 5.229 | 6.262 | 7.261 | 8.547 | 22.307 | 24.996 | 27.488 | 30.578 | 32.801
16 | 5.142 | 5.812 | 6.908 | 7.962 | 9.312 | 23.542 | 26.296 | 28.845 | 32.000 | 34.267
17 | 5.697 | 6.408 | 7.564 | 8.672 | 10.085 | 24.769 | 27.587 | 30.191 | 33.409 | 35.718
18 | 6.265 | 7.015 | 8.231 | 9.390 | 10.865 | 25.989 | 28.869 | 31.526 | 34.805 | 37.156
19 | 6.844 | 7.633 | 8.907 | 10.117 | 11.651 | 27.204 | 30.144 | 32.852 | 36.191 | 38.582
20 | 7.434 | 8.260 | 9.591 | 10.851 | 12.443 | 28.412 | 31.410 | 34.170 | 37.566 | 39.997
21 | 8.034 | 8.897 | 10.283 | 11.591 | 13.240 | 29.615 | 32.671 | 35.479 | 38.932 | 41.401
22 | 8.643 | 9.542 | 10.982 | 12.338 | 14.041 | 30.813 | 33.924 | 36.781 | 40.289 | 42.796
23 | 9.260 | 10.196 | 11.689 | 13.091 | 14.848 | 32.007 | 35.172 | 38.076 | 41.638 | 44.181
24 | 9.886 | 10.856 | 12.401 | 13.848 | 15.659 | 33.196 | 36.415 | 39.364 | 42.980 | 45.559
25 | 10.520 | 11.524 | 13.120 | 14.611 | 16.473 | 34.382 | 37.652 | 40.646 | 44.314 | 46.928
26 | 11.160 | 12.198 | 13.844 | 15.379 | 17.292 | 35.563 | 38.885 | 41.923 | 45.642 | 48.290
27 | 11.808 | 12.879 | 14.573 | 16.151 | 18.114 | 36.741 | 40.113 | 43.195 | 46.963 | 49.645
28 | 12.461 | 13.565 | 15.308 | 16.928 | 18.939 | 37.916 | 41.337 | 44.461 | 48.278 | 50.993
29 | 13.121 | 14.256 | 16.047 | 17.708 | 19.768 | 39.087 | 42.557 | 45.722 | 49.588 | 52.336
30 | 13.787 | 14.953 | 16.791 | 18.493 | 20.599 | 40.256 | 43.773 | 46.979 | 50.892 | 53.672
40 | 20.707 | 22.164 | 24.433 | 26.509 | 29.051 | 51.805 | 55.758 | 59.342 | 63.691 | 66.766
50 | 27.991 | 29.707 | 32.357 | 34.764 | 37.689 | 63.167 | 67.505 | 71.420 | 76.154 | 79.490
60 | 35.534 | 37.485 | 40.482 | 43.188 | 46.459 | 74.397 | 79.082 | 83.298 | 88.379 | 91.952
70 | 43.275 | 45.442 | 48.758 | 51.739 | 55.329 | 85.527 | 90.531 | 95.023 | 100.425 | 104.215
80 | 51.172 | 53.540 | 57.153 | 60.391 | 64.278 | 96.578 | 101.879 | 106.629 | 112.329 | 116.321
90 | 59.196 | 61.754 | 65.647 | 69.126 | 73.291 | 107.565 | 113.145 | 118.136 | 124.116 | 128.299
100 | 67.328 | 70.065 | 74.222 | 77.929 | 82.358 | 118.498 | 124.342 | 129.561 | 135.807 | 140.169

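Decision rule: reject H₀ if $\chi^2 \ge \chi^2_{\alpha,\,df}$ (equivalently, if $p \le \alpha$).

Common critical values at $\alpha = 0.05$: df = 1: 3.841; df = 2: 5.991; df = 3: 7.815; df = 4: 9.488.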
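When a table is not at hand, the right-tail area can be computed directly. For integer df the chi-square upper tail has a closed form: a finite Poisson-type sum for even df, and a normal tail plus a finite sum for odd df. The sketch below implements that closed form with the standard library only; in practice a library routine such as SciPy's `scipy.stats.chi2.sf(x, df)` does the same job. The function name `chi2_right_tail` is our own.

```python
import math


def chi2_right_tail(x, df):
    """P(X >= x) for X ~ chi-square with integer df >= 1 (closed form for integer df)."""
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!
        term, total = 1.0, 1.0
        for k in range(1, df // 2):
            term *= half / k
            total += term
        return math.exp(-half) * total
    # Odd df: 2 * (1 - Phi(sqrt x))
    #         + sqrt(2/pi) * exp(-x/2) * sum_{k=1}^{(df-1)/2} x^((2k-1)/2) / (1*3*...*(2k-1))
    root = math.sqrt(x)
    tail = math.erfc(root / math.sqrt(2.0))  # equals 2 * (1 - Phi(sqrt x))
    term, total = root, 0.0
    for k in range(1, (df - 1) // 2 + 1):
        if k > 1:
            term *= x / (2 * k - 1)  # next term of the odd-df series
        total += term
    return tail + math.sqrt(2.0 / math.pi) * math.exp(-half) * total


print(round(chi2_right_tail(4.167, 1), 3))   # 0.041  (Example 1)
print(round(chi2_right_tail(11.429, 2), 3))  # 0.003  (Boss Fight, Path A)
```

Feeding the critical values from the table back in (for example 3.841 with df = 1, or 9.488 with df = 4) returns approximately 0.05, which is a quick consistency check against the table.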



C8 — Conclusion Language

Standard Conclusion Statements

Reject H₀: “There is sufficient evidence at the $\alpha$ level that [variable 1] and [variable 2] are not independent in the population.”

Fail to reject H₀: “There is insufficient evidence to conclude that [variable 1] and [variable 2] are not independent in the population.”

Critical error: After failing to reject H₀, do NOT write “the variables are independent.” Failing to reject H₀ only means we lack sufficient evidence of dependence — it does not prove independence. The correct language is “we have insufficient evidence to conclude that the variables are not independent.”

Critical error: Do NOT conclude “one variable causes the other” after rejecting H₀. Chi-square tests association, not causation. Even a highly significant result could reflect a confounding variable rather than a direct causal relationship.



C9 — Cramér’s V: Effect Size

A statistically significant $\chi^2$ does not tell you how strong the association is. With large sample sizes, even a trivially weak association can produce a significant p-value. Cramér’s V measures the strength on a 0-to-1 scale that, unlike $\chi^2$, does not grow simply because $n$ is larger.

Cramér's V

$$V = \sqrt{\frac{\chi^2}{n(k - 1)}}, \qquad k = \min(r, c)$$

Range: $0 \le V \le 1$.

Thresholds (approximate guidelines):
V | Interpretation
$V < 0.1$ | Negligible
$0.1 \le V < 0.3$ | Small
$0.3 \le V < 0.5$ | Medium
$V \ge 0.5$ | Large

Note on $k$: For a 2×2 table, $k = 2$, so $V = \sqrt{\chi^2 / n}$. For a 2×3 table, $k = 2$ as well, so the formula is the same. The factor $(k - 1)$ only makes a difference for tables where both $r \ge 3$ and $c \ge 3$.

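Common error: Computing $\frac{\chi^2}{n(k - 1)}$ and forgetting the square root. Cramér’s V is defined with the square root; omitting it reports $V^2$, which makes every association look weaker (a V of 0.3 would be reported as 0.09).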
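A minimal sketch of Cramér’s V and of the interpretation guideline. The 0.1 and 0.3 cut-offs match the worked examples in this lesson; the 0.5 cut-off for "large" is a common convention assumed here. Both function names are hypothetical.

```python
import math


def cramers_v(chi2, n, r, c):
    """Cramér's V = sqrt(chi2 / (n * (k - 1))) with k = min(r, c)."""
    k = min(r, c)
    return math.sqrt(chi2 / (n * (k - 1)))


def describe_v(v):
    """Verbal label; 0.1 and 0.3 follow this lesson, 0.5 for 'large' is an assumed convention."""
    if v < 0.1:
        return "negligible"
    if v < 0.3:
        return "small"
    if v < 0.5:
        return "medium"
    return "large"


# Caffeine vs. sleep quality (Boss Fight, Path A): chi2 = 11.429, n = 120, 2 x 3 table
v = cramers_v(11.429, 120, 2, 3)
print(round(v, 3), describe_v(v))            # 0.309 medium

# The common error: forgetting the square root reports V^2 and understates the strength
print(round(v ** 2, 3), describe_v(v ** 2))  # 0.095 negligible
```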



C10 — Association vs. Causation

A statistically significant chi-square test tells you the two variables are associated in the population. It does not tell you that one variable causes the other, nor in which direction any influence runs.

A lurking variable — an unmeasured third variable correlated with both variables in your table — could explain the association entirely. This is why association studies require careful contextual interpretation and why controlled experiments are needed to establish causation.

Section 4: Worked Examples

Example 1 — Fully Worked (2×2 Test)

Context: A health researcher surveys 100 adults. Are smoking status and exercise frequency independent at $\alpha = 0.05$?

Smoking status | Exercises | Does not exercise | Row total
Smoker | 15 | 35 | 50
Non-smoker | 25 | 25 | 50
Col total | 40 | 60 | 100

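Step 1 — Hypotheses: H₀: Smoking status and exercise frequency are independent in the population. Hₐ: They are not independent.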

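Step 2 — Expected frequencies: $E = \frac{50 \times 40}{100} = 20.0$ for (Smoker, Exercises) and $E = \frac{50 \times 60}{100} = 30.0$ for (Smoker, Does not exercise). Both rows have total 50, so the Non-smoker row has the same values, 20.0 and 30.0.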

All E ≥ 5. Random sample assumed. Conditions satisfied.

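Step 3 — Test statistic (df = (2 − 1)(2 − 1) = 1):

$$\chi^2 = \frac{(15-20)^2}{20} + \frac{(35-30)^2}{30} + \frac{(25-20)^2}{20} + \frac{(25-30)^2}{30} = 1.250 + 0.833 + 1.250 + 0.833 \approx 4.167$$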

Step 4 — p-value / critical value:

From the chi-square table: $\chi^2_{0.05,\,1} = 3.841$.

Since 4.167 > 3.841, the p-value < 0.05 (more precisely, p ≈ 0.041).

Step 5 — Decision and conclusion:

Reject H₀. There is sufficient evidence at the 0.05 level that smoking status and exercise frequency are not independent in the population.

Cramér’s V: $V = \sqrt{\frac{4.167}{100 \times 1}} \approx 0.204$

Small effect — the association is statistically real but modest.


Example 2 — Partially Scaffolded (2×2)

Context: The vaccine/age data from the Introduction (Dataset V1):

Age group | Vaccinated | Not vaccinated | Row total
Under 40 | 45 | 55 | 100
40 and over | 65 | 35 | 100
Col total | 110 | 90 | 200

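Given expected frequencies: (Under 40, Vaccinated) = 55.0, (Under 40, Not vaccinated) = 45.0, (40 and over, Vaccinated) = 55.0, (40 and over, Not vaccinated) = 45.0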

Your task: compute $\chi^2$, check conditions, and state the decision at $\alpha = 0.05$.

Before seeing the calculations below: based on the observed and expected counts, do you expect the chi-square statistic to be larger or smaller than the critical value (3.841)? Take a moment to guess before continuing.

Solution:

All E = 55.0 or 45.0, so every E ≥ 5. Conditions satisfied. df = 1.

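$\chi^2 = \frac{(45-55)^2}{55} + \frac{(55-45)^2}{45} + \frac{(65-55)^2}{55} + \frac{(35-45)^2}{45} = 1.818 + 2.222 + 1.818 + 2.222 \approx 8.081$. Since 8.081 > 3.841, reject H₀ (p < 0.005).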

There is sufficient evidence that age group and vaccine uptake are not independent in the population.

$V = \sqrt{\frac{8.081}{200 \times 1}} \approx 0.201$, a small effect.


Example 3 — Cramér’s V Calculation

Given: A chi-square test yields , , , .

Before seeing the answer: what effect size do you expect — negligible, small, medium, or large?

Solution:

This falls in the medium range ($0.3 \le V < 0.5$). The caffeine–sleep association is moderately strong.


Example 4 — Find the Error (Conditions Violated)

Context: A researcher uses the following table (n = 50):

Group | Outcome A | Outcome B | Row total
Group 1 | 3 | 2 | 5
Group 2 | 27 | 18 | 45
Col total | 30 | 20 | 50

The researcher computes (df = 1) and reports “p = 0.667; fail to reject H₀ — the variables appear independent.”

Identify the errors in this analysis.

Solution:

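Error 1 — Conditions violated: Computing the expected frequencies: E(Group 1, A) = 5 × 30 / 50 = 3.0, E(Group 1, B) = 5 × 20 / 50 = 2.0, E(Group 2, A) = 45 × 30 / 50 = 27.0, E(Group 2, B) = 45 × 20 / 50 = 18.0.

Two cells have expected frequencies below 5, so the chi-square approximation is unreliable and the test should not be run on this table as-is. In larger tables, adjacent categories can be combined; in a 2×2 table there is nothing left to combine, so the appropriate alternative is Fisher’s Exact Test.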

Error 2 — “fail to reject = independent”: Even if the test were valid, writing “the variables appear independent” after failing to reject H₀ is incorrect. The correct language is “there is insufficient evidence to conclude the variables are not independent.”
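Fisher’s Exact Test, named above as the alternative, avoids the large-sample approximation by summing exact hypergeometric probabilities over all 2×2 tables with the same margins. The sketch below is a standard-library illustration (Python 3.8+ for `math.comb`) of the usual two-sided version; SciPy users would normally call `scipy.stats.fisher_exact` instead. The helper name `fisher_exact_2x2` is our own.

```python
from math import comb


def fisher_exact_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table [[a, b], [c, d]].

    With all margins fixed, the top-left count follows a hypergeometric
    distribution; the p-value sums the probabilities of every possible table
    that is no more likely than the observed one.
    """
    (a, b), (c, d) = table
    row1 = a + b  # first row total
    col1 = a + c  # first column total
    n = a + b + c + d
    col2 = n - col1

    def prob(x):
        # P(top-left = x): choose x of row 1 from column 1 and the rest from column 2
        return comb(col1, x) * comb(col2, row1 - x) / comb(n, row1)

    p_obs = prob(a)
    lo, hi = max(0, row1 - col2), min(row1, col1)
    # Small relative tolerance so exact ties are not lost to floating-point rounding
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Example 4 table: the observed table is the single most likely one, so p = 1
print(round(fisher_exact_2x2([[3, 2], [27, 18]]), 3))  # 1.0
```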

Section 5: Guided Practice

Problem 1 — Expected Frequencies (Variant Bank)

Dataset V0 — Smoking status vs. Exercise frequency (n = 100)

Smoking status | Exercises | Does not exercise | Row total
Smoker | 15 | 35 | 50
Non-smoker | 25 | 25 | 50
Col total | 40 | 60 | 100

(a) What is the expected frequency for cell (Smoker, Exercises)?

(b) What is the expected frequency for cell (Non-smoker, Does not exercise)?

Dataset V1 — Age group vs. Vaccine uptake (n = 200)

Age group | Vaccinated | Not vaccinated | Row total
Under 40 | 45 | 55 | 100
40 and over | 65 | 35 | 100
Col total | 110 | 90 | 200

(a) What is the expected frequency for cell (Under 40, Vaccinated)?

(b) What is the expected frequency for cell (40 and over, Not vaccinated)?

Dataset V2 — Education level vs. Daily newspaper reading (n = 200)

Education | Reads daily | Does not read daily | Row total
College | 40 | 60 | 100
No college | 30 | 70 | 100
Col total | 70 | 130 | 200

(a) What is the expected frequency for cell (College, Reads daily)?

(b) What is the expected frequency for cell (No college, Reads daily)?

Dataset V3 — Stress level vs. Sleep quality (n = 140)

Stress level | Good sleep | Poor sleep | Row total
Low stress | 50 | 30 | 80
High stress | 20 | 40 | 60
Col total | 70 | 70 | 140

(a) What is the expected frequency for cell (Low stress, Good sleep)?

(b) What is the expected frequency for cell (High stress, Poor sleep)?

Dataset V5 — Diet type vs. BMI category (n = 100)

Diet | Normal BMI | Overweight | Row total
Plant-based diet | 35 | 15 | 50
Standard diet | 25 | 25 | 50
Col total | 60 | 40 | 100

(a) What is the expected frequency for cell (Plant-based, Normal BMI)?

(b) What is the expected frequency for cell (Standard diet, Overweight)?


Problem 2 — Conditions Check and Degrees of Freedom (Variant Bank)

Dataset V0 — Smoking/Exercise (n = 100). Expected frequencies: 20.0, 30.0 (Smoker); 20.0, 30.0 (Non-smoker)

(a) What is the minimum expected frequency?

(b) Are the conditions for the chi-square test met?

(c) What is df?

Small-sample table (conditions violated)
Group | Outcome A | Outcome B | Row total
Group 1 | 3 | 2 | 5
Group 2 | 27 | 18 | 45
Col total | 30 | 20 | 50

Expected frequencies: 3.0, 2.0 (Group 1); 27.0, 18.0 (Group 2).

(a) What is the minimum expected frequency?

(b) Are the conditions met?

(c) What is df?

Dataset V6 — Gender vs. Transport preference (n = 120, 2×3)

Expected: , , ; , ,

(a) What is the minimum expected frequency?

(b) Are the conditions met?

(c) What is df?

Small 2×3 table (conditions violated)
Group | Cat A | Cat B | Cat C | Row total
Group 1 | 8 | 12 | 1 | 21
Group 2 | 2 | 18 | 9 | 29
Col total | 10 | 30 | 10 | 50

Expected frequencies: 4.2, 12.6, 4.2 (Group 1); 5.8, 17.4, 5.8 (Group 2)

(a) What is the minimum expected frequency?

(b) Are the conditions met?

(c) What is df?

3×2 table — conditions met
Group | Category A | Category B | Row total
Group 1 | 20 | 30 | 50
Group 2 | 15 | 35 | 50
Group 3 | 10 | 40 | 50
Col total | 45 | 105 | 150

E(Group 1, Category A) = 50 × 45 / 150 = 15.0; all other E values are either 15.0 or 35.0.

(a) What is the minimum expected frequency?

(b) Are the conditions met?

(c) What is df?


Problem 3 — Reading the Chi-Square Table

Use the chi-square table in C7 to answer each question.

(a) What is ?

(b) What is ?

(c) A chi-square test gives with . At , what is the correct decision?


Problem 4 — Full Five-Step Test (Generator)

Section 6: Independent Practice

Problem 1 — Full Analysis Chain (Variant Bank)

Dataset V0 — Smoking/Exercise (n = 100)

Smoking status | Exercises | Does not exercise | Row total
Smoker | 15 | 35 | 50
Non-smoker | 25 | 25 | 50
Col total | 40 | 60 | 100

(a) What is E for (Non-smoker, Exercises)?

(b) Compute $\chi^2$ (to 3 decimal places).

(c) At $\alpha = 0.05$, what is the decision?

(d) Which conclusion statement is correct?

Full solution:

E values: 20.0, 30.0, 20.0, 30.0. All ≥ 5. df = 1.

$\chi^2 = 1.250 + 0.833 + 1.250 + 0.833 = 4.167$. Since 4.167 > 3.841, reject H₀ (p ≈ 0.041).

Conclusion: There is sufficient evidence at the 0.05 level that smoking status and exercise frequency are not independent in the population.

$V = \sqrt{4.167 / 100} \approx 0.204$ (small effect).

Dataset V1 — Age group vs. Vaccine uptake (n = 200)

Age group | Vaccinated | Not vaccinated | Row total
Under 40 | 45 | 55 | 100
40 and over | 65 | 35 | 100
Col total | 110 | 90 | 200

(a) What is E for (Under 40, Not vaccinated)?

(b) Compute $\chi^2$.

(c) Decision at $\alpha = 0.05$?

(d) Correct conclusion?

Full solution:

E values: 55.0, 45.0, 55.0, 45.0. All ≥ 5. df = 1.

Reject H₀. There is sufficient evidence that age group and vaccine uptake are not independent in the population ($\chi^2 \approx 8.081$, p < 0.005).

$V = \sqrt{8.081 / 200} \approx 0.201$ (small effect).

Dataset V2 — Education level vs. Newspaper reading (n = 200)

Education | Reads daily | Does not read daily | Row total
College | 40 | 60 | 100
No college | 30 | 70 | 100
Col total | 70 | 130 | 200

(a) What is E for (College, Reads daily)?

(b) Compute $\chi^2$.

(c) Decision at $\alpha = 0.05$?

(d) Correct conclusion?

Full solution:

E values: 35.0, 65.0, 35.0, 65.0. All ≥ 5. df = 1.

Fail to reject H₀ ($\chi^2 \approx 2.198 < 3.841$, p > 0.10).

Conclusion: There is insufficient evidence to conclude that education level and daily newspaper reading are not independent in the population.

Dataset V3 — Stress level vs. Sleep quality (n = 140)

Stress level | Good sleep | Poor sleep | Row total
Low stress | 50 | 30 | 80
High stress | 20 | 40 | 60
Col total | 70 | 70 | 140

(a) What is E for (High stress, Good sleep)?

(b) Compute $\chi^2$.

(c) Decision at $\alpha = 0.05$?

(d) Correct conclusion?

Full solution:

E values: 40.0, 40.0, 30.0, 30.0. All ≥ 5. df = 1.

Reject H₀ ($\chi^2 \approx 11.667 > 3.841$, p < 0.005). There is sufficient evidence that stress level and sleep quality are not independent in the population.

$V = \sqrt{11.667 / 140} \approx 0.289$ (close to the small–medium boundary of 0.3; classified as small).

Dataset V4 — Commute length vs. Job satisfaction (n = 200)

Commute | Satisfied | Not satisfied | Row total
Short commute | 55 | 45 | 100
Long commute | 44 | 56 | 100
Col total | 99 | 101 | 200

(a) What is E for (Short commute, Not satisfied)?

(b) Compute $\chi^2$.

(c) Decision at $\alpha = 0.05$?

(d) Correct conclusion?

Full solution:

E values: 49.5, 50.5, 49.5, 50.5. All ≥ 5. df = 1.

Fail to reject H₀ ($\chi^2 \approx 2.420 < 3.841$, p > 0.10). There is insufficient evidence to conclude that commute length and job satisfaction are not independent.


Problem 2 — Full Five-Step Test on a 2×3 Table with Cramér’s V (Generator)


Problem 3 — Find the Error (Variant Bank)

A researcher runs a chi-square test on a 2×3 contingency table (2 rows, 3 columns, n = 90) and reports: “df = n − 1 = 89, χ² = 8.4. Since p < 0.05, I conclude the variables are not independent.”

What is the error in this statement?

Solution:

Correct Answer: The correct df is $(r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$, not $n - 1 = 89$.

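Explanation: The degrees of freedom for a chi-square test of independence are $df = (r - 1)(c - 1)$, where $r$ is the number of rows and $c$ is the number of columns. The formula $n - 1$ belongs to the one-sample t-test for a mean, not to a categorical contingency table. With the correct df = 2, χ² = 8.4 exceeds the critical value 5.991, so p < 0.05 and the decision itself stands; with df = 89 the same statistic would be nowhere near significant.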

A researcher computes the test statistic using this formula: $\chi^2 = \sum \frac{(O - E)^2}{O}$, dividing by the observed count instead of the expected count.

What is the error?

Solution:

Correct Answer: The denominator must be $E$ (expected), not $O$ (observed).

Explanation: The chi-square test measures how far the observed counts deviate from what we expect under the null hypothesis of independence. Therefore, each squared deviation is scaled by the expected count: $\frac{(O - E)^2}{E}$.

A researcher reports: “The chi-square test gives χ² = 1.8 (df = 1, p = 0.18). We conclude that the two variables are independent.”

What is the error?

Solution:

Correct Answer: Failing to reject does not prove independence.

Explanation: In hypothesis testing, we never “accept” or “prove” the null hypothesis. We only fail to reject it. A non-significant p-value (p = 0.18 > 0.05) indicates that we lack sufficient evidence of a relationship, not that a relationship definitely does not exist.

A researcher finds a significant association between ice cream sales and drowning rates across summer months (χ² = 12.4, df = 2, p < 0.001) and concludes: “Ice cream consumption causes drowning.”

What is the error?

Solution:

Correct Answer: Chi-square tests association, not causation.

Explanation: A significant chi-square test only establishes that an association exists between the variables. It does not establish a causal relationship. A lurking variable (summer weather/temperature) is the common cause driving both increased ice cream sales and increased swimming/drowning rates.

A researcher runs a chi-square test on a contingency table where one cell has an expected frequency below 5, and reports: “χ² = 9.7, df = 3, p < 0.05. The result is significant — the variables are not independent.”

What is the error?


Problem 4 — Cramér’s V Standalone (Generator)


Problem 5 — Synthesis: Physical Activity and Stress (Non-regenerable)

A study of 130 randomly selected workers classifies each by physical activity level (None / Moderate / Vigorous) and stress level (Low / High).

Stress level | None | Moderate | Vigorous | Row total
Low stress | 15 | 30 | 25 | 70
High stress | 25 | 20 | 15 | 60
Col total | 40 | 50 | 40 | 130

(a) State H₀ and Hₐ in context.

(b) Compute all 6 expected frequencies and verify the conditions.

(c) Compute $\chi^2$, showing all 6 cell contributions.

(d) Identify df and compare $\chi^2$ to the critical value $\chi^2_{0.05,\,df}$. State the decision at $\alpha = 0.05$.

(e) Compute Cramér’s V and interpret the practical significance.

(f) A corporate wellness director concludes: “Since the test is significant, we should mandate vigorous exercise for all high-stress employees.” Identify two statistical reasoning errors in this statement.

Full solution:

(a) H₀: Physical activity level and stress level are independent in the worker population. Hₐ: They are not independent.

(b) Expected frequencies:

Stress level | None | Moderate | Vigorous
Low stress | 21.538 | 26.923 | 21.538
High stress | 18.462 | 23.077 | 18.462

Min E = 18.462 ≥ 5. All conditions satisfied. Random sample stated.

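(c) $\chi^2$ computation:

Cell | O | E | $(O - E)^2 / E$
Low, None | 15 | 21.538 | 1.985
Low, Moderate | 30 | 26.923 | 0.352
Low, Vigorous | 25 | 21.538 | 0.556
High, None | 25 | 18.462 | 2.316
High, Moderate | 20 | 23.077 | 0.410
High, Vigorous | 15 | 18.462 | 0.649
Total | | | $\chi^2 \approx 6.268$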

(d) $df = (2 - 1)(3 - 1) = 2$. $\chi^2_{0.05,\,2} = 5.991$.

Since 6.268 > 5.991, reject H₀ (p ≈ 0.044).

Conclusion: There is sufficient evidence at the 0.05 level that physical activity level and stress level are not independent in the worker population.

(e) $V = \sqrt{\frac{6.268}{130 \times 1}} \approx 0.220$

This is a small effect. The association, while statistically real, is weak: knowing a worker’s activity level tells us only a little about their stress level.

(f) Two errors:

  1. Association ≠ causation. A significant chi-square shows that physical activity and stress are associated — it does not establish that requiring vigorous exercise will reduce stress. Other factors (workload, sleep, social support) may drive both variables.

  2. Effect size is small. $V \approx 0.220$ indicates a weak association. Even if the causal inference were valid, the small effect size suggests that mandating exercise would produce only modest changes in stress outcomes at the population level and would likely not justify a blanket policy.


Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Z-Test for a Mean (INF-5)

A nutritionist claims that the mean daily sodium intake of adults in a city is 2,300 mg (the recommended daily maximum). A health researcher suspects the true mean is higher. She surveys 50 adults and finds mg with known population standard deviation mg. Use .

(a) State and .

(b) Check conditions for a z-test.

(c) Compute the test statistic .

(d) The critical value for a one-tailed test at $\alpha = 0.05$ is $z = 1.645$. State the decision and write a conclusion in context.

Solution:

(a) $H_0: \mu = 2{,}300$ mg; $H_a: \mu > 2{,}300$ mg

(b) Conditions:

  • Randomness: Assume the 50 adults are a random (or representative) sample. ✓ (stated as a survey)
  • Independence: 50 adults is plausibly less than 10% of the city’s adult population. ✓
  • Normality: $n = 50 \ge 30$, so by the Central Limit Theorem the sampling distribution of $\bar{x}$ is approximately normal even without knowing the shape of the population distribution. ✓
  • $\sigma$ is known, so a z-test (not a t-test) is appropriate. ✓

(c)

(d) Since , we reject ().

Conclusion: “There is sufficient evidence at the 0.05 significance level that the mean daily sodium intake of adults in this city exceeds 2,300 mg.”


Review Problem 2 — Proportion Test (INF-6)

A college claims that 70% of its graduates find employment in their field within six months. A skeptic surveys 120 recent graduates and finds that 75 are employed in their field. Test whether the true proportion differs from 0.70 at $\alpha = 0.05$ (two-tailed).

(a) State and .

(b) Compute $\hat{p}$ and verify the success-failure condition.

(c) Compute the test statistic. Use $p_0$, not $\hat{p}$, in the denominator.

(d) The two-tailed critical values are $z = \pm 1.96$. State the decision and conclusion.

Solution:

(a) $H_0: p = 0.70$; $H_a: p \ne 0.70$

(b) $\hat{p} = 75 / 120 = 0.625$

Success-failure condition: $np_0 = 120(0.70) = 84 \ge 10$ ✓ and $n(1 - p_0) = 120(0.30) = 36 \ge 10$ ✓

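(c) The denominator uses $p_0$ (not $\hat{p}$) because we are testing under the assumption that H₀ is true:

$$z = \frac{0.625 - 0.70}{\sqrt{0.70(0.30)/120}} = \frac{-0.075}{0.0418} \approx -1.79$$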

(d) Since $|z| = 1.79 < 1.96$, we fail to reject $H_0$ (p ≈ 0.073).

Conclusion: “There is insufficient evidence at the 0.05 significance level to conclude that the true employment rate differs from 70%. The data are consistent with the college’s claim, though a 62.5% observed rate warrants continued monitoring.”

Note: Failing to reject does not prove the rate is 70% — it only means the data did not provide sufficient evidence against it at this sample size.
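The arithmetic in this review problem can be checked with a short sketch of the one-proportion z-test, assuming a two-sided alternative; the function name `one_proportion_z` is hypothetical. The standard error is built from $p_0$, exactly as part (c) requires, and the success-failure condition from part (b) is checked first.

```python
import math


def one_proportion_z(successes, n, p0):
    """One-proportion z-test with a two-sided alternative H_a: p != p0."""
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("success-failure condition not met")
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)           # standard error under H0: uses p0, not p_hat
    z = (p_hat - p0) / se
    p_value = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return p_hat, z, p_value


p_hat, z, p = one_proportion_z(75, 120, 0.70)
print(p_hat, round(z, 2), round(p, 3))  # 0.625 -1.79 0.073
```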

Section 7: Mastery Check

Feynman Prompt

A colleague says: “We ran a chi-square test and got a highly significant result (). The association between the two variables is therefore very strong.”

In a few sentences, explain what is wrong with this reasoning and what they should report instead.

Reference answer:

A significant p-value tells you the association is unlikely to be due to chance alone; it says nothing about the strength of the association. With large sample sizes, even a negligibly weak association produces a very small p-value. Your colleague should compute Cramér’s V: if V < 0.1, the association is negligible despite the significance; if V ≥ 0.3, it is moderate or large. Always report both the p-value and V.


Apply

A researcher surveys 300 university students on preferred study location (Library / Home / Café) and year of study (First year / Upper year). A chi-square test yields with .

(a) What is the decision at ? Use the chi-square table.

(b) What is the correct conclusion?


Analyze the Error

A researcher studying the relationship between pet ownership (Yes/No) and self-reported happiness (Low/High) in a sample of 200 adults reports:

“χ² = 3.2 (df = 1, p = 0.074). Since p > 0.05, we conclude that pet owners are just as happy as non-pet-owners.”

Identify the error in the researcher’s conclusion.


Self-Assessment

How confident do you feel about the chi-square test of independence?

Rate yourself on a scale from “Still confused” to “Ready for the Boss Fight”.

Section 8: Boss Fight

Choose your path. Both paths test everything from this lesson — pick the one that matches your strengths.

🔬 Path A — The Analyst

Perform a complete five-step chi-square test of independence, compute Cramér’s V, and write a professional research summary.

📝 Path B — The Communicator

Evaluate three brief research reports from different studies, identify the specific errors, and rewrite conclusions using proper statistical language.

Path A — The Analyst

A study of 120 adults measures caffeine intake level (Low / High) and sleep quality (Good / Fair / Poor).

Caffeine intake | Good sleep | Fair sleep | Poor sleep | Row total
Low caffeine | 30 | 20 | 10 | 60
High caffeine | 15 | 20 | 25 | 60
Col total | 45 | 40 | 35 | 120

Complete the full five-step chi-square test at $\alpha = 0.05$. Then compute Cramér’s V. Finally, write a one-paragraph research summary using correct statistical language (avoid causation; state the effect size; include the decision).

Solution

Step 1 — Hypotheses:

  • H₀: Caffeine intake and sleep quality are independent in the population.
  • Hₐ: Caffeine intake and sleep quality are not independent in the population.

Step 2 — Expected frequencies:

Expected counts | Good | Fair | Poor
Low caffeine | 22.5 | 20.0 | 17.5
High caffeine | 22.5 | 20.0 | 17.5

All E ≥ 5, so the expected-count condition is satisfied. Degrees of freedom: $df = (r - 1)(c - 1) = (2 - 1)(3 - 1) = 2$.
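To check the expected-frequency step by computer, here is a minimal Python sketch. The helper `expected_counts` is a hypothetical name written for this lesson, not a library function; it assumes the observed table is entered as a list of rows and applies E = (row total × column total) / n to every cell.

```python
def expected_counts(observed):
    """Expected frequency for every cell under H0: (row total * column total) / n."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[rt * ct / n for ct in col_totals] for rt in row_totals]


# Path A counts: rows = Low / High caffeine, columns = Good / Fair / Poor sleep
observed = [[30, 20, 10],
            [15, 20, 25]]

expected = expected_counts(observed)
print(expected)                                      # [[22.5, 20.0, 17.5], [22.5, 20.0, 17.5]]
print(all(e >= 5 for row in expected for e in row))  # True: condition met
```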

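Step 3 — Test statistic (Low row, then High row): $\chi^2 = \sum \frac{(O - E)^2}{E} = 2.500 + 0 + 3.2143 + 2.500 + 0 + 3.2143 \approx 11.429$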
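Step 3 adds one (O − E)²/E term per cell. The sketch below assumes the Step 2 expected counts are typed in by hand and uses a hypothetical helper `chi_square_contributions`; listing each term shows which cells drive the statistic (here the Good and Poor columns, while the Fair column contributes nothing).

```python
def chi_square_contributions(observed, expected):
    """Per-cell (O - E)^2 / E terms, laid out like the table."""
    return [[(o - e) ** 2 / e for o, e in zip(o_row, e_row)]
            for o_row, e_row in zip(observed, expected)]


observed = [[30, 20, 10], [15, 20, 25]]              # Path A counts
expected = [[22.5, 20.0, 17.5], [22.5, 20.0, 17.5]]  # Step 2 table

contributions = chi_square_contributions(observed, expected)
chi2 = sum(sum(row) for row in contributions)

for row in contributions:
    print([round(t, 3) for t in row])   # [2.5, 0.0, 3.214] (both rows)
print(round(chi2, 3))                   # 11.429
```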

Step 4 — Critical value: .

Step 5 — Decision: Since $\chi^2 = 11.429$ exceeds the critical value, reject H₀ ($p \approx 0.003$).
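The p-value of about 0.003 quoted in the research summary can be reproduced without a table. For an even number of degrees of freedom, the upper tail of the chi-square distribution has the closed form p = e^(−x/2) Σ (x/2)^k / k!, summed from k = 0 to df/2 − 1; for df = 2 it reduces to e^(−x/2). The function name below is hypothetical, and the shortcut is deliberately limited: it only handles even df. With SciPy installed, `scipy.stats.chi2.sf(x, df)` gives the same tail probability for any df.

```python
import math


def chi2_upper_tail_even_df(x, df):
    """P(X >= x) for a chi-square variable with a positive, even df.

    Uses the closed form exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form only applies to positive even df")
    half = x / 2
    return math.exp(-half) * sum(half ** k / math.factorial(k)
                                 for k in range(df // 2))


p = chi2_upper_tail_even_df(80 / 7, df=2)   # chi-square = 11.429 from Step 3
print(round(p, 4))                          # 0.0033
```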

Cramér’s V: $V = \sqrt{\frac{\chi^2}{n(\min(r, c) - 1)}} = \sqrt{\frac{11.429}{120 \times 1}} \approx 0.309$, a medium effect.
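Cramér’s V rescales χ² by n and by the smaller table dimension. A minimal sketch, with `cramers_v` as a hypothetical helper name and the χ² value 11.429 (exactly 80/7) carried over from Step 3:

```python
import math


def cramers_v(chi2, n, n_rows, n_cols):
    """Cramér's V = sqrt(chi2 / (n * (min(r, c) - 1))); needs r, c >= 2."""
    return math.sqrt(chi2 / (n * (min(n_rows, n_cols) - 1)))


v = cramers_v(80 / 7, n=120, n_rows=2, n_cols=3)   # chi-square = 11.429
print(round(v, 3))                                 # 0.309
```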

Research summary:

“A chi-square test of independence (χ² = 11.429, df = 2, p = 0.003) found sufficient evidence that caffeine intake level and sleep quality are not independent in the adult population sampled (n = 120). The association is of medium strength (Cramér’s V = 0.309), indicating a practically meaningful — not merely statistically significant — relationship between these variables. Adults with low caffeine intake tended to report better sleep, while those with high caffeine intake were more likely to report poor sleep. As an observational study, causation cannot be established; other factors associated with caffeine intake (e.g., work stress, screen time) may contribute to the observed pattern.”

Reflection: What was the most challenging part of this bivariate chi-square analysis: the expected-frequency calculations or the interpretation of Cramér’s V?

Path B — The Communicator

A research team shares three brief reports from different studies. Evaluate each one: identify the error (if any), or confirm the analysis is correct. For each flawed report, rewrite the conclusion using proper statistical language.


Report 1:

“A chi-square test on a 2×2 table (χ² = 6.3, df = 1, n = 180) gives p = 0.012. We conclude there is sufficient evidence at the 5% level that the two categorical variables are not independent in the population. Cramér’s V = 0.19 indicates a small effect.”

Evaluate Report 1

This report is correct. The test conclusion uses proper language (sufficient evidence, not independent, at the 5% level). The Cramér’s V interpretation (small effect) is appropriate for V = 0.19. No causation is claimed. No errors.
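As an optional consistency check, Report 1’s p-value and V can be recomputed from its own χ², df, and n using only Python’s standard library. For df = 1 the statistic is the square of a standard normal variable, so the upper-tail probability is erfc(√(χ²/2)); this identity holds only for df = 1.

```python
import math

chi2, n = 6.3, 180   # as stated in Report 1 (2x2 table, so df = 1)

# A df = 1 chi-square is the square of a standard normal Z, so
# P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
p = math.erfc(math.sqrt(chi2 / 2))

# For a 2x2 table, min(r, c) - 1 = 1, so V reduces to sqrt(chi2 / n).
v = math.sqrt(chi2 / n)

print(round(p, 3), round(v, 2))   # 0.012 0.19
```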


Report 2:

“A chi-square test yields χ² = 4.2 (df = 2, p = 0.12). Since p > 0.05, we confirm that education level and political affiliation are independent.”

Evaluate Report 2

Error: “We confirm … are independent” after failing to reject H₀. Failing to reject H₀ does not prove the null hypothesis — it only means insufficient evidence of dependence.

Corrected conclusion: “There is insufficient evidence at the 5% level to conclude that education level and political affiliation are not independent in the population. The test does not confirm independence.”


Report 3:

“Our chi-square analysis shows that ice cream consumption and drowning rates are significantly associated (χ² = 18.7, df = 1, p < 0.001, V = 0.43). Therefore, eating ice cream increases the risk of drowning.”

Evaluate Report 3

Error: Claiming causation from a significant association. Chi-square tests statistical dependence between two variables in a sample — it cannot establish causal direction or rule out confounders. A lurking variable (summer heat) drives both ice cream sales and swimming activity, producing the association.

Corrected conclusion: “There is sufficient evidence at the 0.1% level that ice cream consumption and drowning rates are not independent (χ² = 18.7, df = 1, V = 0.43, medium effect). The association is likely due to a common cause (warm weather driving both behaviours) rather than any direct link between ice cream and drowning.”

Reflection: What was the most challenging part of identifying these errors? Which report was easiest to critique?

Section 9: Challenge Problems

Challenge 1 — Sample Size Effect on χ² vs. Cramér’s V

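Dataset V0 (smoking/exercise, n = 100) has the following observed proportions: among smokers, 30% exercise; among non-smokers, 50% exercise (a 20-percentage-point gap).

Hold these proportions fixed and scale the sample size. In each observed-counts matrix, row 1 is smokers and row 2 is non-smokers; column 1 counts those who exercise and column 2 those who do not. Complete the table.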

n | Observed counts | χ² | Decision | V
100 | [[15, 35], [25, 25]] | 4.167 | Reject H₀ | 0.204
200 | [[30, 70], [50, 50]] | ? | ? | ?
400 | [[60, 140], [100, 100]] | ? | ? | ?
800 | [[120, 280], [200, 200]] | ? | ? | ?

After completing the table, answer: What does this tell us about using χ² alone as a measure of association strength?

Solution

n = 200 (proportions unchanged, counts doubled):

E values: each cell E doubles too. E₁₁ = 40, E₁₂ = 60, E₂₁ = 40, E₂₂ = 60.

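$\chi^2 = \frac{(30 - 40)^2}{40} + \frac{(70 - 60)^2}{60} + \frac{(50 - 40)^2}{40} + \frac{(50 - 60)^2}{60} = \frac{25}{3} \approx 8.333$ (exactly double the n = 100 value ✓) → Reject H₀. $V = \sqrt{8.333 / 200} = 0.204$ (unchanged).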

n = 400:

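Expected counts are 80, 120, 80, 120, so $\chi^2 = 5 + \frac{10}{3} + 5 + \frac{10}{3} = \frac{50}{3} \approx 16.667$. Reject H₀. $V = \sqrt{16.667 / 400} = 0.204$.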

n = 800:

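Expected counts are 160, 240, 160, 240, so $\chi^2 = 10 + \frac{20}{3} + 10 + \frac{20}{3} = \frac{100}{3} \approx 33.333$. Reject H₀. $V = \sqrt{33.333 / 800} = 0.204$.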

Completed table:

n | χ² | Decision | V
100 | 4.167 | Reject H₀ | 0.204
200 | 8.333 | Reject H₀ | 0.204
400 | 16.667 | Reject H₀ | 0.204
800 | 33.333 | Reject H₀ | 0.204

Interpretation: With the proportions held fixed, χ² scales linearly with n: quadrupling the sample quadruples the test statistic. With a large enough n, even a negligibly weak association will produce a statistically significant χ². Cramér’s V, by contrast, stays constant at 0.204 regardless of n; it measures the strength of the association, which does not change when the same proportions are observed in a larger sample. Always report V alongside χ² to communicate both significance and practical importance.
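The scaling pattern in the completed table can be reproduced with a short Python sketch that multiplies every count by k while leaving the proportions unchanged. The helper `chi_square` is a hypothetical name for this lesson; it computes the uncorrected Pearson statistic (no continuity correction).

```python
import math


def chi_square(observed):
    """Pearson chi-square statistic and n for a table given as a list of rows."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under H0
            stat += (o - e) ** 2 / e
    return stat, n


# n = 100 table; rows = smokers / non-smokers, columns = exercise / no exercise
base = [[15, 35], [25, 25]]

results = []
for k in (1, 2, 4, 8):                  # same proportions, k times the counts
    table = [[k * o for o in row] for row in base]
    chi2, n = chi_square(table)
    v = math.sqrt(chi2 / n)             # 2x2 table: min(r, c) - 1 = 1
    results.append((n, chi2, v))
    print(n, round(chi2, 3), round(v, 3))

# 100 4.167 0.204
# 200 8.333 0.204
# 400 16.667 0.204
# 800 33.333 0.204
```

The statistic doubles each time n doubles, while V never moves.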


Challenge 2 — 3×3 Table

A sociologist examines the relationship between preferred car type (Economy / Standard / Luxury) and income bracket (Low / Medium / High) for 300 randomly selected adults.

Car type | Low income | Medium income | High income | Row total
Economy | 50 | 30 | 20 | 100
Standard | 30 | 40 | 30 | 100
Luxury | 20 | 30 | 50 | 100
Col total | 100 | 100 | 100 | 300

(a) State H₀ and Hₐ.

(b) Compute all 9 expected frequencies and check conditions.

(c) Find df. How does the df formula change compared to a 2×2 table?

(d) Compute χ² (all 9 cell contributions).

(e) Decide at using the chi-square table.

(f) Compute Cramér’s V and interpret.

Solution

(a) H₀: Car type preference and income bracket are independent in the population. Hₐ: They are not independent.

(b) With equal margins (all row and column totals = 100), every expected frequency is $E = \frac{100 \times 100}{300} = 33.333$. All 9 cells have E = 33.333 ≥ 5, so the conditions are satisfied.

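(c) $df = (r - 1)(c - 1) = (3 - 1)(3 - 1) = 4$. For a 2×2 table, df = 1; for a 3×3 table, df = 4, because once the row and column totals are fixed, four cells (rather than one) can be filled in freely before the rest are determined. At the same α, a larger df means χ² must exceed a larger critical value to reject H₀.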

(d) Each cell contributes $\frac{(O - E)^2}{E} = \frac{(O - 33.333)^2}{33.333}$:

Cell | O | (O − 33.333)²/33.333
Economy, Low | 50 | 8.333
Economy, Medium | 30 | 0.333
Economy, High | 20 | 5.333
Standard, Low | 30 | 0.333
Standard, Medium | 40 | 1.333
Standard, High | 30 | 0.333
Luxury, Low | 20 | 5.333
Luxury, Medium | 30 | 0.333
Luxury, High | 50 | 8.333

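(e) Summing the nine contributions gives $\chi^2 = 30.0$ with df = 4. Since 30.0 is far above the critical value for df = 4 ($p < 0.001$), reject H₀.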

There is very strong evidence that car preference and income bracket are not independent in the population.

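(f) $V = \sqrt{\frac{\chi^2}{n(\min(r, c) - 1)}} = \sqrt{\frac{30}{300 \times 2}} = \sqrt{0.05} \approx 0.224$, a small-to-medium effect.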

Despite the extremely small p-value, the association is only modest in practical terms: knowing a person’s income bracket gives only limited ability to predict their car type preference.
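The whole Challenge 2 solution can be verified in one short script. It assumes the table is entered row by row, uses the even-df closed form p = e^(−x/2) Σ (x/2)^k / k! for the p-value (valid here only because df = 4 is even), and applies min(r, c) − 1 = 2 in Cramér’s V. All variable names are illustrative.

```python
import math

# Challenge 2 counts: rows = Economy / Standard / Luxury,
# columns = Low / Medium / High income
observed = [[50, 30, 20],
            [30, 40, 30],
            [20, 30, 50]]

r, c = len(observed), len(observed[0])
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
chi2 = sum((o - e) ** 2 / e
           for o_row, e_row in zip(observed, expected)
           for o, e in zip(o_row, e_row))
df = (r - 1) * (c - 1)

# df is even, so P(X >= chi2) = exp(-chi2/2) * sum_{k=0}^{df/2 - 1} (chi2/2)^k / k!
half = chi2 / 2
p = math.exp(-half) * sum(half ** k / math.factorial(k) for k in range(df // 2))

v = math.sqrt(chi2 / (n * (min(r, c) - 1)))

print(round(chi2, 3), df, f"{p:.1e}", round(v, 3))   # 30.0 4 4.9e-06 0.224
```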


Challenge 3 — Simpson’s Paradox

A hospital compares recovery rates for two treatments (A vs. B). The overall combined data appears to show Treatment A is superior:

Combined data (n = 200):

Treatment | Recovered | Not recovered | Total
Treatment A | 78 | 22 | 100
Treatment B | 73 | 27 | 100

(a) Based on the combined table, which treatment appears better?

After stratifying by condition severity:

Mild cases (n = 130):

Treatment | Recovered | Not recovered | Total
Treatment A | 72 | 8 | 80
Treatment B | 45 | 5 | 50

Severe cases (n = 70):

Treatment | Recovered | Not recovered | Total
Treatment A | 6 | 14 | 20
Treatment B | 28 | 22 | 50

(b) For mild cases, which treatment is better (or are they tied)?

(c) For severe cases, which treatment is better?

(d) Why does Treatment A appear better overall even though, within each severity group, it is either no better than B or worse? What lurking variable drives the reversal?

(e) If a hospital administrator used only the combined table chi-square result to recommend a treatment, what error would they be making?

(f) What general lesson does this illustrate about chi-square tests and association analyses?

Solution

(a) Treatment A: 78/100 = 78%. Treatment B: 73/100 = 73%. Treatment A appears better in the combined data.

(b) Mild cases: A = 72/80 = 90.0%; B = 45/50 = 90.0%. Exactly tied.

(c) Severe cases: A = 6/20 = 30.0%; B = 28/50 = 56.0%. Treatment B is better.

(d) Treatment A was assigned mostly to mild cases (80 of its 100 patients), which naturally have higher recovery rates regardless of treatment. Treatment B handled more severe cases (50 of its 100 patients). The overall rate for Treatment A is inflated by its patient mix. Disease severity is a lurking variable (confounder) — it is associated with both treatment assignment and recovery outcome, creating the illusion that A outperforms B.
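The patient-mix argument in (d) can be made concrete: each treatment’s combined recovery rate is its stratum rates averaged with weights equal to its own patient mix. This sketch uses only the counts from the two stratified tables; the dictionary layout is an illustrative choice.

```python
# (recovered, total) for each treatment within each severity stratum
strata = {
    "mild":   {"A": (72, 80), "B": (45, 50)},
    "severe": {"A": (6, 20),  "B": (28, 50)},
}

summary = {}
for treatment in ("A", "B"):
    total = sum(strata[s][treatment][1] for s in strata)
    recovered = sum(strata[s][treatment][0] for s in strata)
    rates = {s: strata[s][treatment][0] / strata[s][treatment][1] for s in strata}
    mix = {s: strata[s][treatment][1] / total for s in strata}   # share of patients
    # Combined rate = stratum rates weighted by this treatment's patient mix
    weighted = sum(mix[s] * rates[s] for s in strata)
    summary[treatment] = (rates, recovered / total, weighted)
    print(treatment, rates, round(recovered / total, 2), round(weighted, 2))

# A {'mild': 0.9, 'severe': 0.3} 0.78 0.78
# B {'mild': 0.9, 'severe': 0.56} 0.73 0.73
```

Treatment A’s 78% is 0.8 × 90% + 0.2 × 30%, while B’s 73% is 0.5 × 90% + 0.5 × 56%: A looks better overall only because 80% of its patients were mild cases.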

(e) The administrator would recommend Treatment A based on a confounded combined analysis. The correct conclusion — that B is equally effective for mild cases and superior for severe cases — is only visible after stratifying by severity.

(f) This is Simpson’s Paradox: an association present in the combined data can reverse or disappear after stratifying by a third variable. A chi-square test on a combined table may mask patterns present within subgroups. Whenever a lurking variable might confound the association, the data should be stratified by that variable. Statistical analysis must always account for the study design and potential confounders.

Section 10: Solutions Reference

Full worked solutions for all problems in this lesson (Sections 5–9) are available on the dedicated solutions page. Solutions include every computation step, formula derivation, and interpretation note.



Quick-Reference Formulas

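Formula | Purpose
$E = \frac{\text{row total} \times \text{column total}}{n}$ | Expected frequency under H₀ (independence)
$\chi^2 = \sum \frac{(O - E)^2}{E}$ | Chi-square test statistic; always ≥ 0
$df = (r - 1)(c - 1)$ | Degrees of freedom; r = number of rows, c = number of columns
$V = \sqrt{\frac{\chi^2}{n(\min(r, c) - 1)}}$ | Cramér’s V effect size; range [0, 1]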
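The four quick-reference formulas fit in one small function. This is a sketch under stated assumptions, not a library API: `chi_square_summary` is a hypothetical name, the table must have at least two rows and two columns with no zero margins, and it returns no p-value. If SciPy is available, `scipy.stats.chi2_contingency` computes the statistic, p-value, df, and expected table; by default it applies Yates’ continuity correction when df = 1, so pass `correction=False` to match the uncorrected formula used in this lesson.

```python
import math


def chi_square_summary(observed):
    """Apply the quick-reference formulas to an r x c table given as a list of rows.

    Assumes at least 2 rows and 2 columns, and no zero row or column totals.
    """
    r, c = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected frequency under H0: (row total * column total) / n
    expected = [[rt * ct / n for ct in col_totals] for rt in row_totals]
    # Chi-square statistic: sum over all cells of (O - E)^2 / E
    chi2 = sum((o - e) ** 2 / e
               for o_row, e_row in zip(observed, expected)
               for o, e in zip(o_row, e_row))
    df = (r - 1) * (c - 1)
    v = math.sqrt(chi2 / (n * (min(r, c) - 1)))   # Cramér's V
    min_expected = min(min(row) for row in expected)
    return {"expected": expected, "chi2": chi2, "df": df, "V": v,
            "conditions_ok": min_expected >= 5}


# Vaccination table from Section 1: rows = Under 40 / 40 and over,
# columns = Vaccinated / Not vaccinated
result = chi_square_summary([[45, 55], [65, 35]])
print(round(result["chi2"], 3), result["df"], round(result["V"], 3), result["conditions_ok"])
# 8.081 1 0.201 True
```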

Key Interpretation Rules

Common Pitfalls

Pitfall | What goes wrong | Correction
P1 — Conditions skipped | Test run without checking that every expected count satisfies $E \ge 5$ | Compute all expected frequencies first; combine categories if needed
P2 — O and E swapped | Dividing by O instead of E, or otherwise reversing the roles of O and E | Denominator is always E; numerator is $(O - E)^2$
P3 — Wrong df | Using the wrong degrees-of-freedom formula | Always $df = (r - 1)(c - 1)$
P4 — Causation error | “Variable A causes Variable B” after a significant result | Chi-square shows association, not causation
P5 — Independence claimed | “The variables are independent” after failing to reject | Failing to reject H₀ is not proof of independence