Solutions — Linear Regression

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.

← Back to Lesson REG-2

Section 5: Guided Practice Solutions

Problem 1 — Computing b, a, and the Point of Means (Variants 0–4)

For all variants: $b = r \cdot \frac{s_y}{s_x}$, then $a = \bar{y} - b\bar{x}$. The point of means $(\bar{x}, \bar{y})$ always lies on the line.

Variant 0 (study vs. score; , , , , ): ; . Check: ✓.
Variant 1 (temp vs. sales; , , , , ): ; . Check: ✓.
Variant 2 (exercise vs. HR; , , , , ): ; . Check: ✓.
Variant 3 (fertilizer vs. yield; , , , , ): ; . Check: ✓.
Variant 4 (age vs. reaction; , , , , ): ; . Check: ✓.

Common mistakes: (1) inverting the ratio: $s_y$ goes in the numerator; (2) wrong sign on the correction (when $b < 0$, $-b\bar{x} = +|b|\bar{x}$); (3) forgetting to subtract $b\bar{x}$ from $\bar{y}$.
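
The two-step recipe for Problem 1 can be sketched in Python. This is a minimal illustration rather than the lesson's own code: the function name `fit_from_summary` and the input values are hypothetical, chosen only to make the arithmetic easy to follow.

```python
def fit_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (b, a) for the line y_hat = a + b*x from summary statistics."""
    b = r * s_y / s_x        # s_y goes in the numerator
    a = y_bar - b * x_bar    # subtract b * x_bar from y_bar
    return b, a


# Hypothetical summary statistics (not taken from the lesson's variants).
r, s_x, s_y, x_bar, y_bar = 0.8, 2.0, 5.0, 4.0, 70.0
b, a = fit_from_summary(r, s_x, s_y, x_bar, y_bar)
print(f"b = {b:.2f}, a = {a:.2f}")

# Point-of-means check: plugging x_bar into the line should return y_bar.
check = a + b * x_bar
print(f"a + b * x_bar = {check:.2f} (y_bar = {y_bar:.2f})")
```

Running the check after every fit is a cheap way to catch all three common mistakes, since each of them moves the line off the point of means.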

Problem 2 — Interpreting Slope and Intercept (Variants 0–4)

Variant 0 ($\hat{y} = 56.90 + 3.70x$): slope $b = 3.70$: “each extra hour of study raises the predicted score 3.70 points, on average”; intercept 56.90 = predicted score for 0 hours (borderline meaningful).
Variant 1 ($\hat{y} = 123.2 - 2.16x$): slope $b = -2.16$: predicted sales fall 2.16 units per °C, on average; intercept 123.2 = predicted sales at 0°C (realistic winter value, meaningful).
Variant 2 ($\hat{y} = 87.9 - 0.53x$): slope $b = -0.53$: predicted HR drops 0.53 bpm per minute of exercise, on average; intercept 87.9 = predicted HR for a sedentary person (meaningful).
Variant 3 ($\hat{y} = 1.60 + 0.45x$): slope $b = 0.45$: predicted yield rises 0.45 kg per gram, on average; intercept 1.60 = predicted yield with no fertilizer (realistic control plot).
Variant 4 ($\hat{y} = 177.4 + 1.69x$, ages 20–65): slope $b = 1.69$: predicted reaction time rises 1.69 ms/year, on average; intercept 177.4 is not meaningful: $x = 0$ (newborn) is far outside the data; it’s a mathematical anchor only.

Problem 3 — Residual Scenarios

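Scenario 1 ($\hat{y} = 56.90 + 3.70x$; studies 4 h, scores 73): $\hat{y} = 56.90 + 3.70(4) = 71.70$, residual $= 73 - 71.70 = 1.30$ (above the line; underpredicted).
Scenario 2 ($\hat{y} = 123.2 - 2.16x$; at 25°C, sales 68): $\hat{y} = 123.2 - 2.16(25) = 69.2$, residual $= 68 - 69.2 = -1.2$ (below the line; overpredicted).
Scenario 3 ($\hat{y} = 87.9 - 0.53x$; exercises 60 min, HR 56): $\hat{y} = 87.9 - 0.53(60) = 56.1$, residual $= 56 - 56.1 = -0.1$ (essentially on the line).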
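
The residual arithmetic can be reproduced with a short Python sketch. The helper name `residual` is hypothetical, and pairing each scenario with the slope and intercept quoted in Problem 2 (56.90 and 3.70, 123.2 and 2.16 falling, 87.9 and 0.53 falling) is an assumption based on the matching contexts.

```python
def residual(a, b, x, y):
    """Return (y_hat, e): the prediction a + b*x and the residual y - y_hat."""
    y_hat = a + b * x
    return y_hat, y - y_hat


# (label, intercept a, slope b, observed x, observed y)
# Intercepts and slopes are the ones quoted in Problem 2 (assumed pairing).
scenarios = [
    ("study 4 h, score 73", 56.90, 3.70, 4, 73),
    ("25 degrees C, sales 68", 123.2, -2.16, 25, 68),
    ("exercise 60 min, HR 56", 87.9, -0.53, 60, 56),
]

results = []
for label, a, b, x, y in scenarios:
    y_hat, e = residual(a, b, x, y)
    results.append((y_hat, e))
    print(f"{label}: y_hat = {y_hat:.2f}, residual = {e:+.2f}")
```

A positive residual means the observation sits above the line (the model underpredicted); a negative one means it sits below (the model overpredicted).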

Problem 4 — Parameterized Generator (Representative Example)

, , , , ; predict at . ; ; . Check: ✓.

Section 6: Independent Practice Solutions

Problem 1 — Full Chain: Slope, Intercept, Prediction, Residual (Variants 0–4)

Variant 0 (study→score; , ): , ; ; (underpredicted).
Variant 1 (temp→sales; , ): , ; ; .
Variant 2 (exercise→HR; , ): , ; ; (HR higher than expected).
Variant 3 (fertilizer→yield; , ): , ; ; .
Variant 4 (age→reaction; , ): , ; ; (slower than expected).

Problem 2 — Regression Interpretation Generator (Representative Example)

, study hours → score; , . Correct slope: “each extra hour of study raises the predicted score 3.70 points, on average.” Wrong options: dropping “on average”; swapping and ; using causal language (“causes”).

Problem 3 — Find the Error (Variants 0–4)

Variant 0 (missing “on average”): the slope is an average predicted change, not a guarantee for an individual.
Variant 1 (inverted ratio $b = r \cdot s_x/s_y$): that’s the slope of $x$ on $y$. Correct: $b = r \cdot s_y/s_x$.
Variant 2 (intercept extrapolation to a newborn): $x = 0$ is far outside the 20–65 range; the intercept is a mathematical anchor, not a newborn prediction.
Variant 3 (using $y$-on-$x$ equation in reverse): you must refit with the roles swapped; the new slope is $r \cdot s_x/s_y$, not $1/b$.
Variant 4 (intercept = “fertilizer explains 1.60 kg”): the intercept is the predicted yield at $x = 0$; the effect per gram is the slope (0.45 kg/g).

Problem 4 — Residual Generator (Representative Example)

; observed , . , — above the line; the student beat the average trend for 4-hour studiers.

Problem 5 — Multi-Step Synthesis: Rehabilitation (n = 10)

Sums: , , , , ; , .

(a) Numerator ; Left ; Right ; .

(b) With , : ; . Equation: . Check: ✓.

(c) Slope: “each extra week of rehab is associated with a predicted ~5.95-point mobility gain, on average.”

(d) Intercept (≈ zero improvement with no therapy) — intuitive, but minimum observed is 2 weeks, so slightly extrapolates.

(e) At , : , (overpredicted).

(f) At : — mild extrapolation (max observed ); flag it, as the trend may plateau.
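
Part (a)'s computation of $r$ from raw sums can be sketched as follows. This assumes that “Numerator”, “Left”, and “Right” refer to the three pieces of the standard computational formula $r = \frac{n\sum xy - \sum x \sum y}{\sqrt{(n\sum x^2 - (\sum x)^2)(n\sum y^2 - (\sum y)^2)}}$. The function name `r_from_sums` and the toy data are hypothetical and are not the rehabilitation data set.

```python
import math


def r_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Pearson r from the six raw sums (computational formula)."""
    numerator = n * sum_xy - sum_x * sum_y
    left = n * sum_x2 - sum_x ** 2     # x factor under the square root
    right = n * sum_y2 - sum_y ** 2    # y factor under the square root
    return numerator / math.sqrt(left * right)


# Hypothetical toy data, used only to exercise the formula.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = r_from_sums(
    n=len(xs),
    sum_x=sum(xs),
    sum_y=sum(ys),
    sum_xy=sum(x * y for x, y in zip(xs, ys)),
    sum_x2=sum(x * x for x in xs),
    sum_y2=sum(y * y for y in ys),
)
print(f"r = {r:.4f}")
```

Keeping the numerator and the two factors as separate named values mirrors the layout of the worked solution and makes it easier to see which piece is off when a hand calculation disagrees.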

Section 7: Mastery Check Solutions

Problem 1 — Feynman Test

The slope is how much the predicted response changes per 1-unit increase in $x$, and the mandatory phrase is “on average.” It describes the average predicted change across all individuals at a given $x$, not a guarantee for any one person. It is not a causal statement, and it should not be applied outside the observed data range.

Problem 2 — Apply: Study Hours Regression

(a) $\hat{y} = 56.90 + 3.70(6) = 79.10$.

(b) Student scores 74: residual $= 74 - 79.10 = -5.10$, below the line (overpredicted by 5.10). Normal scatter: this student is below the average trend for 6-hour studiers.

Problem 3 — Error Analysis

Error 1 — inverted ratio: the researcher used $b = r \cdot s_x/s_y = 1.25$. Correct: $b = r \cdot s_y/s_x = 0.45$ (1.25 is the $x$-on-$y$ slope).

Error 2 — missing “on average”: “yield increases by 1.25 kg” omits the required phrase.

Corrected: “$\hat{y} = 1.60 + 0.45x$. For each extra gram of fertilizer, the predicted yield rises 0.45 kg, on average.”

Section 8: Boss Fight Solutions

Path A — The Calculator: Factory Training Data

Sums (): , , , , .

Task 1 — Conditions: both quantitative; consistent increase (linear plausible); no single drastically influential point (). Proceed.

Task 2 — r, b, a: , . Numerator ; Left , Right ; . With , : ; . Equation: . Check: ✓.

Task 3 — At , : , (above the line).

Task 4 — Interpretation: slope ≈ 4.10 units/week of training, on average; intercept ≈ 8.46 = predicted output for an untrained worker (borderline: $x = 0$ is at the data edge). The equation can’t establish that training causes the gain (aptitude could confound), nor that the trend continues past 7 weeks.

Path B — The Interpreter: Factory Equation Given

$\hat{y} = 6.62 + 4.56x$ (weeks → units; observed 1–7 weeks).

Task 1 — Interpretation: slope 4.56 = predicted units gained per week, on average; intercept 6.62 = predicted output for an untrained worker (borderline, $x = 0$ is at the edge of the data; treat with caution).

Task 2 — Extrapolation at : — beyond the observed range (max 7 weeks); gains may level off, so flag it as an uncertain extrapolation.

Task 3 — At , : , (above the line — aptitude, motivation, or random variation).

Task 4 — Outlier (, ): a high-leverage influential point — it would pull the right end up, raising the slope ( and both grow, amplifying ). The line still passes through the point of means, but themselves shift upward, so the new line passes through a different point of means. This is why checking the scatter for influential outliers (C9) is required.
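
Task 4's claim can be checked numerically with a small least-squares sketch. All data here are hypothetical (they are not the factory data set), and the helper `least_squares` is an illustrative name; the added point is simply a far-right, high observation of the kind the task describes.

```python
def least_squares(xs, ys):
    """Return (a, b, x_bar, y_bar) for the least-squares line y_hat = a + b*x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b, x_bar, y_bar


# Hypothetical training data: weeks of training vs. units produced.
weeks = [1, 2, 3, 4, 5, 6, 7]
units = [11, 15, 21, 24, 30, 34, 38]

a0, b0, xb0, yb0 = least_squares(weeks, units)

# Add one hypothetical high-leverage point far to the right and well above the trend.
a1, b1, xb1, yb1 = least_squares(weeks + [12], units + [80])

print(f"without outlier: b = {b0:.2f}, point of means = ({xb0:.2f}, {yb0:.2f})")
print(f"with outlier:    b = {b1:.2f}, point of means = ({xb1:.2f}, {yb1:.2f})")
```

In this sketch the slope rises and the point of means moves up and to the right, while the refitted line still passes through its own, new point of means.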

Section 9: Challenge Problem Solutions

Challenge 1 — Regression Asymmetry

, , , , .

(a) on : ; → .

(b) on : ; → .

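(c) Is the $y$-on-$x$ slope the reciprocal of the $x$-on-$y$ slope? No. But the product of the two slopes equals $r^2$ (always true).

(d) Equality requires $r^2 = 1$, i.e. $|r| = 1$ ($r = \pm 1$). Only in a perfect linear association do the two regression lines coincide; for $|r| < 1$ they genuinely differ (minimizing vertical vs. horizontal errors).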
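
The asymmetry can be verified with a few lines of Python. The values of `r`, `s_x`, and `s_y` below are hypothetical stand-ins, not the challenge's own numbers, and the function names are illustrative.

```python
def slope_y_on_x(r, s_x, s_y):
    """Slope for predicting y from x."""
    return r * s_y / s_x


def slope_x_on_y(r, s_x, s_y):
    """Slope for predicting x from y (roles swapped)."""
    return r * s_x / s_y


# Hypothetical summary statistics.
r, s_x, s_y = 0.6, 2.0, 5.0

b_yx = slope_y_on_x(r, s_x, s_y)
b_xy = slope_x_on_y(r, s_x, s_y)

print(f"y on x slope: {b_yx:.4f}")
print(f"x on y slope: {b_xy:.4f}")
print(f"1 / (x on y slope): {1 / b_xy:.4f}")   # differs from b_yx unless |r| = 1
print(f"product of slopes: {b_yx * b_xy:.4f}, r^2: {r ** 2:.4f}")
```

Setting `r = 1.0` or `r = -1.0` in the sketch makes the reciprocal match, which is the perfect-association case from part (d).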

Challenge 2 — Sensitivity of b to r

With $s_x$ and $s_y$ held fixed, the ratio $s_y/s_x = 2.5$ for all rows.

$r$	$b$	Notes
0.4	1.0	
0.8	2.0	$b$ is linear in $r$
−0.4	−1.0	negative $r$ → negative $b$
1.0	2.5	max slope $s_y/s_x$; points lie exactly on the line

(a) $r = 0$ → $b = 0$: horizontal line; $x$ gives no linear information; best prediction is $\bar{y}$.

(b) $r = 1$ → $b = s_y/s_x = 2.5$: all points lie on the line. As $r$ rises 0 → 1, the line tilts from flat toward its max slope $s_y/s_x$, controlled entirely by $r$.
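
The table rows can be regenerated with a one-line rule, $b = r \cdot (s_y/s_x)$. In this sketch the ratio 2.5 is read off the table's last row (slope 2.5 at correlation 1.0); the variable names are illustrative.

```python
RATIO = 2.5  # s_y / s_x, read off the table row where r = 1.0 gives b = 2.5

r_values = [0.0, 0.4, 0.8, -0.4, 1.0]

# Slope for each correlation: b scales linearly with r for a fixed ratio.
slopes = {r: r * RATIO for r in r_values}

for r, b in slopes.items():
    print(f"r = {r:+.1f}  ->  b = {b:+.2f}")
```

Because the ratio is fixed, doubling $r$ doubles $b$, and flipping the sign of $r$ flips the sign of $b$.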

Challenge 3 — Regression to the Mean

Two exams, , both , .

(a) Since : ; → .

(b) Scored 90 on Exam 1: — below 90, closer to the mean.

(c) With , , so → predicted . The factor shrinks the deviation toward the mean, because part of an extreme score is random error that doesn’t repeat.

(d) The coach is wrong — this is regression to the mean, not “training fatigue.” Top week-1 performers tend to score closer to average in week 2 regardless of any intervention, because part of their extreme score was luck. The fallacy underlies false beliefs that praise worsens and punishment improves performance; always consider regression to the mean before crediting an intervention.
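
The shrinkage toward the mean can be illustrated with a small sketch. It assumes both exams share the same mean and the same standard deviation, so the prediction reduces to $\hat{y} = \bar{y} + r(x - \bar{x})$. The correlation 0.6 and mean 70 are hypothetical values, not the challenge's own, and `predict_exam2` is an illustrative name.

```python
def predict_exam2(x, r, mean):
    """Predicted Exam 2 score when both exams share the same mean and SD.

    With s_x == s_y the slope equals r, so the predicted deviation from
    the mean is r times the observed deviation.
    """
    return mean + r * (x - mean)


# Hypothetical values for illustration only.
r, mean = 0.6, 70.0

for exam1 in (90.0, 70.0, 50.0):
    exam2 = predict_exam2(exam1, r, mean)
    print(f"Exam 1 = {exam1:.0f}  ->  predicted Exam 2 = {exam2:.1f}")

high = predict_exam2(90.0, r, mean)
low = predict_exam2(50.0, r, mean)
```

High scorers are predicted to fall back toward the mean and low scorers to rise toward it, with no intervention involved, which is the pattern the coach misread.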
