
REG-2 Solutions: Linear Regression


Section 5 — Guided Practice Solutions

GP-1 — Computing \( b \), \( a \), and Identifying the Point of Means (Variants 0–4)

For all variants: \( b = r \times s_y / s_x \), then \( a = \bar{y} - b\bar{x} \). The point of means \( (\bar{x}, \bar{y}) \) always lies on the line.

Variant 0 — Study hours vs. Exam score: \( r = 0.86 \), \( s_x = 2.0 \), \( s_y = 8.6 \), \( \bar{x} = 3.0 \), \( \bar{y} = 68.0 \)

Variant 1 — Temperature (°C) vs. Hot beverage sales: \( r = -0.72 \), \( s_x = 5.0 \), \( s_y = 15.0 \), \( \bar{x} = 20.0 \), \( \bar{y} = 80.0 \)

Variant 2 — Daily exercise (min) vs. Resting heart rate (bpm): \( r = -0.88 \), \( s_x = 10.0 \), \( s_y = 6.0 \), \( \bar{x} = 30.0 \), \( \bar{y} = 72.0 \)

Variant 3 — Fertilizer applied (g) vs. Tomato yield (kg): \( r = 0.75 \), \( s_x = 4.0 \), \( s_y = 2.4 \), \( \bar{x} = 8.0 \), \( \bar{y} = 5.2 \)

Variant 4 — Age (years) vs. Reaction time (ms): \( r = 0.81 \), \( s_x = 12.0 \), \( s_y = 25.0 \), \( \bar{x} = 40.0 \), \( \bar{y} = 245.0 \)

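Common mistakes in GP-1: (1) Inverting the ratio: \( s_y \) always goes in the numerator, \( b = r \times s_y / s_x \). (2) Wrong sign on the correction: when \( b \) is negative and \( \bar{x} \) is positive, \( -b\bar{x} \) is positive, so \( a \) is larger than \( \bar{y} \). (3) Forgetting to subtract \( b\bar{x} \) from \( \bar{y} \), which amounts to reporting \( \bar{y} \) as the intercept.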
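As a cross-check on the GP-1 recipe, the following is a minimal Python sketch rather than the lesson's own generator. The function name `line_from_summary` is hypothetical, and two choices are assumptions made for illustration: inputs are passed as decimal strings, and \( b \) is rounded half-up to two places before \( a \) is computed, which mirrors the rounding in the worked examples (for instance 3.698 becoming 3.70).

```python
from decimal import Decimal, ROUND_HALF_UP


def line_from_summary(r, s_x, s_y, x_bar, y_bar):
    """Return (b, a) for y-hat = a + b*x from summary statistics.

    Inputs are decimal strings. b is rounded half-up to two places
    before a is computed (an illustrative convention, not a rule).
    """
    r, s_x, s_y, x_bar, y_bar = (Decimal(v) for v in (r, s_x, s_y, x_bar, y_bar))
    b = (r * s_y / s_x).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)  # s_y in the numerator
    a = y_bar - b * x_bar  # forces the line through (x_bar, y_bar)
    return b, a


# (r, s_x, s_y, x_bar, y_bar) for GP-1 Variants 0-4, copied from the list above
variants = [
    ("0.86", "2.0", "8.6", "3.0", "68.0"),
    ("-0.72", "5.0", "15.0", "20.0", "80.0"),
    ("-0.88", "10.0", "6.0", "30.0", "72.0"),
    ("0.75", "4.0", "2.4", "8.0", "5.2"),
    ("0.81", "12.0", "25.0", "40.0", "245.0"),
]

for i, stats in enumerate(variants):
    b, a = line_from_summary(*stats)
    print(f"Variant {i}: y-hat = {a:.2f} + ({b:.2f})x")
```

Run as written, the five printed lines should agree with the equations listed under GP-2.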


GP-2 — Interpreting Slope and Intercept (Variants 0–4)

Variant 0 — \( \hat{y} = 56.90 + 3.70x \) (study hours → exam score)

Variant 1 — \( \hat{y} = 123.2 - 2.16x \) (temperature °C → hot beverage sales)

Variant 2 — \( \hat{y} = 87.9 - 0.53x \) (daily exercise min → resting heart rate bpm)

Variant 3 — \( \hat{y} = 1.60 + 0.45x \) (fertilizer g → tomato yield kg)

Variant 4 — \( \hat{y} = 177.4 + 1.69x \) (age years, observed 20–65 → reaction time ms)


GP-3 — Residual Scenarios (Non-regenerable)

Scenario 1: \( \hat{y} = 56.90 + 3.70x \). A student studies 4 hours and scores 73.

\[ \hat{y}(4) = 56.90 + 3.70 \times 4 = 56.90 + 14.80 = 71.70 \]

\[ e = y - \hat{y} = 73 - 71.70 = +1.30 \]

A positive residual means the student's actual score (73) is above the regression line — the model underpredicted by 1.30 points.


Scenario 2: \( \hat{y} = 123.2 - 2.16x \). At 25°C, actual sales were 68 units.

\[ \hat{y}(25) = 123.2 - 2.16 \times 25 = 123.2 - 54.0 = 69.2 \]

\[ e = 68 - 69.2 = -1.2 \]

A negative residual means actual sales (68) fell below the predicted value (69.2) — the model overpredicted by 1.2 units.


Scenario 3: \( \hat{y} = 87.9 - 0.53x \). A person exercises 60 minutes and has resting HR 56 bpm.

\[ \hat{y}(60) = 87.9 - 0.53 \times 60 = 87.9 - 31.8 = 56.1 \]

\[ e = 56 - 56.1 = -0.1 \]

A tiny negative residual: this person's heart rate (56 bpm) is just barely below the line (56.1 bpm). The model overpredicted by 0.1 bpm, so this observation sits essentially on the line.


GP-4 — Parameterized Generator (Representative Example)

Representative example — your generator will show different values.

A sports analyst finds \( r = 0.86 \), \( s_x = 2.0 \), \( s_y = 8.6 \), \( \bar{x} = 3.0 \), \( \bar{y} = 68.0 \). Target \( x = 5 \).

Step 1 — Compute \( b \):

\[ b = r \times \frac{s_y}{s_x} = 0.86 \times \frac{8.6}{2.0} = 0.86 \times 4.3 = 3.698 \approx 3.70 \]

Step 2 — Compute \( a \):

\[ a = \bar{y} - b\bar{x} = 68.0 - 3.70 \times 3.0 = 68.0 - 11.10 = 56.90 \]

Step 3 — Predict \( \hat{y} \) at \( x = 5 \):

\[ \hat{y}(5) = 56.90 + 3.70 \times 5 = 56.90 + 18.50 = 75.40 \]

Verify via point of means: \( \hat{y}(3.0) = 56.90 + 3.70 \times 3.0 = 68.0 = \bar{y} \) ✓

Section 6 — Independent Practice Solutions

IP-1 — Full Chain: Slope, Intercept, Prediction, Residual (Variants 0–4)

Variant 0 — Study hours → Exam score: \( r = 0.86 \), \( s_x = 2.0 \), \( s_y = 8.6 \), \( \bar{x} = 3.0 \), \( \bar{y} = 68.0 \). Observed: \( x = 5 \) hours, \( y = 78 \).

Variant 1 — Temperature (°C) → Hot beverage sales: \( r = -0.72 \), \( s_x = 5.0 \), \( s_y = 15.0 \), \( \bar{x} = 20.0 \), \( \bar{y} = 80.0 \). Observed: \( x = 28 \)°C, \( y = 62 \) units.

Variant 2 — Daily exercise (min) → Resting heart rate (bpm): \( r = -0.88 \), \( s_x = 10.0 \), \( s_y = 6.0 \), \( \bar{x} = 30.0 \), \( \bar{y} = 72.0 \). Observed: \( x = 45 \) min, \( y = 68 \) bpm.

Variant 3 — Fertilizer (g) → Tomato yield (kg): \( r = 0.75 \), \( s_x = 4.0 \), \( s_y = 2.4 \), \( \bar{x} = 8.0 \), \( \bar{y} = 5.2 \). Observed: \( x = 12 \) g, \( y = 6.8 \) kg.

Variant 4 — Age (years) → Reaction time (ms): \( r = 0.81 \), \( s_x = 12.0 \), \( s_y = 25.0 \), \( \bar{x} = 40.0 \), \( \bar{y} = 245.0 \). Observed: \( x = 55 \) years, \( y = 290 \) ms.
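The summary statistics here are the same as in GP-1, so the slopes and intercepts match the GP-2 equations; the remaining two links of the chain, prediction and residual, can be sketched as follows. This is an illustrative Python sketch, not the lesson's generator: the helper names `predict` and `residual` are hypothetical, and the coefficients are the rounded GP-2 values, so carrying unrounded coefficients would shift the results slightly.

```python
# variant: (intercept a, slope b, observed x, observed y)
# a and b are the rounded GP-2 coefficients; x and y are the IP-1 observations.
cases = {
    0: (56.90, 3.70, 5, 78),
    1: (123.2, -2.16, 28, 62),
    2: (87.9, -0.53, 45, 68),
    3: (1.60, 0.45, 12, 6.8),
    4: (177.4, 1.69, 55, 290),
}


def predict(a, b, x):
    """Predicted response on the line y-hat = a + b*x."""
    return a + b * x


def residual(a, b, x, y):
    """Observed minus predicted: positive means the point is above the line."""
    return y - predict(a, b, x)


for k, (a, b, x, y) in cases.items():
    y_hat = predict(a, b, x)
    e = residual(a, b, x, y)
    side = "above" if e > 0 else "below" if e < 0 else "on"
    print(f"Variant {k}: y-hat = {y_hat:.2f}, e = {e:+.2f} ({side} the line)")
```

The sign of each residual carries the interpretation: a positive value means the model underpredicted, a negative value means it overpredicted.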


IP-2 — Regression Interpretation Generator (Representative Example)

Representative example — your generator will show different values.

Context: \( r = 0.86 \), \( s_x = 2.0 \), \( s_y = 8.6 \), \( \bar{x} = 3.0 \), \( \bar{y} = 68.0 \), where \( x \) = study hours and \( y \) = exam score. Target \( x = 5 \).

\[ b = 0.86 \times 8.6 / 2.0 = 3.698 \approx 3.70 \]

\[ a = \bar{y} - b\bar{x} = 68.0 - 3.70 \times 3.0 = 56.90 \]

\[ \hat{y}(5) = 56.90 + 3.70 \times 5 = 75.40 \]

Correct slope interpretation: "For each additional hour of study, the predicted exam score increases by 3.70 points, on average."

Why the other options are wrong:


IP-3 — Find the Error (Variants 0–4)

Variant 0 — Error: Missing "on average"

The statement "If a student studies 1 more hour, their score will increase by 3.70 points" drops the phrase "on average." The slope describes the average predicted change across all students — not a guarantee for any individual.

Corrected statement: "For each additional hour of study, the predicted exam score increases by 3.70 points, on average."

Variant 1 — Error: Inverted ratio (\( b = r \times s_x / s_y \))

The researcher wrote \( b = -0.72 \times 5.0 / 15.0 = -0.24 \). This is the slope of \( x \) on \( y \), not \( y \) on \( x \). The correct formula puts \( s_y \) in the numerator.

Correct calculation: \( b = -0.72 \times 15.0 / 5.0 = -2.16 \)

Variant 2 — Error: Intercept extrapolation beyond data range

The statement "At birth (age 0), the predicted reaction time is 177.4 ms" uses the intercept to make a prediction at \( x = 0 \), which is far outside the observed data range of 20–65 years. The model was built on adult data; applying it to a newborn is extrapolation.

Correct statement: "The intercept 177.4 ms is a mathematical anchor. Since \( x = 0 \) (a newborn) is far outside the observed range of 20–65 years, this value should not be interpreted as a meaningful prediction."

Variant 3 — Error: Using original equation in reverse (regression asymmetry)

The researcher uses the equation \( \hat{y} = 87.9 - 0.53x \) (exercise → heart rate) to predict exercise minutes from heart rate by plugging in heart rate as \( x \). This is wrong: the original equation minimizes errors in predicting heart rate (\( y \)), not exercise minutes. A new regression must be computed with heart rate as the predictor and exercise as the response. The new slope would be \( b' = r \times s_x / s_y \), not \( 1/b \).

Variant 4 — Error: Misidentifying the intercept

The statement "The intercept 1.60 means that fertilizer explains 1.60 kg of yield" confuses the intercept with a measure of fertilizer's effect. The intercept is the predicted yield when \( x = 0 \) (no fertilizer). The amount of yield change per unit of fertilizer is the slope (\( b = 0.45 \) kg/g).

Corrected statement: "The intercept 1.60 is the predicted tomato yield when no fertilizer is applied (\( x = 0 \)) — a baseline prediction, not a measure of fertilizer's effect."


IP-4 — Residual Generator (Representative Example)

Representative example — your generator will show different values.

Equation: \( \hat{y} = 56.90 + 3.70x \), where \( x \) = study hours and \( y \) = exam score. Observed: \( x_{obs} = 4 \), \( y_{obs} = 73 \).

\[ \hat{y}(4) = 56.90 + 3.70 \times 4 = 56.90 + 14.80 = 71.70 \]

\[ e = y_{obs} - \hat{y} = 73 - 71.70 = +1.30 \]

The residual is positive (+1.30), so the actual score (73) is above the regression line — the model underpredicted. This student outperformed the average trend for students who studied 4 hours.


IP-5 — Multi-Step Synthesis: Physical Therapy Rehabilitation (All 6 Parts)

Dataset: weeks of rehabilitation (\( x \)) and mobility improvement score (\( y \), scale 0–100) for 10 patients.

Pre-computed sums: \( n = 10 \), \( \Sigma x = 54 \), \( \Sigma y = 320 \), \( \Sigma xy = 1940 \), \( \Sigma x^2 = 324 \), \( \Sigma y^2 = 11390 \).

Derived means: \( \bar{x} = 54/10 = 5.4 \), \( \bar{y} = 320/10 = 32.0 \).

(a) Computing \( r \):

\[ r = \frac{n\Sigma xy - \Sigma x \Sigma y}{\sqrt{(n\Sigma x^2 - (\Sigma x)^2)(n\Sigma y^2 - (\Sigma y)^2)}} \]

Numerator: \( 10 \times 1940 - 54 \times 320 = 19400 - 17280 = 2120 \)

Left bracket: \( 10 \times 324 - 54^2 = 3240 - 2916 = 324 \)

Right bracket: \( 10 \times 11390 - 320^2 = 113900 - 102400 = 11500 \)

\[ r = \frac{2120}{\sqrt{324 \times 11500}} = \frac{2120}{\sqrt{3{,}726{,}000}} \approx \frac{2120}{1930.3} \approx 0.998 \]

Very strong positive linear association between weeks of rehabilitation and mobility improvement.

(b) Computing \( b \) and \( a \):

Using \( s_x \approx 1.897 \) and \( s_y \approx 11.304 \):

\[ b = r \cdot \frac{s_y}{s_x} = 0.998 \times \frac{11.304}{1.897} \approx 0.998 \times 5.958 \approx 5.95 \]

\[ a = \bar{y} - b\bar{x} = 32.0 - 5.95 \times 5.4 = 32.0 - 32.13 \approx -0.13 \]

Equation: \( \hat{y} \approx -0.13 + 5.95x \)

Verification via point of means: \( \hat{y}(5.4) = -0.13 + 5.95 \times 5.4 = -0.13 + 32.13 = 32.00 = \bar{y} \) ✓

(c) Slope interpretation:

"Each additional week of rehabilitation is associated with a predicted mobility improvement of approximately 5.95 points, on average."

(d) Intercept meaningfulness:

\( x = 0 \) represents a patient with zero weeks of rehabilitation. The predicted value \( \hat{y}(0) \approx -0.13 \approx 0 \) is intuitive — essentially zero improvement with no therapy. However, the minimum observed \( x \) in the dataset is 2 weeks, so \( x = 0 \) slightly extrapolates outside the observed range. The near-zero intercept is consistent with the model but should be noted as technically just beyond the data range.

(e) Residual for patient at \( x = 6 \), \( y = 34 \):

\[ \hat{y}(6) = -0.13 + 5.95 \times 6 = -0.13 + 35.70 = 35.57 \]

\[ e = 34 - 35.57 = -1.57 \]

The negative residual means this patient improved 1.57 points less than the model predicted — their actual score falls below the regression line. The model overpredicted for this patient.

(f) Prediction at \( x = 10 \) weeks:

\[ \hat{y}(10) = -0.13 + 5.95 \times 10 = -0.13 + 59.50 = 59.37 \approx 59.4 \text{ points} \]

Caution: The maximum observed \( x \) in the dataset is 9 weeks. Predicting at \( x = 10 \) weeks is mild extrapolation beyond the observed range. The linear trend may not continue — mobility improvement could plateau or slow after 9 weeks. Use this prediction with caution and flag it explicitly as an extrapolation.
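The range caution in parts (d) and (f) can be made mechanical. The following is a minimal Python sketch under stated assumptions: the function name `predict_with_range_check` is hypothetical, and the observed range of 2 to 9 weeks is taken from the minimum and maximum quoted in those two parts.

```python
def predict_with_range_check(a, b, x, x_min, x_max):
    """Return (y_hat, is_extrapolation) for the line y-hat = a + b*x.

    is_extrapolation is True when x lies outside [x_min, x_max].
    """
    y_hat = a + b * x
    is_extrapolation = not (x_min <= x <= x_max)
    return y_hat, is_extrapolation


# IP-5 line with the observed range of 2 to 9 weeks
a, b = -0.13, 5.95
for weeks in (0, 6, 10):
    y_hat, flagged = predict_with_range_check(a, b, weeks, x_min=2, x_max=9)
    note = "extrapolation" if flagged else "within observed range"
    print(f"x = {weeks}: y-hat = {y_hat:.2f} ({note})")
```

Returning the flag alongside the prediction, rather than refusing to predict, leaves the decision about how to report the value to the caller.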

Section 7 — Mastery Check Solutions

Feynman Test — Model Answer

The slope \( b \) tells you how much the predicted response variable (\( y \)) changes for each 1-unit increase in the predictor variable (\( x \)). The critical phrase that must always appear is "on average."

The slope describes the average predicted change across all individuals with a given \( x \) value — it does not guarantee what any specific individual will do. A student who studies one more hour might score 3.70 points higher, lower, or exactly as predicted. The slope pins down the average, not the individual outcome. The slope is also not a causal statement and should not be applied outside the observed data range.


Question 2 — Apply: Study Hours Regression

Regression equation from GP-1 Variant 0: \( \hat{y} = 56.90 + 3.70x \) (study hours → exam score).

Part A: A student studies 6 hours.

\[ \hat{y}(6) = 56.90 + 3.70 \times 6 = 56.90 + 22.20 = 79.10 \]

Part B: The student actually scores 74.

\[ e = y - \hat{y} = 74 - 79.10 = -5.10 \]

The residual is \( -5.10 \). A negative residual means the actual score (74) is below the predicted score (79.10) — this student scored below the regression line. The model overpredicted for this student by 5.10 points. This is normal — individuals scatter around the line; the negative residual indicates this student performed below the average trend for 6-hour studiers.


Question 3 — Error Analysis

The researcher's report contained two errors:

Error 1 — Inverted ratio (Pitfall P1): The researcher wrote \( b = 0.75 \times s_x / s_y = 0.75 \times 4.0 / 2.4 = 1.25 \). The correct formula is \( b = r \times s_y / s_x \), with \( s_y \) in the numerator (not \( s_x \)).

Correct slope: \( b = 0.75 \times 2.4 / 4.0 = 1.8 / 4.0 = 0.45 \)

The researcher's answer of 1.25 is actually the slope of \( x \) on \( y \) — a different line with a different meaning.

Error 2 — Missing "on average" (Pitfall P3): The statement "tomato yield increases by 1.25 kg" omits the mandatory phrase "on average." The slope describes the average predicted change across all plants with a given fertilizer amount — individual plants will deviate from this prediction.

Corrected report: "The slope is \( b = 0.75 \times 2.4 / 4.0 = 0.45 \). For each additional gram of fertilizer applied, the predicted tomato yield increases by 0.45 kg, on average."

Section 8 — Boss Fight Solutions

Path A — The Calculator: Factory Training Data

Dataset: weeks of training (\( x \)) and units produced per shift (\( y \)) for 7 workers. Pre-computed: \( \Sigma x = 28 \), \( \Sigma y = 174 \), \( \Sigma xy = 822 \), \( \Sigma x^2 = 140 \), \( \Sigma y^2 = 5060 \).

Task 1 — Checking conditions:

All three conditions are met. Both variables (weeks, units) are quantitative. The data shows a consistent increase from 10 to 40 units as training weeks increase — a linear trend is plausible from the scatter plot. With only 7 observations, no single point appears drastically influential or out of line with the general trend. We proceed with regression.

Task 2 — Computing \( r \), \( b \), and \( a \):

\[ \bar{x} = 28/7 = 4.0, \qquad \bar{y} = 174/7 \approx 24.857 \]

Numerator for \( r \): \( 7 \times 822 - 28 \times 174 = 5754 - 4872 = 882 \)

Left bracket: \( 7 \times 140 - 28^2 = 980 - 784 = 196 \)

Right bracket: \( 7 \times 5060 - 174^2 = 35420 - 30276 = 5144 \)
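The numerator and the two bracket terms above can be reproduced directly from the pre-computed sums. This is an illustrative Python sketch; the function name `correlation_from_sums` is hypothetical, and the range check at the end is an added safeguard rather than part of the lesson's method (a value of \( r \) outside \( [-1, 1] \) can only arise from sums that do not belong to a real dataset).

```python
import math


def correlation_from_sums(n, sum_x, sum_y, sum_xy, sum_x2, sum_y2):
    """Return (numerator, left, right, r) using the computational formula for r."""
    numerator = n * sum_xy - sum_x * sum_y
    left = n * sum_x2 - sum_x ** 2
    right = n * sum_y2 - sum_y ** 2
    r = numerator / math.sqrt(left * right)
    if abs(r) > 1 + 1e-9:
        # No real dataset can produce |r| > 1, so the inputs must be inconsistent.
        raise ValueError("sums are mutually inconsistent: |r| exceeds 1")
    return numerator, left, right, r


# Path A sums: n = 7 workers
num, left, right, r = correlation_from_sums(7, 28, 174, 822, 140, 5060)
print(num, left, right)  # 882 196 5144
print(round(r, 3))
```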

\[ r = \frac{882}{\sqrt{196 \times 5144}} = \frac{882}{\sqrt{1{,}008{,}224}} \approx \frac{882}{1004.1} \approx 0.878 \]

Using \( s_x \approx 2.160 \) and \( s_y \approx 11.07 \) (from \( s_y = \sqrt{(5060 - 174^2/7)/6} \)):

\[ b = r \cdot \frac{s_y}{s_x} = 0.878 \times \frac{11.07}{2.160} \approx 0.878 \times 5.125 \approx 4.50 \]

As a check, the numerator divided by the left bracket gives the same slope: \( 882 / 196 = 4.50 \).

\[ a = \bar{y} - b\bar{x} = 24.857 - 4.50 \times 4.0 = 24.857 - 18.00 \approx 6.86 \]

Equation: \( \hat{y} \approx 6.86 + 4.50x \)

Verification: \( \hat{y}(4.0) = 6.86 + 4.50 \times 4.0 = 6.86 + 18.00 = 24.86 \approx \bar{y} \) ✓

Task 3 — Prediction and residual for \( x = 5 \) weeks, \( y = 30 \) units:

\[ \hat{y}(5) = 6.86 + 4.50 \times 5 = 6.86 + 22.50 = 29.36 \text{ units} \]

\[ e = 30 - 29.36 = +0.64 \]

This worker produced 0.64 units more than the regression model predicted for someone with 5 weeks of training. Their actual output is above the line, so they slightly outperformed the average trend for workers at that experience level.

Task 4 — Interpreting slope and intercept:

Slope: For each additional week of on-the-job training, the predicted units produced per shift increases by approximately 4.50 units, on average. The positive slope confirms that additional training is associated with higher productivity.

Intercept: The intercept \( a \approx 6.86 \) is the predicted units produced for a worker with \( x = 0 \) weeks of training (a brand-new, untrained worker). Since \( x = 0 \) lies just below the observed data range (minimum observed was 1 week), the intercept has borderline contextual meaning: it suggests untrained workers produce roughly 7 units per shift.

Limitation: The equation tells the supervisor how productivity increases with training weeks on average — but it cannot tell them whether training itself causes the improvement (there may be confounders, such as worker aptitude) or whether the linear trend continues beyond 7 weeks.


Path B — The Interpreter: Factory Equation Given

Given equation: \( \hat{y} = 6.62 + 4.56x \), where \( x \) = weeks of training, \( y \) = units per shift. Observed range: \( x \) from 1 to 7 weeks, \( \bar{x} = 4.0 \), \( \bar{y} = 24.86 \).

Task 1 — Slope and intercept interpretation:

Slope (4.56): For each additional week of on-the-job training, the predicted number of units produced per shift increases by 4.56 units, on average. This indicates a meaningful productivity gain associated with each additional week of experience.

Intercept (6.62): The intercept is the predicted units produced for a brand-new worker with zero weeks of training. Since \( x = 0 \) lies just below the observed range (minimum observed \( x = 1 \) week), it has borderline contextual meaning. A reasonable interpretation: workers with no formal training are predicted to produce approximately 6–7 units per shift. Because this is just past the edge of the data, this prediction should be treated with caution.

Task 2 — Extrapolation warning for \( x = 10 \) weeks:

\[ \hat{y}(10) = 6.62 + 4.56 \times 10 = 6.62 + 45.60 = 52.22 \text{ units} \]

This prediction is an extrapolation — the observed data extends only to \( x = 7 \) weeks. Productivity gains from training often level off as workers approach mastery; the linear relationship may not hold at higher experience levels. The supervisor should treat 52.22 as an uncertain upper-bound estimate and flag it explicitly as an extrapolation beyond the training data.

Task 3 — Residual for worker with \( x = 4 \) weeks, \( y = 28 \) units:

\[ \hat{y}(4) = 6.62 + 4.56 \times 4 = 6.62 + 18.24 = 24.86 \]

\[ e = 28 - 24.86 = +3.14 \]

This worker produced 3.14 units more than the model predicted for someone with 4 weeks of training. Their actual output is above the regression line — they outperformed the average trend for workers at that experience level. A positive residual like this could reflect individual aptitude, higher motivation, or simply random variation.

Task 4 — Outlier reasoning for (\( x = 7 \), \( y = 80 \)):

Adding this extreme point (\( x = 7 \), \( y = 80 \)), far above the roughly 40 units produced by the other 7-week workers, would create a high-leverage, influential point. Its inclusion would pull the right end of the regression line sharply upward, increasing the slope (\( b \) would rise). In terms of \( b = r \times s_y / s_x \), the increase is driven mainly by \( s_y \), which grows substantially once the outlier is included. The correlation \( r \) itself may well fall, because the new point sits far off the trend of the other observations, but not by enough to offset the jump in \( s_y \).

Regarding the point of means: the mathematical guarantee that the regression line passes through \( (\bar{x}, \bar{y}) \) still holds — it always does for any least-squares line. However, the values of \( \bar{x} \) and \( \bar{y} \) themselves would shift upward (because this outlier has extreme \( y \)), so the new equation would pass through a different point of means than before. This is why checking the scatter plot for influential outliers before trusting any regression equation is a required condition check (C9).
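The two claims above (the slope rises, and the refitted line still passes through the new point of means) can be tried out numerically. The following Python sketch uses HYPOTHETICAL data: the lesson's raw factory observations are not reproduced in this document, so the seven (weeks, units) pairs below are invented purely for illustration, and the helper name `least_squares` is likewise an assumption.

```python
from statistics import mean


def least_squares(xs, ys):
    """Return (a, b, x_bar, y_bar) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    b = s_xy / s_xx
    a = y_bar - b * x_bar
    return a, b, x_bar, y_bar


# HYPOTHETICAL observations, not the lesson's dataset
weeks = [1, 2, 3, 4, 5, 6, 7]
units = [12, 15, 21, 24, 30, 33, 40]

a0, b0, xb0, yb0 = least_squares(weeks, units)
a1, b1, xb1, yb1 = least_squares(weeks + [7], units + [80])  # add the outlier

print(f"slope before: {b0:.2f}, after: {b1:.2f}")
print(f"point of means before: ({xb0:.3f}, {yb0:.3f}), after: ({xb1:.3f}, {yb1:.3f})")
print(f"line at new x-bar: {a1 + b1 * xb1:.3f}")  # equals the new y-bar
```

With any dataset of this shape, the slope after adding the outlier should exceed the slope before, and the last printed value should coincide with the new \( \bar{y} \).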

Section 9 — Challenge Problem Solutions

Challenge 1 — Regression Asymmetry

Dataset: \( r = 0.70 \), \( s_x = 3.0 \), \( s_y = 6.0 \), \( \bar{x} = 10 \), \( \bar{y} = 40 \).

(a) Slope of \( \hat{y} \) on \( x \):

\[ b = r \times \frac{s_y}{s_x} = 0.70 \times \frac{6.0}{3.0} = 0.70 \times 2.0 = 1.40 \]

Intercept: \( a = 40 - 1.40 \times 10 = 40 - 14 = 26 \). Equation: \( \hat{y} = 26 + 1.40x \).

(b) Slope of \( \hat{x} \) on \( y \) (predicting \( x \) from \( y \)):

\[ b' = r \times \frac{s_x}{s_y} = 0.70 \times \frac{3.0}{6.0} = 0.70 \times 0.5 = 0.35 \]

Intercept: \( a' = 10 - 0.35 \times 40 = 10 - 14 = -4 \). Equation: \( \hat{x} = -4 + 0.35y \).

(c) Is \( b' = 1/b \)?

\( 1/b = 1/1.40 \approx 0.714 \). But \( b' = 0.35 \neq 0.714 \). So no, \( b' \neq 1/b \).

Confirming the relationship: \( b \times b' = 1.40 \times 0.35 = 0.49 = (0.70)^2 = r^2 \). This is always true: \( b \times b' = r^2 \).

(d) When would \( b' = 1/b \) exactly?

\( b' = 1/b \) requires \( b \times b' = 1 \). Since \( b \times b' = r^2 \), this holds only when \( r^2 = 1 \), i.e., when \( |r| = 1 \). Only in a perfect linear association do the regression of \( y \) on \( x \) and the regression of \( x \) on \( y \) coincide (as the same line), making their slopes reciprocals. For any \( |r| < 1 \), the two regression lines are genuinely different — each optimizes a different objective (minimizing vertical vs. horizontal errors).
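The identity \( b \times b' = r^2 \) can be checked numerically. This is a small Python sketch; the function names `slope_y_on_x` and `slope_x_on_y` are hypothetical, and the extra values of \( r \) in the loop are illustrative additions beyond the Challenge 1 dataset.

```python
def slope_y_on_x(r, s_x, s_y):
    """Slope for predicting y from x."""
    return r * s_y / s_x


def slope_x_on_y(r, s_x, s_y):
    """Slope for predicting x from y (the roles of s_x and s_y swap)."""
    return r * s_x / s_y


s_x, s_y = 3.0, 6.0
for r in (0.70, 0.30, -0.90, 1.00):
    b = slope_y_on_x(r, s_x, s_y)
    b_prime = slope_x_on_y(r, s_x, s_y)
    # The product equals r^2, so b' equals 1/b only when |r| = 1.
    print(f"r = {r:+.2f}: b = {b:+.3f}, b' = {b_prime:+.3f}, "
          f"b*b' = {b * b_prime:.3f}, r^2 = {r ** 2:.3f}, 1/b = {1 / b:+.3f}")
```

The standard deviations cancel in the product, which is why the result depends on \( r \) alone.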


Challenge 2 — Sensitivity of \( b \) to \( r \)

Using \( s_x = 4.0 \) and \( s_y = 10.0 \): the ratio \( s_y / s_x = 10.0 / 4.0 = 2.5 \) for all rows.

| \( r \) | \( b = r \times 2.5 \) | Notes |
|---|---|---|
| 0.4 | 1.0 | |
| 0.8 | 2.0 | Doubles when \( r \) doubles from 0.4 to 0.8? Yes: \( b \) is linear in \( r \). |
| −0.4 | −1.0 | Negative \( r \) → negative \( b \) |
| 1.0 | 2.5 | Maximum slope = \( s_y / s_x \); all points lie exactly on the line |

(a) When \( r = 0 \): \( b = 0 \times 2.5 = 0 \). The regression line is horizontal: \( \hat{y} = \bar{y} \) for all \( x \). Knowing \( x \) provides no useful linear information — the best prediction is always the mean of \( y \).

(b) When \( r = 1 \): \( b = 1.0 \times 2.5 = 2.5 = s_y / s_x \). All data points lie exactly on the regression line (perfect positive linear association). The slope equals exactly the ratio of standard deviations. As \( |r| \) increases from 0 to 1, the regression line tilts from horizontal (flat) toward its maximum slope of \( \pm s_y / s_x \) — controlled entirely by \( r \).
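The tilt from horizontal to the maximum slope can be tabulated in a few lines. This Python sketch is illustrative only; the values \( r = -1 \) and \( r = 0 \) are added to the rows of the table to show both ends of the range.

```python
s_x, s_y = 4.0, 10.0
ratio = s_y / s_x  # 2.5, the largest possible |b| for these standard deviations

# b is proportional to r, so the sign and size of b track r directly.
slopes = {r: r * ratio for r in (-1.0, -0.4, 0.0, 0.4, 0.8, 1.0)}

for r, b in slopes.items():
    print(f"r = {r:+.1f}  ->  b = {b:+.2f}")
```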


Challenge 3 — Regression to the Mean

Two exams with \( r = 0.60 \), both having \( \mu = 70 \) and \( \sigma = 10 \) (same mean and SD for both exams).

(a) Computing \( b \) and \( a \):

Since \( s_x = s_y = \sigma = 10 \): \( b = r \times \sigma / \sigma = r = 0.60 \)

\[ a = 70 - 0.60 \times 70 = 70 - 42 = 28 \]

Equation: \( \hat{y} = 28 + 0.60x \) (predicting Exam 2 score from Exam 1 score).

(b) Predicted Exam 2 score for a student who scored 90 on Exam 1:

\[ \hat{y}(90) = 28 + 0.60 \times 90 = 28 + 54 = 82 \]

The predicted Exam 2 score is 82 — below 90, and closer to the mean (70).

(c) Why does this happen mathematically?

When \( s_x = s_y \), the slope equals \( b = r < 1 \) (for imperfect association). The predicted deviation from the mean is \( b \) times the observed deviation:

\[ \hat{y} - \bar{y} = b(x - \bar{x}) = 0.60 \times (90 - 70) = 0.60 \times 20 = 12 \]

So the predicted score is \( 70 + 12 = 82 \). The factor of \( r = 0.60 \) "shrinks" the deviation toward the mean. This shrinkage happens because imperfect correlation (\( r < 1 \)) means part of any extreme score is attributable to random error — and random error does not repeat from exam to exam. The portion of the deviation due to random fluctuation is expected to vanish on the next measurement.

(d) The regression fallacy:

No, the coach's conclusion is not valid. The phenomenon at play is regression to the mean, and attributing it to a cause such as training is known as the regression fallacy. Even if the coach did nothing differently between weeks, the top performers in week 1 would be expected, on average, to score closer to the group average in week 2, simply because part of their extreme week-1 performance was due to random luck, which does not repeat.

Attributing the decline to "training fatigue" confuses a mathematical artifact with a causal mechanism. This fallacy has real consequences in many fields: it underlies the incorrect belief that praise leads to worse performance, that punishment leads to better performance, and that successful strategies "wear out." Any time an extreme observation is followed by a less extreme one, one should consider regression to the mean as a competing explanation before concluding that any intervention caused the change.
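A short simulation can illustrate that top scorers drift toward the mean even when nothing is done between the two measurements. This is a sketch under stated assumptions, not part of the original problem: each score is modelled as a shared "ability" component plus independent exam-specific noise, with variances chosen so that the two exams have mean 70, SD 10, and correlation about 0.60; the cutoff of 85 for the top group is arbitrary.

```python
import random
from statistics import mean

random.seed(0)

R = 0.60            # target correlation between the two exams
MU, SIGMA = 70, 10  # same mean and SD for both exams


def simulate_student():
    """Return (exam1, exam2) for one simulated student."""
    # Shared ability carries a fraction R of the variance; the remainder is
    # exam-specific noise that does not repeat from one exam to the next.
    ability = random.gauss(0, R ** 0.5)
    exam1 = MU + SIGMA * (ability + random.gauss(0, (1 - R) ** 0.5))
    exam2 = MU + SIGMA * (ability + random.gauss(0, (1 - R) ** 0.5))
    return exam1, exam2


students = [simulate_student() for _ in range(20000)]
top = [(e1, e2) for e1, e2 in students if e1 >= 85]  # top scorers on exam 1

top_exam1 = mean(e1 for e1, _ in top)
top_exam2 = mean(e2 for _, e2 in top)
shrink = (top_exam2 - MU) / (top_exam1 - MU)

print(f"top group: exam 1 mean = {top_exam1:.1f}, exam 2 mean = {top_exam2:.1f}")
print(f"fraction of the deviation retained = {shrink:.2f}")
```

No intervention is simulated between the exams, yet the top group's second-exam mean should land between 70 and its first-exam mean, with the retained fraction close to \( r \).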