
DS-1: Solutions — Statistical Vocabulary and Sampling

Module 1 · Descriptive Statistics

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.

Section 5: Guided Practice

Problem 1 — The Four Elements (C1 + C2)

NBA strength training scenario: basketball analytics company surveys 60 NBA players, average 280 min/week.

Step 1 — Population: All professional basketball players.
Not just the 60 contacted (that's the sample), and not just NBA players — the analytics company wants to draw conclusions about professional basketball broadly. Population = the whole target group.

Step 2 — Statistic: \( \bar{x} = 280 \) minutes/week.
The 280 was computed from the 60-player sample — it's a statistic. The unknown average for all professional players (which we never measured) is the parameter.

Step 3 — Parameter notation: \( \mu \) — population mean strength-training time.
\( \bar{x} = 280 \) is the statistic (sample mean), and \( \mu \) is what we're trying to estimate. \( s \) is the sample standard deviation: also a statistic, but one that measures spread rather than the average.

Common mistake: Calling the 280 minutes "μ." The 280 was computed from 60 players — it's a sample computation, so it's \( \bar{x} \).

---

Problem 2 — Notation Match (C2)

2a — \( \mu = \$67{,}200 \). All 4,500 employees were measured — the whole population. When you measure everyone, the resulting number is a parameter. No sampling occurred.

2b — \( \bar{x} = \$1{,}450 \). The 80 apartments are a random sample of all Montreal apartments. The $1,450 was computed from this sample — it's a statistic estimating the population mean rent μ.

Key rule: If you measured the entire population → parameter (μ). If you measured a sample → statistic (\( \bar{x} \)).
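
The key rule comes down to one comparison, which a short Python sketch can make explicit. The function name `classify_measure` and its arguments are hypothetical, introduced only for illustration. The population size used for the Montreal apartments is an assumed placeholder, not a figure from the problem.

```python
def classify_measure(n_measured: int, population_size: int) -> str:
    """Return 'parameter' if the whole defined population was measured, else 'statistic'."""
    if n_measured <= 0 or n_measured > population_size:
        raise ValueError("n_measured must be between 1 and population_size")
    return "parameter" if n_measured == population_size else "statistic"


# 2a: all 4,500 employees were measured -> parameter (mu)
print(classify_measure(4500, 4500))      # parameter

# 2b: 80 apartments out of a much larger population.
# 500_000 is an assumed placeholder for the number of Montreal apartments.
print(classify_measure(80, 500_000))     # statistic
```

The sketch only works once the population has been defined precisely, which is why the first step in each problem is naming the population.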

---

Problem 3 — Classify the Variable (C3) — Variant Bank

Correct answers for all 5 variants:

The two biggest traps: (1) Numbers coded as labels (postal codes, phone numbers) are nominal — not quantitative. (2) An ordered rating scale (1 to 5) looks discrete quantitative, but if the numbers are labels for categories, it's ordinal qualitative. Ask: "Are the gaps between values equal and meaningful?"
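
The two traps can be turned into a small decision helper. This is a minimal sketch, assuming the four-way classification used in this module. The function `classify_variable` and its argument names are hypothetical and not part of the lesson.

```python
def classify_variable(values_are_labels: bool, ordered: bool = False,
                      counted: bool = False) -> str:
    """Classify a variable using the questions behind the two traps.

    values_are_labels: True when the values name categories, even if written as digits.
    ordered: for labels, whether the categories have a natural order.
    counted: for true numbers, whether values come from counting (vs. measuring).
    """
    if values_are_labels:
        return "qualitative-ordinal" if ordered else "qualitative-nominal"
    return "quantitative-discrete" if counted else "quantitative-continuous"


# Trap 1: a postal code is a label, with no meaningful order.
print(classify_variable(values_are_labels=True))                  # qualitative-nominal

# Trap 2: a 1-to-5 rating is an ordered label; the gaps are not guaranteed equal.
print(classify_variable(values_are_labels=True, ordered=True))    # qualitative-ordinal

# A measured quantity such as hours of screen time.
print(classify_variable(values_are_labels=False))                 # quantitative-continuous
```

The first argument carries the real judgment: deciding whether digits are labels or measurements is the step the question about equal, meaningful gaps is meant to settle.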

---

Problem 4 — Identify the Sampling Method (C4) — Variant Bank

Correct answers for all 5 variants:

Cluster vs. Stratified (the critical distinction): Stratified = groups that are homogeneous within (members share a characteristic); sample from ALL groups. Cluster = groups that are each heterogeneous (each one resembles a small version of the population); sample SOME groups entirely. Both methods divide the population into groups; the difference is whether you sample from every group or select whole groups.
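
The distinction is easy to see in code. The sketch below uses a hypothetical population of four groups with ten labelled units each. The function names and the data are illustrative assumptions, not taken from any variant.

```python
import random


def stratified_sample(groups, per_group, rng):
    """Draw `per_group` units at random from EVERY group."""
    return {name: rng.sample(units, per_group) for name, units in groups.items()}


def cluster_sample(groups, n_clusters, rng):
    """Pick `n_clusters` groups at random and keep ALL units in them."""
    chosen = rng.sample(sorted(groups), n_clusters)
    return {name: list(groups[name]) for name in chosen}


# Hypothetical population: 4 groups (A-D) of 10 labelled units each.
groups = {g: [f"{g}{i}" for i in range(10)] for g in "ABCD"}
rng = random.Random(0)

strat = stratified_sample(groups, per_group=3, rng=rng)
clust = cluster_sample(groups, n_clusters=2, rng=rng)

print(len(strat), [len(v) for v in strat.values()])  # 4 [3, 3, 3, 3]
print(len(clust), [len(v) for v in clust.values()])  # 2 [10, 10]
```

Whatever the seed, the stratified result contains every group and the cluster result contains only whole groups, which is the pattern to look for when classifying a scenario.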

---

Problem 5 — Identify the Bias (C5)

Type of bias: Voluntary response bias.
Viewers who feel strongly about the tax issue are far more likely to text in than indifferent viewers. Self-selected responses systematically overrepresent extreme opinions.

Direction: Toward "No".
Tax opponents tend to feel more urgently motivated to act. A tax increase hurts people economically in a direct, immediate way, and that kind of tangible cost motivates stronger responses than the more diffuse benefits of public spending. The "Yes" side (those who support the tax) is likely less motivated to take the time to respond to the poll.

Sample size ≠ reliability: 4,200 responses sounds like a lot, but a sample of 4,200 strongly motivated, non-representative respondents is less reliable than 100 randomly selected voters. Bias doesn't wash out with larger samples.
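
A small simulation can illustrate why a larger biased sample does not help. Everything in it is an assumption made for illustration: a hypothetical population of 100,000 voters split evenly, and response rates under which opponents are three times as likely to respond as supporters. None of these numbers come from the problem.

```python
import random

rng = random.Random(42)

# Hypothetical population: 100,000 voters, exactly half support the tax.
population = [True] * 50_000 + [False] * 50_000  # True = "Yes"

# Voluntary response: opponents are ASSUMED three times as likely to respond.
# These rates are illustrative, chosen to give roughly 4,200 responses.
P_RESPOND = {True: 0.021, False: 0.063}
voluntary = [v for v in population if rng.random() < P_RESPOND[v]]

# Simple random sample of only 100 voters.
srs = rng.sample(population, 100)


def share_yes(sample):
    """Proportion of 'Yes' answers in a sample."""
    return sum(sample) / len(sample)


print(f"true share of Yes:  {share_yes(population):.2f}")
print(f"voluntary (n={len(voluntary)}): {share_yes(voluntary):.2f}")
print(f"random (n={len(srs)}): {share_yes(srs):.2f}")
```

Under these assumed rates the voluntary sample centres on 0.021 / (0.021 + 0.063) = 0.25 rather than the true 0.50, and collecting more responses only tightens the estimate around the wrong value. The random sample of 100 is noisier but centred on the truth.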

---

Section 6: Independent Practice

Problem 1 — The Full Picture (C1 + C2)

Pharmaceutical drug trial: 180 patients from 12 clinics, mean blood pressure drop of 14.2 mmHg.

---

Problem 2 — Is It μ or x̄? (C1 + C2)

The rule that resolves all ambiguity: Was the entire defined population measured? → Parameter. Was a subset measured? → Statistic.

---

Problem 3 — Variable Type Generator (C3)

This problem generates new scenarios randomly. The correct classification and explanation appear after you answer. General keys:

---

Problem 4 — Sampling Method Critique (C4) — Variant Bank

Core answers for all 5 variants:

---

Problem 5 — Bias in a News Story (C5)

Remote work productivity study: 1,200 remote employees at 5 tech companies; 84% self-report higher productivity.

---

Problem 6 — Survey Question Critique (C6)

---

Section 7: Mastery Check

Question 1 — Feynman Test

Model answer (there is no single correct response — evaluate your own answer against this):

"A population is the whole group you care about — like every student in Canada. A sample is just the part you actually study — like 200 randomly chosen students. We use samples because studying millions of people is too expensive and slow. A parameter describes the full population (like the true average GPA of all Canadian students — we probably can't measure it exactly). A statistic describes the sample (like the average GPA of your 200 students — you computed it directly). The statistic is our best estimate of the parameter."

Checklist for your own answer:

---

Question 2 — Apply (C3 + C4)

Coffee chain with 240 locations in 4 regions: Plan 1 (cluster, 30 stores) vs. Plan 2 (stratified by region).

---

Question 3 — Find the Error (C4 + C5)

Food blogger Instagram poll: 847 responses, 91% prefer traditional bagels. The student's report calls it stratified sampling, labels the 91% as "μ," and classifies the variable as quantitative-continuous.

Four errors in the student's analysis:

  1. "Huge random sample": Voluntary response / convenience sampling — not random. Followers who engage with food content are not representative of all Montrealers.
  2. "Reliable estimate": A biased large sample is less reliable than an unbiased small one. The 91% is skewed by the food-enthusiast audience.
  3. "μ" for a proportion: Population proportion is written \( p \), not \( \mu \). The sample proportion is \( \hat{p} \). \( \mu \) is reserved for means.
  4. "Stratified sampling": Stratification requires deliberately dividing the population into strata before sampling. Post-hoc grouping of respondents is not stratification. This was voluntary response sampling.

---

Section 8: Boss Fight

Path A — The Analyst: Screen Time Survey

Full solutions appear in the lesson's Boss Fight section. Summary of errors found:

  1. Misidentified as stratified sampling (actually voluntary response / convenience)
  2. Misused \( \mu \) for a sample-computed value (should be \( \bar{x} \))
  3. Misclassified screen time as qualitative-ordinal (it's quantitative-continuous)
  4. Used a leading question ("excessive hours") that biases responses downward
  5. Non-response bias and undercoverage from the newsletter subscriber convenience sample

Path B — The Architect: CEGEP Well-Being Study

Key design decisions:

---

Section 9: Challenge Problems

Challenge 1 — Can a Statistic Equal a Parameter? (Variant Bank)

Core insight for all variants: \( \bar{x} = \mu \) can happen by coincidence, but it's rare and doesn't mean the sample "perfectly represents" the population. The key concept is sampling variability — different samples give different statistics. A statistic that happens to equal the parameter is still just one observation from the distribution of all possible sample means.

In Variant 1, none of the 6 possible size-2 samples produced \( \bar{x} = \mu = 60 \). But the average of all 6 sample means = 60 = μ. This illustrates unbiasedness: \( \bar{x} \) is an unbiased estimator of \( \mu \) (correct on average, not necessarily in any given sample).
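
The same behaviour can be checked by enumerating every possible sample. The sketch below uses a hypothetical four-value population chosen to have mean 60 and no pair averaging 60. These are not the values from Variant 1, only a set with the same properties.

```python
from itertools import combinations

# Hypothetical population of 4 values with mean 60.
# NOT the values from Variant 1; chosen only to share its properties.
population = [40, 50, 60, 90]
mu = sum(population) / len(population)

# All 6 possible samples of size 2 and their means.
sample_means = [sum(pair) / 2 for pair in combinations(population, 2)]

print(mu)                                     # 60.0
print(sample_means)                           # [45.0, 50.0, 65.0, 55.0, 70.0, 75.0]
print(mu in sample_means)                     # False
print(sum(sample_means) / len(sample_means))  # 60.0
```

No single sample hits 60, yet the six sample means average exactly 60. That is the sense in which \( \bar{x} \) is unbiased for \( \mu \).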

---

Challenge 2 — Multistage vs. Cluster Design

---

Challenge 3 — Why Convenience Sampling is Always Biased

---