Solutions — Data Visualization

How to use this page: Try each problem in the lesson before checking solutions here. If your answer doesn't match, read the solution carefully — especially the part that explains why common wrong answers are wrong. Understanding the error matters more than getting the right answer the first time.

Section 5: Guided Practice Solutions

Problem 1 — Completing a Frequency Table

The commuter dataset (n = 40).

1a — Relative frequency for the 30–44 class. The 30–44 class has 14 observations out of 40, so the relative frequency is 14 ÷ 40 = 0.35. ✓

1b — Cumulative frequency for the 30–44 class. Add all classes up to and including 30–44: 18 (cumulative through 15–29) + 14 = 32. ✓

Common mistake: writing 18 (which is through 15–29) or 14 (which is just for this class). Cumulative means “up to and including this class.”

1c — What percentage of commuters take 45 minutes or more? “45 minutes or more” = 45–59 class (6 ÷ 40 = 0.15) + 60–74 class (2 ÷ 40 = 0.05) = 0.20, or 20%. Alternatively: 1 − 32 ÷ 40 = 1 − 0.80 = 0.20. ✓

Common mistake: answering 95% — that’s the cumulative relative frequency through 45–59, meaning 95% of commuters take 59 minutes or less. You want the proportion who take 45 or more minutes.
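The three answers above are short arithmetic on the table. A minimal Python sketch, using only the counts quoted in this solution (n = 40, 14 observations in 30–44, cumulative 18 through 15–29); the variable names are illustrative, not part of the lesson:

```python
n = 40                  # total commuters
f_30_44 = 14            # frequency of the 30–44 class
cf_through_15_29 = 18   # cumulative frequency through the 15–29 class

# 1a: relative frequency = class frequency / n
rel_30_44 = f_30_44 / n

# 1b: cumulative frequency = everything up to and including the class
cf_through_30_44 = cf_through_15_29 + f_30_44

# 1c: "45 minutes or more" is the complement of "44 minutes or less"
share_45_plus = 1 - cf_through_30_44 / n

print(rel_30_44, cf_through_30_44, round(share_45_plus * 100, 1))
```

Using the complement in 1c avoids needing the individual frequencies of the two upper classes.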

Problem 2 — Choosing the Correct Graph Type (All 5 Variants)

The key decision chain: What type of variable? → What graph?

Variant 0 — Transportation mode (qualitative nominal) → Bar chart. Categories with no natural order, one variable, frequency distribution.
Variant 1 — Body temperature (quantitative continuous) → Histogram. Numerical measurements across a continuous range.
Variant 2 — Star ratings (qualitative ordinal) → Bar chart (with categories in order, 1–5 stars). Still qualitative despite having a natural order — bars should have gaps.
Variant 3 — Monthly rainfall over 36 months → Time-series plot. X-variable is time; showing how rainfall changes over consecutive periods is the goal.
Variant 4 — Number of siblings (quantitative discrete, few values) → Bar chart with gaps. For small discrete count data (0, 1, 2, 3, 4, 5+), separated bars clarify the distinct possible values.
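The decision chain can be written down as a small lookup. This is only a sketch of the five variants above: the function name `choose_graph` and the type labels ("nominal", "ordinal", "discrete", "continuous") are hypothetical, and the "discrete" branch assumes few distinct values, as in Variant 4.

```python
def choose_graph(variable_type: str, over_time: bool = False) -> str:
    """Pick a graph type for one variable, following the decision chain above."""
    if over_time:
        # Time on the x-axis takes priority (Variant 3).
        return "time-series plot"
    if variable_type in ("nominal", "ordinal"):
        # Qualitative data: separated bars; ordinal categories stay in order.
        return "bar chart"
    if variable_type == "discrete":
        # Assumes a small number of distinct counts (0, 1, 2, ...).
        return "bar chart with gaps"
    if variable_type == "continuous":
        return "histogram"
    raise ValueError(f"unknown variable type: {variable_type!r}")


print(choose_graph("nominal"))                     # transportation mode
print(choose_graph("continuous"))                  # body temperature
print(choose_graph("continuous", over_time=True))  # monthly rainfall
```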

Problem 3 — Reading a Frequency Table (All 5 Variants)

Reference table: quiz completion times for 50 students.

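Variant 0: Students in the 20–24 min class → the frequency listed for that class; read it directly from the frequency column.
Variant 1: Proportion taking 25+ min → 1 minus the cumulative relative frequency through 20–24. Or add the relative frequencies for 25–29 and 30–34.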
Variant 2: Students finishing under 20 min → through 15–19 = 15. (6 in 10–14 + 11 in 15–19 = 15.)
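Variant 3: Percentage finishing 15–24 min → add the relative frequencies of the 15–19 and 20–24 classes and express the sum as a percentage.
Variant 4: Modal class → 20–24 minutes, which has the highest frequency in the table.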

Problem 4 — Identify the Misleading Feature

The bar chart of store revenue (Flagship: $215M, Neighbourhood: $208M, Express: $196M) uses a y-axis starting at $180M instead of $0.

Primary flaw: truncated y-axis. The actual range is $215M − $196M = $19M, which is 9.7% of the Express revenue. By starting the axis at $180M, the charted range is $220M − $180M = $40M. The Flagship bar occupies ($215M − $180M) ÷ $40M = 87.5% of the axis height; the Express bar only ($196M − $180M) ÷ $40M = 40%. This makes Flagship appear more than twice as tall as Express (35 ÷ 16 ≈ 2.2×), visually overstating a modest 9.7% actual advantage.
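To see how much the baseline alone changes the picture, the bar-height comparison can be computed for both axes. A short sketch using the revenue figures and the $180M to $220M axis from the problem; the helper name `bar_height_fraction` is made up for illustration.

```python
def bar_height_fraction(value, axis_min, axis_max):
    """Fraction of the axis height a bar occupies when drawn up from axis_min."""
    return (value - axis_min) / (axis_max - axis_min)


flagship, express = 215, 196  # revenue in $M

# Truncated axis from the chart: $180M to $220M
truncated_ratio = (bar_height_fraction(flagship, 180, 220)
                   / bar_height_fraction(express, 180, 220))

# Honest axis: $0 to $220M
honest_ratio = (bar_height_fraction(flagship, 0, 220)
                / bar_height_fraction(express, 0, 220))

print(f"truncated axis: Flagship bar is {truncated_ratio:.2f}x the Express bar")
print(f"zero baseline:  Flagship bar is {honest_ratio:.2f}x the Express bar")
```

With a zero baseline the height ratio equals the revenue ratio, 215 ÷ 196, which is the roughly 9.7% advantage in the data.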

Section 6: Independent Practice Solutions

Problem 1 — Build a Complete Frequency Table (Generator)

This problem generates a new dataset each time. The solution method is always the same:

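Find min and max. Class width = (max − min) ÷ 5, rounded up.
Define 5 class boundaries starting from min.
Tally each value into its class to get the frequency f.
Compute the relative frequency f ÷ n for each class; verify they sum to 1.00.
Compute the cumulative frequency as running totals of f; the final value must equal n = 15.
Compute the cumulative relative frequency as running totals of the relative frequencies; the final value must equal 1.00.

Sanity check four quantities: sum of f = 15, sum of relative frequencies = 1.00, final cumulative frequency = 15, final cumulative relative frequency = 1.00.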
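The method above can be sketched in Python. The dataset below is made up for illustration (the generator produces different values each time), and the sketch assumes integer data with inclusive classes. The width is chosen as the smallest whole number that lets 5 classes cover the maximum, which bumps it by one when the range divides evenly.

```python
def frequency_table(data, num_classes=5):
    """Return rows of (lower, upper, f, rel_f, cum_f, cum_rel_f) for integer data."""
    n = len(data)
    low, high = min(data), max(data)
    # Smallest integer width such that num_classes classes reach the maximum.
    width = (high - low) // num_classes + 1
    rows, cum_f = [], 0
    for k in range(num_classes):
        lower = low + k * width
        upper = lower + width - 1
        f = sum(lower <= x <= upper for x in data)  # tally
        cum_f += f                                   # running total
        rows.append((lower, upper, f, f / n, cum_f, cum_f / n))
    return rows


# Hypothetical sample with n = 15, not taken from the lesson.
sample = [12, 15, 21, 22, 24, 27, 30, 31, 33, 35, 38, 40, 44, 47, 51]
table = frequency_table(sample)
for row in table:
    print(row)

# The four sanity checks
assert sum(r[2] for r in table) == len(sample)
assert abs(sum(r[3] for r in table) - 1.0) < 1e-9
assert table[-1][4] == len(sample)
assert abs(table[-1][5] - 1.0) < 1e-9
```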

Problem 2 — Interpreting a Histogram (Daily Steps)

The histogram of daily steps for 60 office workers (classes: 2000–3999, 4000–5999, 6000–7999, 8000–9999, 10000–11999; frequencies approximately 5, 12, 22, 15, 6).

(a) Class width: Each class spans 2,000 steps. Class width = 2,000.

(b) Workers in 6,000–7,999 steps: Read the bar height — approximately 22 workers. (The bar reaches slightly above the “20” gridline.)

(c) Percentage walking fewer than 6,000 steps: (5 + 12) ÷ 60 = 17 ÷ 60 ≈ 28.3%.

(d) Shape: The distribution is slightly right-skewed. The peak is at 6,000–7,999 steps. The right tail (8,000–11,999) falls off gradually. Most office workers cluster in the 4,000–8,000 range, with fewer reaching 10,000+ steps.
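Parts (a) to (c) can be checked from the class limits and the approximate bar heights given in the problem. A brief sketch; the list names are illustrative.

```python
lower_limits = [2000, 4000, 6000, 8000, 10000]
freqs = [5, 12, 22, 15, 6]  # approximate bar heights read from the histogram

n = sum(freqs)

# (a) class width = distance between consecutive lower limits
class_width = lower_limits[1] - lower_limits[0]

# (b) height of the bar whose class starts at 6,000
f_6000 = freqs[lower_limits.index(6000)]

# (c) classes lying entirely below 6,000 steps
below_6000 = sum(f for lo, f in zip(lower_limits, freqs) if lo < 6000)
pct_below_6000 = 100 * below_6000 / n

print(class_width, f_6000, round(pct_below_6000, 1))
```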

Problem 3 — Select the Best Graph (All 5 Variants)

Variant 0 — Blood pressure vs. age (two quantitative variables, looking for relationship) → Scatter plot.
Variant 1 — Job satisfaction levels (qualitative ordinal, 5 ordered categories) → Bar chart (in order: Very Dissatisfied → Very Satisfied).
Variant 2 — Weekly profit over 52 weeks (quantitative, x-variable is time) → Time-series plot.
Variant 3 — Geographic region of purchases (qualitative nominal, 5 categories, part-to-whole message) → Bar chart or pie chart (both acceptable; bar chart preferred for comparability).
Variant 4 — Weight of cereal boxes (quantitative continuous, n = 200) → Histogram (n is too large for a stem-and-leaf).

Problem 4 — Two Variables, One Graph

(a) Graph: scatter plot. Both hours of sleep and cognitive performance score are quantitative continuous variables. The goal is to explore their relationship — scatter plot is the correct choice. Each of the 85 participants becomes a point: x = hours of sleep, y = test score.

(b) Axes: Hours of sleep on the x-axis (explanatory variable); cognitive performance score on the y-axis (response variable). Convention: the variable hypothesized to “explain” the other goes on x.

(c) Pattern for positive association: An upward trend from lower-left to upper-right. Points with small x (few hours of sleep) cluster toward small y (low scores), and points with large x (more sleep) cluster toward large y (higher scores). This upward pattern — a positive association — supports the hypothesis that more sleep correlates with better performance.

Problem 5 — Critique the Crime Rate Graph

The graph shows crime rates: Year 1: 958, Year 2: 955, Year 3: 952, Year 4: 950, Year 5: 947, Year 6: 943 (incidents per 100,000). Y-axis: 940 to 960.

(a) Misleading technique: truncated y-axis. Starting at 940 compresses the y-range to 20 units. The 15-unit drop (958 → 943) spans 75% of the chart height, making the trend look dramatic. On an axis from 0 to 1,000, the line would be nearly flat — accurately reflecting that crime changed by only 1.57%.

(b) True percentage decrease: (958 − 943) ÷ 958 = 15 ÷ 958 ≈ 1.57%. A 1.57% decline over 6 years is real, but not a “plummet.”

(c) Honest redraw: Start the y-axis at 0 (or at a clearly indicated break point with a zigzag symbol). Label both the axis and each data point with its exact value. The line would appear as a very gentle downward slope — accurately conveying the modest improvement. The headline should read “Crime rate declines 1.6% over six years” rather than “plummets.”
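The two numbers that drive this critique are the true percentage change and the share of the chart height that the change occupies. A small sketch using the figures from the problem; `share_of_axis` is a hypothetical helper name.

```python
def share_of_axis(change, axis_min, axis_max):
    """Fraction of the plotted y-range taken up by a change of the given size."""
    return abs(change) / (axis_max - axis_min)


first, last = 958, 943  # incidents per 100,000 in Year 1 and Year 6
drop = first - last

pct_decrease = 100 * drop / first
on_truncated_axis = share_of_axis(drop, 940, 960)
on_full_axis = share_of_axis(drop, 0, 1000)

print(f"decrease: {pct_decrease:.2f}%")
print(f"share of chart height, axis 940-960: {on_truncated_axis:.0%}")
print(f"share of chart height, axis 0-1000:  {on_full_axis:.1%}")
```

The same 15-unit drop fills three quarters of the truncated chart but only 1.5% of a chart that starts at zero.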

Section 7: Mastery Check Solutions

Problem 1 — Feynman Test: Histogram vs. Bar Chart

Key points a complete answer should include:

A histogram’s x-axis is a continuous number line. Bars touch because classes are adjacent intervals — there is no gap between “40–49” and “50–59” because no values exist “between” these intervals on a number line (they are contiguous).
Categorical data has no number line. “Red,” “Blue,” and “Green” are not on any numerical scale. There is nothing “between” two categories.
The gap in a bar chart signals: these are separate, unconnected categories. Using a histogram for categorical data falsely implies a continuous scale between categories.
The visual rule (“bars touch vs. bars separate”) is a direct consequence of the underlying data structure, not an aesthetic preference.

Problem 2 — Apply: Soccer Player Dataset

Part A — Distribution of distance run per game (quantitative continuous): histogram. Distance run is measured along a continuum. A histogram groups values into class intervals and shows where most players cluster, the spread, and the shape.

Part B — Relationship between distance run and goals scored (two quantitative variables): scatter plot. Both variables are quantitative; the goal is to explore whether players who run more tend to score more (or less). Each player = one point; x = distance run, y = goals scored.

Part C — Number of players per position (qualitative nominal — Goalkeeper, Defender, Midfielder, Forward): bar chart. Position is categorical with no natural numerical scale. Four bars, one per position category, with gaps between them.

Problem 3 — Error Detection: Unequal Class Widths

The error: The student used the same bar width for all four classes, but the 30–49 class spans 20 cm while the others span only 10 cm. In a histogram, area = frequency (when displayed correctly). When class widths are unequal, the y-axis must show frequency density = f ÷ class width, so that the area of each bar correctly represents its frequency.

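Frequency densities (f ÷ class width) for the two classes being compared: 20–29: 9 ÷ 10 = 0.90 per cm; 30–49: 11 ÷ 20 = 0.55 per cm.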

On a frequency density histogram, the 20–29 bar (0.90) would be the tallest, not the 30–49 bar (0.55). The student’s claim that “the third bar is the most frequent” is misleading because plotting raw frequency visually inflates the wide class. The 20–29 range actually has the highest concentration of plants.
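The density comparison can be reproduced in a few lines. In this sketch the frequencies 9 and 11 are back-calculated from the stated densities (0.90 × 10 cm and 0.55 × 20 cm), and the function name is an assumption for illustration.

```python
def frequency_density(frequency, lower, upper):
    """Frequency per unit of class width, for inclusive integer class limits."""
    width = upper - lower + 1  # 20–29 spans 10 cm, 30–49 spans 20 cm
    return frequency / width


# (lower, upper) -> frequency; values back-calculated from the densities above
classes = {(20, 29): 9, (30, 49): 11}

densities = {limits: frequency_density(f, *limits) for limits, f in classes.items()}
most_frequent = max(classes, key=classes.get)     # largest raw count
most_dense = max(densities, key=densities.get)    # tallest bar on a density axis

print(densities)
print("largest frequency:", most_frequent, "highest density:", most_dense)
```

The wide class wins on raw count but loses on density, which is why the y-axis has to change when class widths differ.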

Section 8: Boss Fight Solutions

▾

Path A — The Analyst

Frequency table (classes: 3–5, 6–8, 9–11, 12–14, n = 25):

Class	Frequency (f)	Relative frequency	Cumulative frequency	Cumulative relative frequency
3–5	6	0.24	6	0.24
6–8	9	0.36	15	0.60
9–11	7	0.28	22	0.88
12–14	3	0.12	25	1.00
Total	25	1.00	—	—

Tally verification:

3–5: 3, 5, 4, 3, 5, 4 → 6 ✓
6–8: 8, 7, 6, 8, 7, 6, 8, 7, 6 → 9 ✓
9–11: 9, 11, 10, 9, 11, 10, 9 → 7 ✓
12–14: 12, 14, 13 → 3 ✓

Histogram description:

Modal class: 6–8 (f = 9, the tallest bar)
Shape: right-skewed, with the peak at 6–8 and a longer tail stretching toward 12–14
Students reading fewer than 9 books: cumulative frequency through 6–8 = 15, so 15 ÷ 25 = 60%

Class width issue: The originally proposed classes (2–4, 5–7, 8–10, 11–14) have unequal widths — the last class is 4 units wide instead of 3. Using classes 3–5, 6–8, 9–11, 12–14 (each 3 units wide) corrects this, and the data range (3 to 14) is fully covered.
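Both the class-width check and the tallies can be verified mechanically. A minimal sketch, with the 25 values copied from the tally verification above (listed in tally order, not necessarily the original order); the helper name `class_widths` is illustrative.

```python
def class_widths(classes):
    """Width of each inclusive integer class (lower, upper)."""
    return [upper - lower + 1 for lower, upper in classes]


proposed = [(2, 4), (5, 7), (8, 10), (11, 14)]
corrected = [(3, 5), (6, 8), (9, 11), (12, 14)]

books = [3, 5, 4, 3, 5, 4,
         8, 7, 6, 8, 7, 6, 8, 7, 6,
         9, 11, 10, 9, 11, 10, 9,
         12, 14, 13]

print("proposed widths: ", class_widths(proposed))
print("corrected widths:", class_widths(corrected))

# Tally the data into the corrected classes
counts = [sum(lo <= b <= hi for b in books) for lo, hi in corrected]
share_under_9 = (counts[0] + counts[1]) / len(books)
print(counts, share_under_9)
```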

Path B — The Architect

Graph 1 — Holiday Revenue:

Flaw 1: truncated y-axis exaggerates the inter-quarter gap.
Flaw 2: the 3D pictogram icon for Q4 is twice as wide and also taller, so its area is ~4× larger, while the actual ratio is $58.7M ÷ $41.2M ≈ 1.42× (42% more, not 4× more).
Fix: simple equal-width bars starting at $0 on the y-axis. The Q4 bar is still visibly the tallest — accurately.

Graph 2 — Satisfaction Distribution:

Flaw: unequal class widths (40-point class vs. 10-point classes) with frequency (not frequency density) on the y-axis. The 41–80 bar inflates visually.
Frequency densities: 41–80: 30 ÷ 40 = 0.75/pt; 81–90: 25 ÷ 10 = 2.50/pt. The 81–90 class is actually far more densely concentrated.
Fix: redesign with equal class widths (e.g., 0–19, 20–39, 40–59, 60–79, 80–100), or use frequency density on the y-axis.

Graph 3 — Market Share Pie:

Flaw 1: 3D tilt inflates the nearest slice (Electronics, 18%) to appear ~30% visually.
Flaw 2: 7 categories with legend-only labels require too much eye movement to read accurately.
Fix: flat 2D bar chart, sorted by market share (highest to lowest), with percentage labels on each bar.

Section 9: Challenge Problem Solutions

Challenge 1 — The Ogive

Using the frequency table from Example 1 (defective items per batch):

Class	Upper boundary	Cumulative relative frequency
1–2	2.5	0.10
3–4	4.5	0.40
5–6	6.5	0.75
7–8	8.5	0.95
9–10	10.5	1.00

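Median estimate (50th percentile): The 0.50 level lies between the 3–4 class (cumulative relative frequency 0.40, upper boundary 4.5) and the 5–6 class (cumulative relative frequency 0.75, upper boundary 6.5). Linear interpolation: median ≈ 4.5 + (0.50 − 0.40) ÷ (0.75 − 0.40) × (6.5 − 4.5) = 4.5 + (0.10 ÷ 0.35) × 2 ≈ 5.07.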

Preview of DS-5: The ogive is the graphical tool for reading any percentile. The p-th percentile is estimated by drawing a horizontal line at a cumulative relative frequency of p ÷ 100 and reading the corresponding x-value.
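Reading a percentile off the ogive amounts to linear interpolation between two plotted points. The sketch below uses the table above and adds one assumed starting point, (0.5, 0.00), the lower boundary of the 1–2 class, so that percentiles below the 10th can also be read; the function name is hypothetical.

```python
def percentile_from_ogive(points, p):
    """Estimate the p-th percentile by interpolating along the ogive.

    points: (upper_boundary, cumulative_relative_frequency) pairs, ascending.
    p: percentile between 0 and 100.
    """
    target = p / 100
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if y0 <= target <= y1 and y1 > y0:
            return x0 + (target - y0) / (y1 - y0) * (x1 - x0)
    raise ValueError("p must be between 0 and 100")


ogive = [(0.5, 0.00),  # assumed start: lower boundary of the first class
         (2.5, 0.10), (4.5, 0.40), (6.5, 0.75), (8.5, 0.95), (10.5, 1.00)]

median = percentile_from_ogive(ogive, 50)
print(round(median, 2))
```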

Challenge 2 — Does Bin Width Matter?

(a) Changing bin width changes how values are aggregated. Wide bins merge many observations together, smoothing the distribution and potentially hiding structure. The 3-class histogram lumped together the 24–26 and 27–29 subgroups into one “24–29” bar, hiding the right-skew that Histogram B makes visible.

(b) Histogram B (6 classes) reveals more meaningful structure for n = 40. Sturges’ rule gives k = 1 + log₂(40) ≈ 6.3, confirming 6 classes is appropriate. Too few bins smooth away real patterns; too many bins introduce noise.

(c) With 18 classes (width = 1 year), n = 40 gives an average of ~2.2 observations per bar. Many bars would contain 0, 1, or 2 values. The histogram would be extremely jagged — every accidental gap in the data would appear as a bar of height 0, creating a false impression of “holes” in the distribution when there are none.
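The two figures behind parts (b) and (c) are quick to compute. A short sketch, assuming the base-2 logarithm form of Sturges’ rule; `sturges_classes` is an illustrative name.

```python
import math


def sturges_classes(n):
    """Suggested number of classes by Sturges' rule: 1 + log2(n)."""
    return 1 + math.log2(n)


n = 40
k = sturges_classes(n)
print(f"Sturges' rule for n = {n}: {k:.2f} classes, rounds to {round(k)}")

# Average observations per bar for the three bin choices discussed
for num_classes in (3, 6, 18):
    print(num_classes, "classes:", round(n / num_classes, 1), "observations per bar")
```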

Challenge 3 — The Double Y-Axis Debate

(a) Why dual y-axis misleads: The scales can be freely adjusted to make two completely unrelated variables appear to move together — or to make genuinely correlated variables appear independent. Visual alignment of two lines is entirely a function of scale choices, not of actual correlation. This makes any correlation claim from a dual y-axis graph highly suspect without independent verification.

(b) Legitimate uses: A dual y-axis is defensible when (1) both variables are contextually related and telling the same story (e.g., temperature and precipitation on a climate chart), (2) both axes are clearly labeled with units, (3) the designer does not manipulate scales to create false visual alignment, and (4) the chart does not imply a direct proportional comparison between the two scales.

(c) Confounding variable: Both ice cream sales and drowning rates are driven by a third variable — summer heat. Hot weather causes more people to buy ice cream and more people to swim (creating more opportunities to drown). When a third “lurking” or confounding variable drives both measured variables, a strong correlation arises even without any causal link between them. The ice cream → drowning inference is a classic spurious correlation. This concept is formalized in REG-1 (Correlation Analysis) and is one of the most important principles in applied statistics.
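The confounding argument in (c) can be illustrated with a simulation. Everything below is synthetic: the temperature range, the coefficients, and the noise levels are arbitrary assumptions chosen only to make the pattern visible, not real sales or drowning data. Both series depend on temperature and not on each other.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def residuals(xs, ys):
    """What is left of ys after removing a least-squares line on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return [y - (my + slope * (x - mx)) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable
temps = [10 + 0.5 * i for i in range(50)]                  # the lurking variable
ice_cream = [20 * t + rng.gauss(0, 30) for t in temps]     # depends on temps only
drownings = [0.3 * t + rng.gauss(0, 1) for t in temps]     # depends on temps only

raw_r = pearson(ice_cream, drownings)
adjusted_r = pearson(residuals(temps, ice_cream), residuals(temps, drownings))

print(f"raw correlation:                   {raw_r:.2f}")
print(f"after removing temperature effect: {adjusted_r:.2f}")
```

The raw correlation is strong even though neither series feeds into the other. Once the temperature trend is removed from both, only the unrelated noise is left to correlate.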

You’ve completed DS-2: Data Visualization! The next lesson is DS-3: Central Tendency Measures, which uses the frequency distributions you’ve built here to compute means, medians, and modes.
