PR-4: Discrete Random Variables

Module 2 · Probability

Section 1: Introduction

When you turn 25, your car insurance premiums drop. This isn’t the insurance company being generous — it’s mathematics. The company has records on millions of drivers. They know the probability distribution of accidents by age, and they price each policy to make a profit across thousands of customers, even if any individual policy might lose money.

The concept at work is expected value: the long-run average outcome of a random process, weighted by how likely each outcome is. Insurance companies, casinos, investment funds, and pharmaceutical companies all make decisions by computing expected values. When a drug company decides whether to run a clinical trial, it is computing an expected value. When a casino sets the payout on a slot machine, it is setting the expected payout per dollar wagered below 1.

By the end of this lesson, you will be able to:

  • Distinguish between discrete and continuous random variables and interpret the notation \(X\), \(x\).
  • Construct a probability mass function (PMF) and verify its validity.
  • Compute the expected value and interpret it as a long-run average.
  • Compute the variance and standard deviation using the shortcut formula.

This lesson introduces random variables — functions that attach numbers to random outcomes — and the mathematical tools for describing their behaviour: expected value and variance. These concepts are the bridge between probability and statistics.

Section 2: Prerequisites

  • Sample space (from PR-1): The set of all possible outcomes of a random experiment. You need this to list every value a random variable can take.
  • Classical probability (from PR-1): \(P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in } S}\) for equally likely outcomes. Today we generalize: outcomes can have unequal probabilities, so we use a table instead of counting.
  • Probability axioms (from PR-1): \(0 \le P(A) \le 1\) for any event \(A\), and \(P(S) = 1\) for the full sample space \(S\). The same axioms apply to every row and to the column total in a PMF table.
  • Complement rule (from PR-1): \(P(A^c) = 1 - P(A)\). You will use the equivalent idea that the sum of all PMF probabilities must equal 1 to find a missing value.
  • Weighted average (arithmetic): Multiplying each value by its “weight” and summing. Expected value is exactly a weighted average — weights are probabilities.

Checkpoint: If , what is ?

Success Factor:

Bridging old and new: In PR-1, you computed probabilities for individual events. Today, you will organize all outcomes into a table (the PMF) and compute a single number, \(E(X)\), that summarizes the entire distribution. A key shift: probabilities in a PMF need not be equal, and \(E(X)\) need not equal any specific outcome in the table.

Retrieval Warm-up — from earlier lessons

A bag contains 3 red balls and 5 blue balls. You draw one ball, note its colour, put it back, and draw again. What is the probability the first ball is red and the second is blue?

How many ways can you choose 3 students from a group of 10 to form a committee (no assigned roles)?

Section 3: Core Concepts

Six concepts — here is how they build on each other:

  • C1 — Random Variables: What kind of quantity are we studying?
  • C2 — PMF: How do we record all the probabilities in one place?
  • C3 — Expected Value: What is the long-run average outcome?
  • C4 — E(X²): An intermediate quantity needed for variance.
  • C5 — Variance and SD: How spread out are the outcomes?
  • C6 — CDF: How do we answer “at most” and “at least” questions?

C1 — Random Variables

When you roll a die, you don’t know in advance which face will appear. But the outcome is always a number — 1, 2, 3, 4, 5, or 6. A random variable is a rule that assigns a number to each outcome of a random experiment.

We write \(X\) (uppercase) to mean a random variable, and \(x\) (lowercase) to mean one specific value it can take.

Random Variable

A random variable is a function that assigns a real number to each outcome in the sample space of a random experiment.

  • A discrete random variable takes a countable set of values (e.g., 0, 1, 2, 3, …).
  • A continuous random variable takes any value in an interval (e.g., any height between 150 cm and 200 cm).
Examples:
  • Number of customers who enter a store in one hour → discrete (0, 1, 2, 3, …)
  • Height of a randomly chosen student → continuous (any value in an interval)
  • Number of defective items in a batch of 20 → discrete (0, 1, 2, …, 20)
  • Time until the next earthquake → continuous (any non-negative real number)

Two number lines contrasting discrete and continuous random variables. Top row: a number line with five solid circles at integers 0 through 4, representing isolated countable values — the number of phone calls received. Bottom row: a solid shaded band from 150 to 200, representing a continuous interval where height in centimetres can take any value.

Dots can be counted and listed; the band cannot — every point inside it is a valid value. That is the defining distinction.

“Discrete” does not mean “small.” The number of grains of sand on a beach is countable (even if enormous) — in principle it is discrete. The distinction is whether the variable can take any value in an interval (continuous) or only isolated, countable values (discrete).


C2 — Probability Mass Function (PMF)

A PMF is simply a table (or formula) that lists every value that \(X\) can take, along with the probability of each. Think of it as the complete “probability recipe” for your random variable.

Probability Mass Function (PMF)

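The probability mass function of a discrete random variable \(X\) assigns a probability to each possible value \(x\):

\[ p(x) = P(X = x) \]

A function \(p\) is a valid PMF if and only if both conditions hold:

  • \(P(X = x) \ge 0\) for every value \(x\).
  • \(\sum_x P(X = x) = 1\).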

Three-step pipeline showing how to build a PMF for X equals the number of heads in two coin flips. Step 1 Sample Space: list all outcomes HH, HT, TH, TT. Step 2 Assign X: map each outcome to its head count — HH maps to 2, HT and TH each map to 1, TT maps to 0. Step 3 PMF Table: tally the probabilities — P(X=0)=0.25, P(X=1)=0.50, P(X=2)=0.25, which sum to 1.

The PMF collapses the sample space: TH and HT both give X = 1, so their probabilities add to 0.50.

Example: \(X\) = number of heads in two fair coin flips.

| \(x\) | 0 | 1 | 2 |
|---|---|---|---|
| \(P(X = x)\) | 0.25 | 0.50 | 0.25 |

Check: \(0.25 + 0.50 + 0.25 = 1.00\) ✓. All probabilities \(\ge 0\) ✓.

A PMF does not require each probability to equal \(1/n\). Outcomes can have unequal probabilities; that is the whole point of building a PMF rather than just counting.
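
The three-step pipeline for the two-coin example can be sketched in Python. This is a minimal illustration, not part of the lesson itself: the variable names and the `is_valid_pmf` helper are hypothetical, and exact fractions are used only to avoid rounding noise.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Step 1: sample space for two fair coin flips: HH, HT, TH, TT.
sample_space = ["".join(flips) for flips in product("HT", repeat=2)]

# Step 2: assign X = number of heads to each outcome, then tally.
head_counts = Counter(outcome.count("H") for outcome in sample_space)

# Step 3: PMF table. Each of the 4 outcomes has probability 1/4.
pmf = {x: Fraction(n, len(sample_space)) for x, n in sorted(head_counts.items())}


def is_valid_pmf(pmf):
    """Both conditions: every probability >= 0, and the total equals 1."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1


for x, p in pmf.items():
    print(f"P(X = {x}) = {float(p):.2f}")
print("valid:", is_valid_pmf(pmf))
```

The tally step is where HT and TH collapse into the single value X = 1.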

Bar chart of a probability mass function for X taking values 1, 2, 3, 4 with probabilities 0.10, 0.40, 0.30, 0.20 respectively. A vertical dotted line marks E(X) = 2.60, showing the expected value is the balance point of the distribution, not the tallest bar.

Notice: E(X) = 2.60 falls between x = 2 and x = 3. It is not the tallest bar (x = 2 is the mode). E(X) is the balance point — where the distribution would balance if the bars had physical weight.


C3 — Expected Value

The expected value answers: “If this random experiment were repeated an enormous number of times, what would the average outcome be?” It is a weighted mean where each outcome \(x\) is weighted by its probability \(P(X = x)\).

We write \(E(X) = \mu_X\) to emphasize that the expected value plays the same role as the population mean in descriptive statistics.

Expected Value

The expected value (mean) of a discrete random variable \(X\) is:

\[ E(X) = \mu_X = \sum_x x \cdot P(X = x) \]

This is the long-run average outcome across infinitely many repetitions of the experiment.

Mini-example: For the PMF in the chart above (\(X\) = 1, 2, 3, 4 with probabilities 0.10, 0.40, 0.30, 0.20):

\[ E(X) = 1(0.10) + 2(0.40) + 3(0.30) + 4(0.20) = 0.10 + 0.80 + 0.90 + 0.80 = 2.60 \]

If you repeated this experiment millions of times and averaged all results, the average would approach 2.60.
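
As a small sketch, the weighted sum for the PMF with values 1, 2, 3, 4 and probabilities 0.10, 0.40, 0.30, 0.20 can be written in Python. The function name `expected_value` and the dictionary layout are assumptions made for illustration.

```python
def expected_value(pmf):
    """E(X): sum of x * P(X = x) over every value x in the PMF."""
    return sum(x * p for x, p in pmf.items())


pmf = {1: 0.10, 2: 0.40, 3: 0.30, 4: 0.20}

mean = expected_value(pmf)
mode = max(pmf, key=pmf.get)  # value with the largest probability

print(f"E(X) = {mean:.2f}, mode = {mode}")
```

Computing the mode next to the mean makes it easy to see that the two need not coincide.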

E(X) is not the mode. The most likely outcome (mode) is \(x = 2\) in the example above (probability 0.40). But \(E(X) = 2.60\). The expected value is the balance point of the distribution: it accounts for how far each outcome is from the center, not just which has the highest probability. In fact, \(E(X) = 3.5\) for a fair die, even though no face shows 3.5.


C4 — Computing E(X²)

To compute variance, you first need a related quantity: \(E(X^2)\). This is also a weighted average, but you square each outcome before weighting it.

We write \(E(X^2)\) to mean “the expected value of the squared random variable.”

Expected Value of X²

\[ E(X^2) = \sum_x x^2 \cdot P(X = x) \]

This is computed from the same PMF: square each value, multiply by its probability, and sum.

Continuing the example above:

\[ E(X^2) = 1^2(0.10) + 2^2(0.40) + 3^2(0.30) + 4^2(0.20) = 0.10 + 1.60 + 2.70 + 3.20 = 7.60 \]

\(E(X^2) \ne [E(X)]^2\). In the example, \(E(X^2) = 7.60\), but \([E(X)]^2 = 2.60^2 = 6.76\). These are equal only when \(X\) is a constant (no randomness at all). Confusing these two quantities is one of the most common errors when computing variance: it would give \(\operatorname{Var}(X) = 6.76 - 6.76 = 0\), which is wrong.
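
The order-of-operations difference can be checked numerically. The sketch below assumes a hypothetical helper `expectation(pmf, g)` that computes the weighted average of `g(x)`; it is an illustration rather than a prescribed method.

```python
pmf = {1: 0.10, 2: 0.40, 3: 0.30, 4: 0.20}


def expectation(pmf, g=lambda x: x):
    """E[g(X)]: weighted average of g(x) with weights P(X = x)."""
    return sum(g(x) * p for x, p in pmf.items())


# E(X^2): square each value first, then take the weighted average.
second_moment = expectation(pmf, lambda x: x ** 2)

# [E(X)]^2: take the weighted average first, then square the result.
square_of_mean = expectation(pmf) ** 2

print(f"E(X^2) = {second_moment:.2f}, [E(X)]^2 = {square_of_mean:.2f}")
```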

Side-by-side comparison of two quantities. Left column shows [E(X)] squared: first average the x values to get E(X) equals 2.60, then square the result to get 6.76. Right column shows E(X squared): first square each x value to get 1, 4, 9, 16, then take the weighted average to get 7.60. The two results differ — 6.76 is not equal to 7.60 — because the order of operations matters.

C5 — Variance and Standard Deviation

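Variance measures how spread out the distribution is: how far a typical outcome is from the expected value. A small variance means most outcomes cluster near \(E(X)\); a large variance means outcomes are spread widely.

We write \(\operatorname{Var}(X) = \sigma_X^2\) and \(\operatorname{SD}(X) = \sigma_X\).

Variance and Standard Deviation

The variance of a discrete random variable \(X\) is computed with the shortcut formula:

\[ \operatorname{Var}(X) = E(X^2) - [E(X)]^2 \]

The standard deviation is:

\[ \sigma_X = \sqrt{\operatorname{Var}(X)} \]

Both measure spread: the standard deviation in the same units as \(X\), the variance in squared units. Variance is always \(\ge 0\).

Completing the example:

\[ \operatorname{Var}(X) = 7.60 - 2.60^2 = 7.60 - 6.76 = 0.84, \qquad \sigma_X = \sqrt{0.84} \approx 0.92 \]

Interpretation: outcomes of \(X\) typically deviate from the expected value of 2.60 by about 0.92 units.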
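
One way to gain confidence in the shortcut formula is to compare it with the definition \(E[(X - \mu_X)^2]\) on the same PMF. The following Python sketch does that for the values 1, 2, 3, 4 with probabilities 0.10, 0.40, 0.30, 0.20; the variable names are illustrative.

```python
import math

pmf = {1: 0.10, 2: 0.40, 3: 0.30, 4: 0.20}

mean = sum(x * p for x, p in pmf.items())

# Shortcut formula: E(X^2) - [E(X)]^2.
var_shortcut = sum(x ** 2 * p for x, p in pmf.items()) - mean ** 2

# Definition: weighted average of squared deviations from the mean.
var_definition = sum((x - mean) ** 2 * p for x, p in pmf.items())

sd = math.sqrt(var_shortcut)

print(f"Var(X) = {var_shortcut:.2f} (shortcut), {var_definition:.2f} (definition)")
print(f"sigma_X = {sd:.2f}")
```

Both routes agree up to floating-point rounding; the shortcut simply needs fewer subtractions by hand.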

Two PMF bar charts side by side, both with E(X) equal to 3 marked by a red dotted vertical line. Left chart shows a low-variance distribution concentrated at x equals 2, 3, and 4 with probabilities 0.25, 0.50, and 0.25 and a standard deviation of approximately 0.71 shown as a narrow bracket. Right chart shows a high-variance distribution spread across x equals 1 through 5 with probabilities 0.15, 0.20, 0.30, 0.20, 0.15 and standard deviation approximately 1.26 shown as a wide bracket. Both distributions have the same center but very different spreads.

The ±σ brackets have the same center (E(X) = 3) but very different widths. A large variance does not change where the distribution is centered — only how much outcomes scatter around that center.

Variance is always \(\ge 0\). If you compute a negative variance, you made an arithmetic error. Since \(\operatorname{Var}(X) = E[(X - \mu_X)^2]\), it is a weighted average of non-negative numbers and cannot be negative. A negative answer is a signal to recheck your arithmetic, not a valid result.


C6 — Cumulative Distribution Function (CDF)

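The PMF answers “exactly equal to.” The CDF answers “less than or equal to”: it accumulates the probabilities as \(x\) increases.

We write \(F(x)\) to denote the CDF evaluated at \(x\).

Cumulative Distribution Function (CDF)

The cumulative distribution function of a discrete random variable \(X\) is:

\[ F(x) = P(X \le x) = \sum_{t \le x} P(X = t) \]

The CDF gives the probability that \(X\) takes a value at most \(x\).

To answer “at least \(a\)”: \(P(X \ge a) = 1 - P(X < a)\), which equals \(1 - F(a - 1)\) when \(X\) takes integer values.

To answer “between \(a\) and \(b\)”: \(P(a < X \le b) = F(b) - F(a)\).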

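CDF for our running example (\(X\) = 1, 2, 3, 4):

| \(x\) | \(P(X = x)\) | \(F(x)\) |
|---|---|---|
| 1 | 0.10 | 0.10 |
| 2 | 0.40 | 0.50 |
| 3 | 0.30 | 0.80 |
| 4 | 0.20 | 1.00 |

For example, \(P(X \le 2) = F(2) = 0.50\) and \(P(X \ge 3) = 1 - F(2) = 0.50\).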
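
A CDF table is a running total of the PMF, which can be sketched with `itertools.accumulate`. The lookup function `F` below is a hypothetical helper written for this illustration; it returns the level of the staircase at any real `x`.

```python
from itertools import accumulate

values = [1, 2, 3, 4]
probs = [0.10, 0.40, 0.30, 0.20]

# Running totals of the PMF give F(x) at each listed value.
cdf_table = dict(zip(values, accumulate(probs)))


def F(x):
    """P(X <= x): the CDF level at the largest listed value not exceeding x."""
    reached = [level for value, level in cdf_table.items() if value <= x]
    return reached[-1] if reached else 0.0


p_at_most_2 = F(2)        # "at most 2"
p_at_least_3 = 1 - F(2)   # "at least 3" via the complement

print(cdf_table)
print(f"P(X <= 2) = {p_at_most_2:.2f}, P(X >= 3) = {p_at_least_3:.2f}")
```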

Plotted against \(x\), the CDF is a staircase: moving from \(x = 1\) to \(x = 4\) one step at a time, each new PMF bar adds one jump. \(P(X = x)\) is the height difference between two adjacent staircase levels.

Section 4: Worked Examples

Example 1 — Verify a PMF

A game assigns payouts according to the table below. Verify that this is a valid PMF.

| \(x\) (payout $) | −2 | 0 | 3 | 5 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.20 | 0.50 | 0.20 | 0.10 |

Solution (fully narrated):

I notice this problem asks me to verify the PMF — that means checking the two conditions, not computing anything else yet.

Condition 1: All probabilities non-negative. I scan the row: 0.20, 0.50, 0.20, 0.10. All are \(\ge 0\). ✓

Condition 2: Probabilities sum to 1. \(0.20 + 0.50 + 0.20 + 0.10 = 1.00\) ✓

Both conditions hold, so this is a valid PMF. I could stop here — the question only asks to verify, not to compute E(X). Resisting the urge to keep calculating is part of reading the question carefully.


Example 2 — Compute E(X) for a Payout Scenario

Using the same PMF from Example 1 (payouts −2, 0, 3, 5 with probabilities 0.20, 0.50, 0.20, 0.10), compute \(E(X)\).

Prediction checkpoint: Before you see the full solution, predict: will \(E(X)\) be negative, zero, or positive? The largest probability (0.50) sits at \(x = 0\), but there is also some probability on the positive values. Make a prediction, then read on.

Show Solution

\[ E(X) = (-2)(0.20) + 0(0.50) + 3(0.20) + 5(0.10) = -0.40 + 0 + 0.60 + 0.50 = 0.70 \]

The expected payout is $0.70 per play. Even though you have a 50% chance of winning nothing and a 20% chance of losing $2, the possibility of winning $3 or $5 tips the average to $0.70 in your favor.

Interpretation: If you played this game 1000 times, you would expect to come out ahead by about $700 in total.


Example 3 — Compute Var(X) Step by Step

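Using the same PMF (payouts −2, 0, 3, 5 with probabilities 0.20, 0.50, 0.20, 0.10), compute \(\operatorname{Var}(X)\) and \(\sigma_X\).

Show Solution

Step 1: We already know \(E(X) = 0.70\) from Example 2.

Step 2: Compute \(E(X^2)\).

\[ E(X^2) = (-2)^2(0.20) + 0^2(0.50) + 3^2(0.20) + 5^2(0.10) = 0.80 + 0 + 1.80 + 2.50 = 5.10 \]

Step 3: Apply the shortcut formula.

\[ \operatorname{Var}(X) = E(X^2) - [E(X)]^2 = 5.10 - 0.70^2 = 5.10 - 0.49 = 4.61 \]

Step 4: Compute the standard deviation.

\[ \sigma_X = \sqrt{4.61} \approx 2.15 \]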

Interpretation: Payouts in this game deviate from the expected $0.70 by about $2.15 on average. The game is favorable on average but quite volatile.


Example 4 — Use the CDF (Find the Error)

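A student is using the PMF from Examples 1–3. They want to find \(P(0 < X \le 3)\) and write:

Student’s work:

\[ P(0 < X \le 3) = F(3) - F(0) \]

CDF values: \(F(3) = 0.20 + 0.50 + 0.20 = 0.90\), \(F(0) = 0.20 + 0.50 = 0.70\).

\[ P(0 < X \le 3) = 0.90 - 0.70 = 0.20 \]

But wait: the student then says “so the answer is \(P(X = 3) = 0.20\),” and concludes that \(P(a < X \le b) = P(X = b)\) always.

What error did the student make?

Show Solution

The CDF calculation is correct: \(F(3) - F(0) = 0.90 - 0.70 = 0.20\) is right. But the student’s conclusion is wrong.

\(P(0 < X \le 3)\) includes all outcomes strictly greater than 0 and at most 3. In this distribution, the only value in that range is \(x = 3\) (since \(x = 0\) is excluded by the strict inequality). So in this particular case, \(P(0 < X \le 3) = P(X = 3)\) happens to be numerically correct.

The error is conceptual: the student stated it as a general fact that \(P(a < X \le b) = P(X = b)\), which is false in general. For example, \(P(-2 < X \le 3) = F(3) - F(-2) = 0.90 - 0.20 = 0.70\), which includes both \(x = 0\) (probability 0.50) and \(x = 3\) (probability 0.20); it is not equal to \(P(X = 3) = 0.20\).

The CDF formula accumulates all outcomes in the interval, not just the endpoint.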
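
The interval rule can be checked against the payout PMF (values −2, 0, 3, 5 with probabilities 0.20, 0.50, 0.20, 0.10). In this sketch the helper names `cdf` and `prob_interval` are assumptions, and exact fractions are used so the comparisons are not affected by rounding.

```python
from fractions import Fraction

# Payout PMF from Examples 1-3, kept as exact fractions.
pmf = {
    -2: Fraction(20, 100),
    0: Fraction(50, 100),
    3: Fraction(20, 100),
    5: Fraction(10, 100),
}


def cdf(x):
    """F(x) = P(X <= x)."""
    return sum(p for value, p in pmf.items() if value <= x)


def prob_interval(a, b):
    """P(a < X <= b) = F(b) - F(a)."""
    return cdf(b) - cdf(a)


# Only x = 3 lies in (0, 3], so the interval happens to equal P(X = 3).
print(prob_interval(0, 3), pmf[3])

# Both x = 0 and x = 3 lie in (-2, 3], so the interval is larger than P(X = 3).
print(prob_interval(-2, 3), pmf[3])
```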

Section 5: Guided Practice

GP1 — Discrete or Continuous?

For each random variable below, decide whether it is discrete or continuous, and explain your reasoning.

  • The number of phone calls a help-desk agent receives in a one-hour shift.
  • The height (in centimetres) of a randomly chosen student at your school.
  • The number of defective items in a quality-control sample of 50 products.
  • The time (in minutes) a customer waits in line at a coffee shop.
  • The number of siblings a randomly chosen person has.


GP2 — Is This PMF Valid?

A student claims the following is a valid PMF for a random variable \(X\):

| \(x\) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.25 | 0.35 | 0.30 | 0.15 |

Is this a valid PMF? If not, identify the problem and fix it.

Evaluate this PMF:

Fix: Any adjustment that reduces the total by 0.05 while keeping all probabilities non-negative is valid; there is no unique correct answer. For example, setting \(P(X = 4) = 0.10\) gives a sum of 1.00, but reducing \(P(X = 2)\) to 0.30 would also work.

A student claims the following is a valid PMF for a random variable \(X\):

| \(x\) | 0 | 1 | 2 | 3 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.40 | 0.30 | 0.20 | 0.10 |

Is this a valid PMF? If not, identify the problem and fix it.

Evaluate this PMF:

A student claims the following is a valid PMF for a random variable \(X\):

| \(x\) | 1 | 2 | 3 |
|---|---|---|---|
| \(P(X = x)\) | 0.50 | 0.60 | −0.10 |

Is this a valid PMF? If not, identify the problem and fix it.

Evaluate this PMF:

Fix: A negative probability is impossible. Restructure the distribution to have non-negative values summing to 1.

A student claims the following is a valid PMF for a random variable \(X\):

| \(x\) | 2 | 4 | 6 | 8 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.15 | 0.35 | 0.35 | 0.15 |

Is this a valid PMF? If not, identify the problem and fix it.

Evaluate this PMF:

A student claims the following is a valid PMF for a random variable \(X\):

| \(x\) | −1 | 0 | 1 | 2 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.20 | 0.30 | 0.30 | ? |

The missing probability is given as .

Is this a valid PMF? If not, identify the problem and fix it.

Evaluate this PMF:

Fix: \(P(X = 2) = 1 - (0.20 + 0.30 + 0.30) = 0.20\).


GP3 — Compute E(X)


GP4 — Compute Var(X) and σ_X

Section 6: Independent Practice

IP1 — Full E(X) + Var(X) Workflow


IP2 — CDF Practice

The random variable \(X\) has the following PMF:

| \(x\) | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| \(P(X = x)\) | 0.05 | 0.20 | 0.45 | 0.25 | 0.05 |

(a) Build the CDF table for all values.

(b) Find .

(c) Find .

(d) Find .

Show Solution

(a) CDF:

| \(x\) | \(F(x)\) |
|---|---|
| 0 | 0.05 |
| 1 | 0.25 |
| 2 | 0.70 |
| 3 | 0.95 |
| 4 | 1.00 |

(b) .

(c) .

(d) .

The random variable \(X\) has the following PMF:

| \(x\) | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|
| \(P(X = x)\) | 0.10 | 0.25 | 0.30 | 0.25 | 0.10 |

(a) Build the CDF table for all values.

(b) Find .

(c) Find .

(d) Find .

Show Solution

(a) CDF:

| \(x\) | \(F(x)\) |
|---|---|
| 1 | 0.10 |
| 2 | 0.35 |
| 3 | 0.65 |
| 4 | 0.90 |
| 5 | 1.00 |

(b) .

(c) . (Note: this equals because the distribution is symmetric.)

(d) .

The random variable \(X\) has the following PMF:

| \(x\) | 0 | 2 | 4 | 6 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.30 | 0.40 | 0.20 | 0.10 |

(a) Build the CDF table for all values.

(b) Find .

(c) Find .

(d) Find .

Show Solution

(a) CDF:

| \(x\) | \(F(x)\) |
|---|---|
| 0 | 0.30 |
| 2 | 0.70 |
| 4 | 0.90 |
| 6 | 1.00 |

(b) .

(c) .

(d) . (This includes and .)

The random variable \(X\) has the following PMF:

| \(x\) | −2 | −1 | 0 | 1 | 2 |
|---|---|---|---|---|---|
| \(P(X = x)\) | 0.10 | 0.20 | 0.40 | 0.20 | 0.10 |

(a) Build the CDF table for all values.

(b) Find .

(c) Find .

(d) Find .

Show Solution

(a) CDF:

| \(x\) | \(F(x)\) |
|---|---|
| −2 | 0.10 |
| −1 | 0.30 |
| 0 | 0.70 |
| 1 | 0.90 |
| 2 | 1.00 |

(b) .

(c) .

(d) . (Includes and .)

The random variable \(X\) has the following PMF:

| \(x\) | 5 | 10 | 15 | 20 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.40 | 0.30 | 0.20 | 0.10 |

(a) Build the CDF table for all values.

(b) Find .

(c) Find .

(d) Find .

Show Solution

(a) CDF:

| \(x\) | \(F(x)\) |
|---|---|
| 5 | 0.40 |
| 10 | 0.70 |
| 15 | 0.90 |
| 20 | 1.00 |

(b) .

(c) .

(d) . (Includes and .)


IP3 — Is the Game Fair?

These five variants explore whether a game is favorable, unfavorable, or break-even for the player. Most variants ask you to compute and interpret it. Variant 3 (the third card in the deck) is a find-the-error problem — the computation is done for you, and your job is to identify what went wrong.

Raffle scenario: A raffle has 100 tickets. First prize is $200, second prize is $50, and there is no third prize. All other tickets win nothing. One ticket costs $5.

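Let \(X\) = net winnings (prize minus ticket price) for one ticket.

| \(x\) | −5 | 45 | 195 |
|---|---|---|---|
| \(P(X = x)\) | 98/100 | 1/100 | 1/100 |

Compute \(E(X)\). Is it worth buying a ticket?

Show Solution

\[ E(X) = (-5)(0.98) + 45(0.01) + 195(0.01) = -4.90 + 0.45 + 1.95 = -2.50 \]

On average you lose $2.50 per ticket. It is not worth buying from a purely mathematical standpoint (though raffles often benefit a charity, which may be the real reason people buy tickets).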

Card game: You pay $3 to draw one card from a standard deck of 52. If you draw an ace (4 total), you win $15. If you draw a face card (12 total: J, Q, K), you win $5. Otherwise, you win nothing.

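Let \(X\) = net winnings.

Compute \(E(X)\). Is this game favorable for you?

Show Solution

| \(x\) | Probability |
|---|---|
| −3 (no ace, no face) | 36/52 |
| 2 (face card: win $5 − $3) | 12/52 |
| 12 (ace: win $15 − $3) | 4/52 |

\[ E(X) = (-3)\tfrac{36}{52} + 2 \cdot \tfrac{12}{52} + 12 \cdot \tfrac{4}{52} = \tfrac{-108 + 24 + 48}{52} = -\tfrac{36}{52} \approx -0.69 \]

The game is unfavorable: you lose about 69 cents per play on average.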
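
For games described by a price and a prize list, the net-winnings PMF can be generated rather than typed by hand. The sketch below does this for the card game; the `prizes` dictionary layout is an assumption chosen for illustration.

```python
from fractions import Fraction

cost = 3          # price of one draw, in dollars
deck_size = 52

# Card category -> (prize in dollars, number of such cards in the deck).
prizes = {"ace": (15, 4), "face": (5, 12), "other": (0, 36)}

# Net winnings = prize - cost; probability = matching cards / deck size.
pmf = {prize - cost: Fraction(count, deck_size) for prize, count in prizes.values()}

expected_net = sum(x * p for x, p in pmf.items())

print(pmf)
print(f"E(X) = {expected_net} = {float(expected_net):.2f}")
```

Keeping the result as a fraction (−9/13) shows that the 69-cent loss is a rounded figure.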

Find the Error (Insurance scenario): A student computes \(E(X)\) for a game where you pay $10 to enter and can win $0, $20, or $50 with probabilities 0.60, 0.30, and 0.10 respectively.

Student’s work:

Net winnings: −$10, +$10, +$40.

\[ E(X) = (-10)(0.60) + 10(0.30) + 40(0.10) = -6 + 3 + 4 = 1 \]

“Since \(E(X) = \$1 > 0\), this game is favorable for the player.”

The student then also checks: “The mode is −$10 (60% probability), so the most likely outcome is a loss. But because \(E(X) > 0\), I should play.”

Finally, the student writes: “The variance is .”

Identify every error in the student’s work.

Show Solution

Error 1 (Computational, minor): The net winnings for the $0 prize are −$10, for the $20 prize are +$10, and for the $50 prize are +$40. These values are correct, and so is \(E(X) = \$1.00\).

Error 2 (Conceptual, major): The student correctly notes \(E(X) > 0\), so the conclusion to play is logically consistent; there is no error here.

Error 3 (Critical, formula error): The variance calculation is completely wrong. The student’s formula confuses \(E(X^2)\), the expected value of the squared variable, with \([E(X)]^2\), the square of the expected value. These are equal only when \(X\) is a constant.

Correct variance:

\[ E(X^2) = (-10)^2(0.60) + 10^2(0.30) + 40^2(0.10) = 60 + 30 + 160 = 250 \]

\[ \operatorname{Var}(X) = 250 - 1^2 = 249, \qquad \sigma_X = \sqrt{249} \approx 15.78 \]

The spread is enormous: even though the game is favorable on average, you can easily lose $10 or win $40 in a single play. The student’s figure vastly underestimates the risk.

Fair game? A game costs $2 to play. You roll a fair 4-sided die (faces 1–4). If you roll a 4, you win $6; if you roll a 3, you win $2; otherwise, you win nothing.

Compute \(E(X)\) for the net winnings and determine if the game is fair.

Show Solution

| \(x\) (net) | Probability |
|---|---|
| −2 (rolled 1 or 2) | 2/4 = 0.50 |
| 0 (rolled 3: won $2, paid $2) | 1/4 = 0.25 |
| 4 (rolled 4: won $6, paid $2) | 1/4 = 0.25 |

\(E(X) = (-2)(0.50) + 0(0.25) + 4(0.25) = -1 + 0 + 1 = 0\). This game is perfectly fair: on average, neither player nor house gains an advantage.

Coin flip bet: You and a friend flip a coin. If heads, your friend pays you $5. If tails, you pay your friend $5. But you also get a $1 bonus just for playing (paid by a third party).

Compute \(E(X)\) for your net winnings per flip and determine if this arrangement is favorable.

Show Solution

| \(x\) (net) | Probability |
|---|---|
| +6 (heads: $5 win + $1 bonus) | 0.50 |
| −4 (tails: −$5 loss + $1 bonus) | 0.50 |

\(E(X) = 6(0.50) + (-4)(0.50) = 3 - 2 = 1\). You gain $1 per flip on average because of the bonus, even though the coin flip itself is fair.


IP4 — Find the Missing Probability, Then Compute E(X)


IP5 — Compute E(X) and σ_X, Then Interpret


IP6 — Multi-Step Synthesis

A quality-control engineer monitors a production line. Let \(X\) = the number of defective items in a box of 5 products. Based on historical data, the engineer models \(X\) with:

| \(x\) | 0 | 1 | 2 | 3 | 4 | 5 |
|---|---|---|---|---|---|---|
| \(P(X = x)\) | 0.60 | 0.20 | 0.10 | 0.06 | 0.03 | 0.01 |

(a) Verify that this is a valid PMF.

(b) Compute \(E(X)\) and interpret it in the context of production quality.

(c) Compute \(\operatorname{Var}(X)\) and \(\sigma_X\) using the shortcut formula.

(d) The company policy flags a box for reinspection if \(X \ge 2\). What is the probability a box is flagged? Use the CDF.

Show Solution

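(a) Sum of probabilities: \(0.60 + 0.20 + 0.10 + 0.06 + 0.03 + 0.01 = 1.00\) ✓. All non-negative ✓. Valid PMF.

(b)

\[ E(X) = 0(0.60) + 1(0.20) + 2(0.10) + 3(0.06) + 4(0.03) + 5(0.01) = 0.75 \]

Interpretation: On average, there are 0.75 defective items per box. In the long run, the engineer expects about 75 defective items per 100 boxes.

(c)

\[ E(X^2) = 0(0.60) + 1(0.20) + 4(0.10) + 9(0.06) + 16(0.03) + 25(0.01) = 1.87 \]

\[ \operatorname{Var}(X) = 1.87 - 0.75^2 = 1.3075, \qquad \sigma_X = \sqrt{1.3075} \approx 1.14 \]

The number of defects per box typically varies from the mean of 0.75 by about 1.14 defects.

(d)

\[ P(X \ge 2) = 1 - F(1) = 1 - (0.60 + 0.20) = 0.20 \]

There is a 20% chance a given box is flagged for reinspection.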


Mixed Review — Retrieval from Earlier Lessons

These problems draw on concepts from earlier in the course. Attempting them without re-reading prior lessons is the point — retrieval practice strengthens long-term memory more than re-reading.

Review Problem 1 — Conditional probability from a table

A company surveys 200 employees on their transport mode and whether they are late at least once per month.

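|  | Late (≥1×/month) | Never late | Total |
|---|---|---|---|
| Drives | 30 | 90 | 120 |
| Takes transit | 40 | 40 | 80 |
| Total | 70 | 130 | 200 |

(a) Compute \(P(\text{Late} \mid \text{Drives})\) and \(P(\text{Drives} \mid \text{Late})\).

(b) Are being late and driving independent? Apply the formal independence test.

Show Solution

(a)

\[ P(\text{Late} \mid \text{Drives}) = \frac{30}{120} = 0.25, \qquad P(\text{Drives} \mid \text{Late}) = \frac{30}{70} \approx 0.43 \]

These are not equal: swapping the conditioning direction changes the answer.

(b) Independence test: check whether \(P(\text{Late} \cap \text{Drives}) = P(\text{Late}) \cdot P(\text{Drives})\).

\(P(\text{Late} \cap \text{Drives}) = 30/200 = 0.15\), while \(P(\text{Late}) \cdot P(\text{Drives}) = (70/200)(120/200) = 0.35 \times 0.60 = 0.21\).

Since \(0.15 \ne 0.21\), the events are dependent: driving employees are less likely to be late than the overall rate would suggest.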
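
The same test can be run directly on the survey counts. This is a sketch under the assumption that the table is stored as a dictionary keyed by (transport mode, lateness status); the key names are invented for the example.

```python
from fractions import Fraction

# Survey counts from the two-way table (200 employees).
counts = {
    ("drives", "late"): 30,
    ("drives", "never_late"): 90,
    ("transit", "late"): 40,
    ("transit", "never_late"): 40,
}

total = sum(counts.values())
n_late = sum(n for (mode, status), n in counts.items() if status == "late")
n_drives = sum(n for (mode, status), n in counts.items() if mode == "drives")
n_both = counts[("drives", "late")]

# Conditional probabilities: the denominator is the size of the conditioning group.
p_late_given_drives = Fraction(n_both, n_drives)
p_drives_given_late = Fraction(n_both, n_late)

# Independence test: compare the joint probability with the product of marginals.
p_joint = Fraction(n_both, total)
p_product = Fraction(n_late, total) * Fraction(n_drives, total)
independent = p_joint == p_product

print(p_late_given_drives, p_drives_given_late)
print(p_joint, p_product, independent)
```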


Review Problem 2 — Combinations in a probability calculation

A quality inspector randomly selects 3 items from a shipment of 12 products, of which 4 are defective.

(a) How many ways can the inspector select 3 items from 12?

(b) What is the probability that all 3 selected items are defective?

Show Solution

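(a) Order does not matter when drawing a sample. Total ways = \(\binom{12}{3} = 220\).

(b) Ways to choose 3 defectives from the 4 defective items: \(\binom{4}{3} = 4\).

\[ P(\text{all 3 defective}) = \frac{4}{220} = \frac{1}{55} \approx 0.018 \]

There is less than a 2% chance the inspector draws three defective items.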
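
The counts in this problem can be verified with Python's `math.comb`, which returns the number of ways to choose k items from n without regard to order. The variable names are illustrative.

```python
from fractions import Fraction
from math import comb

total_ways = comb(12, 3)       # choose any 3 of the 12 items
defective_ways = comb(4, 3)    # choose 3 of the 4 defective items

p_all_defective = Fraction(defective_ways, total_ways)

print(total_ways, defective_ways)
print(f"P(all 3 defective) = {p_all_defective} = {float(p_all_defective):.4f}")
```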

Section 7: Mastery Check

Question 1 — Feynman Test

Explain, in your own words and without formulas, why the expected value is not necessarily the most likely value (mode) of a random variable. Use a concrete example to support your explanation.

Model Answer

The expected value is a weighted average — it accounts for both the probability and the size of each outcome. The mode is simply the outcome with the highest probability, regardless of how large or small that outcome is.

Example: Imagine a lottery where you pay $1 and win $1,000 with probability 0.001 or $0 (lose your $1) with probability 0.999. The mode is −$1 because it happens 99.9% of the time. But, taking the net win as $999, \(E(X) = 999(0.001) + (-1)(0.999) = 0\).

The expected value is pulled up by the large prize even though you almost always lose. It is the balance point of the distribution — the location where it would balance if each outcome had weight equal to its probability. The mode ignores the magnitude of outcomes entirely.

Classic example from class: A fair die has \(E(X) = 3.5\) (not an outcome at all), while the mode is undefined (all faces equally likely). The expected value summarizes the distribution’s center of gravity, not its peak.


Question 2 — Apply: Medical Testing Revenue

A hospital laboratory charges fees for diagnostic tests. For a randomly chosen test order, the number of individual tests performed, \(X\), has the following distribution:

| \(x\) | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| \(P(X = x)\) | 0.40 | 0.30 | 0.20 | 0.10 |

Each test costs the patient $80. Let \(R = 80X\) be the revenue per test order.

What is the most efficient approach to finding the expected revenue per test order?

Compute E(X) and then E(R):

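Show Full Solution

\[ E(X) = 1(0.40) + 2(0.30) + 3(0.20) + 4(0.10) = 2.00, \qquad E(R) = 80 \cdot E(X) = 160 \]

For variance, since \(R = 80X\), the property \(\operatorname{Var}(aX) = a^2 \operatorname{Var}(X)\) applies.

\[ E(X^2) = 1(0.40) + 4(0.30) + 9(0.20) + 16(0.10) = 5.00, \qquad \operatorname{Var}(X) = 5.00 - 2.00^2 = 1.00 \]

\[ \operatorname{Var}(R) = 80^2 \cdot 1.00 = 6400, \qquad \sigma_R = \sqrt{6400} = 80 \]

The hospital earns an average of $160 per test order, with a standard deviation of $80.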
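
The scaling properties can be confirmed by building the PMF of the revenue directly and comparing it with the shortcut through \(X\). In this sketch the helpers `mean` and `variance` are hypothetical names, and the revenue is assumed to be exactly 80 times the number of tests.

```python
import math
from fractions import Fraction

pmf_x = {1: Fraction(4, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(1, 10)}
price = 80


def mean(pmf):
    """E(X) as a weighted sum."""
    return sum(x * p for x, p in pmf.items())


def variance(pmf):
    """Var(X) via the shortcut E(X^2) - [E(X)]^2."""
    return sum(x ** 2 * p for x, p in pmf.items()) - mean(pmf) ** 2


# Direct route: relabel each value x as the revenue 80 * x.
pmf_r = {price * x: p for x, p in pmf_x.items()}

# Property route: E(aX) = a E(X) and Var(aX) = a^2 Var(X).
mean_r = price * mean(pmf_x)
var_r = price ** 2 * variance(pmf_x)
sd_r = math.sqrt(var_r)

print(mean(pmf_r), mean_r)
print(variance(pmf_r), var_r, sd_r)
```

The property route needs only the statistics of \(X\), which is why it is the more efficient approach when the revenue is a constant multiple of the count.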


Question 3 — Error Analysis

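A student computes \(\operatorname{Var}(X)\) for the following PMF:

| \(x\) | 2 | 4 | 6 |
|---|---|---|---|
| \(P(X = x)\) | 1/3 | 1/3 | 1/3 |

Student’s work:

\[ E(X) = 2 \cdot \tfrac{1}{3} + 4 \cdot \tfrac{1}{3} + 6 \cdot \tfrac{1}{3} = 4, \qquad \operatorname{Var}(X) = [E(X)]^2 = 4^2 = 16 \]

“The variance is 16.”

What is the student’s error?

Full Correction

The correct variance:

\[ E(X^2) = 4 \cdot \tfrac{1}{3} + 16 \cdot \tfrac{1}{3} + 36 \cdot \tfrac{1}{3} = \tfrac{56}{3} \approx 18.67 \]

\[ \operatorname{Var}(X) = \tfrac{56}{3} - 4^2 = \tfrac{8}{3} \approx 2.67 \]

The student’s error was writing \(\operatorname{Var}(X) = [E(X)]^2\). The correct formula is \(\operatorname{Var}(X) = E(X^2) - [E(X)]^2\), where \(E(X^2)\) requires squaring each value before multiplying by its probability. The student never computed \(E(X^2)\) at all.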


Self-Assessment

How confident do you feel about expected value and variance?

Section 8: Boss Fight

Choose your challenge path. Both paths require the full toolkit from this lesson: PMF, \(E(X)\), \(\sigma_X\), and interpretation.

🎡 Path A — The Analyst

Evaluate a carnival game bet. Is it profitable for the player in the long run? Use E(X) and σ_X to give a complete recommendation.

🏗️ Path B — The Architect

Design a four-outcome PMF for a payout scheme that satisfies strict mathematical constraints. Build and verify your own distribution.

🎡 Path A — The Analyst

A carnival game costs $5 to play. You throw a ring onto a set of bottles. The outcomes and probabilities are:

| Outcome | Prize | Probability |
|---|---|---|
| Ring on a red bottle | $20 | 0.05 |
| Ring on a blue bottle | $10 | 0.15 |
| Ring on a yellow bottle | $3 | 0.30 |
| Miss entirely | $0 | 0.50 |

Let \(X\) = net winnings per play (prize minus $5 entry fee).

Task 1. Construct the PMF for \(X\) (net winnings). Write out the full table including all four outcomes with their net values and probabilities. Verify the PMF is valid.

Task 2. Compute \(E(X)\). Show every step of the weighted sum. Then answer: Is this game favorable, unfavorable, or fair for a player? How much does the player expect to win or lose per play?

Task 3. Compute \(\operatorname{Var}(X)\) using the shortcut formula, then compute \(\sigma_X\). Round to 2 decimal places.

Task 4. A friend says, “The most common outcome is missing (50% chance), so this game is obviously a bad deal.” Write a 2–3 sentence response that uses both \(E(X)\) and \(\sigma_X\) to give a more complete analysis. Does the variance change your recommendation?

🏗️ Path B — The Architect

Design a PMF for a payout scheme (e.g., a custom lottery or insurance product) with exactly 4 outcomes satisfying these constraints:

  • All probabilities are non-negative and sum to 1.
  • \(E(X) = 2\) (the expected payout is $2 per play).
  • (the distribution is not too spread out).
  • Outcomes are positive integers (payouts in dollars).

Task 1. Choose your four outcome values (positive integers). Write down the equations you need your probabilities to satisfy: the PMF validity equation and the \(E(X) = 2\) constraint. Explain your strategy for choosing outcomes and probabilities.

Task 2. Assign probabilities and verify: (a) all non-negative, (b) sum to 1, (c) \(E(X) = 2\). Show your arithmetic.

Task 3. Compute \(E(X^2)\), then compute \(\operatorname{Var}(X)\) and verify the spread constraint. If your first attempt violates the variance constraint, adjust your design and re-verify.

Task 4. Suppose this PMF represents the daily payout (in $) of a new financial product. Write 2–3 sentences explaining what \(E(X) = 2\) and your \(\sigma_X\) value mean to a potential investor who does not know probability theory.

Section 9: Challenge Problems

Ready for more? These go beyond the lesson objectives.

Challenge 1 — Derive the Shortcut Formula

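The definition of variance is:

\[ \operatorname{Var}(X) = E\big[(X - \mu_X)^2\big] \]

where \(\mu_X = E(X)\). Starting from this definition, derive the shortcut formula \(\operatorname{Var}(X) = E(X^2) - [E(X)]^2\) by algebraic expansion.

Hint: Expand \((X - \mu_X)^2\) as a polynomial, then apply linearity of expectation (\(E(aX + b) = aE(X) + b\) for constants \(a\), \(b\)).

Show Derivation

Expand the square:

\[ (X - \mu_X)^2 = X^2 - 2\mu_X X + \mu_X^2 \]

Apply linearity of expectation (\(E(cX) = cE(X)\), and \(E(c) = c\) for constant \(c\)):

\[ \operatorname{Var}(X) = E(X^2) - 2\mu_X E(X) + \mu_X^2 \]

Substitute \(E(X) = \mu_X\):

\[ \operatorname{Var}(X) = E(X^2) - 2\mu_X^2 + \mu_X^2 = E(X^2) - \mu_X^2 \]

Since \(\mu_X = E(X)\):

\[ \operatorname{Var}(X) = E(X^2) - [E(X)]^2 \]

This derivation also explains why \(\operatorname{Var}(X) \ge 0\) always: since variance is defined as an expected value of a squared quantity \((X - \mu_X)^2 \ge 0\), it can never be negative, which forces \(E(X^2) \ge [E(X)]^2\).

Bonus: What does \(\operatorname{Var}(X) = 0\) mean? By definition, \(E[(X - \mu_X)^2] = 0\). Since \((X - \mu_X)^2 \ge 0\) always, this expectation equals zero only if \((X - \mu_X)^2 = 0\) with probability 1, meaning \(X = \mu_X\) with probability 1. A random variable with \(\operatorname{Var}(X) = 0\) is a constant: it always takes the same value.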


Challenge 2 — Regenerable Stretch Problem

This is the same generator as IP1 but used here as a stretch problem. Try to complete it with no scaffolding and verify your answer using the shortcut and definition formulas.

Section 10: Solutions Reference

Strategy advice for E(X) and Var(X) problems:

  • Always verify the PMF first (non-negative, sum = 1) before computing anything.
  • Use a table layout: column for \(x\), column for \(P(X = x)\), column for \(x \cdot P(X = x)\), column for \(x^2 \cdot P(X = x)\). Sum the last two columns to get \(E(X)\) and \(E(X^2)\) in one pass.
  • Never skip computing \(E(X^2)\) separately: computing \([E(X)]^2\) instead is the most common error in this chapter (Pitfall P2).
  • A negative variance is always an arithmetic error. Stop and recheck before continuing.
  • For CDF problems, build the full CDF table once, then read off any probability you need.
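
The table layout described above can be sketched as a single pass over the PMF rows. The example uses the defect counts from IP6 (values 0 to 5 with probabilities 0.60, 0.20, 0.10, 0.06, 0.03, 0.01); the column formatting and variable names are choices made for this illustration.

```python
import math

# (x, P(X = x)) rows for the number of defective items per box.
rows = [(0, 0.60), (1, 0.20), (2, 0.10), (3, 0.06), (4, 0.03), (5, 0.01)]

# Verify the PMF before computing anything else.
total_p = sum(p for _, p in rows)
if any(p < 0 for _, p in rows) or not math.isclose(total_p, 1.0):
    raise ValueError("not a valid PMF")

# Build the x*P and x^2*P columns and sum them in one pass.
e_x = 0.0
e_x2 = 0.0
print(f"{'x':>3} {'P(X=x)':>8} {'x*P':>8} {'x^2*P':>8}")
for x, p in rows:
    e_x += x * p
    e_x2 += x * x * p
    print(f"{x:>3} {p:>8.2f} {x * p:>8.2f} {x * x * p:>8.2f}")

# Shortcut formula, then the standard deviation.
variance = e_x2 - e_x ** 2
sigma = math.sqrt(variance)

print(f"E(X) = {e_x:.2f}, E(X^2) = {e_x2:.2f}")
print(f"Var(X) = {variance:.4f}, sigma_X = {sigma:.2f}")
```

Raising an error on an invalid PMF mirrors the advice to stop and recheck before continuing.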

Quick-Reference Formulas

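| Formula | Meaning |
|---|---|
| \(\sum_x P(X = x) = 1\) | PMF validity: probabilities must sum to 1 |
| \(E(X) = \sum_x x \cdot P(X = x)\) | Expected value: the long-run average |
| \(E(X^2) = \sum_x x^2 \cdot P(X = x)\) | Second moment: needed for the variance shortcut |
| \(\operatorname{Var}(X) = E(X^2) - [E(X)]^2\) | Variance: spread around the expected value |
| \(\sigma_X = \sqrt{\operatorname{Var}(X)}\) | Standard deviation: spread in original units |
| \(F(x) = P(X \le x)\) | CDF: cumulative probability up to \(x\) |
| \(P(X \ge a) = 1 - P(X < a)\) | “At least \(a\)” via the complement |
| \(P(a < X \le b) = F(b) - F(a)\) | “Between \(a\) and \(b\)” via CDF subtraction |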

Step-by-Step Computation Workflow

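| Step | Action | Formula |
|---|---|---|
| 1 | Verify the PMF | Check all \(P(X = x) \ge 0\) and \(\sum_x P(X = x) = 1\) |
| 2 | Compute \(E(X)\) | \(\sum_x x \cdot P(X = x)\) |
| 3 | Compute \(E(X^2)\) | \(\sum_x x^2 \cdot P(X = x)\) |
| 4 | Compute Var(X) | \(E(X^2) - [E(X)]^2\) |
| 5 | Compute \(\sigma_X\) | \(\sqrt{\operatorname{Var}(X)}\) |
| 6 | Build CDF if needed | Accumulate PMF values from left to right |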