Conditional Probability

When you learn that a randomly chosen patient is a smoker, the probability they develop lung cancer rises dramatically from the population baseline. Conditional probability is the mathematical tool for updating a probability when you receive partial information about which outcome occurred.

Definition

Let $(\Omega, \mathcal{F}, P)$ be a probability space and $B \in \mathcal{F}$ an event with $P(B) > 0$. The conditional probability of $A$ given $B$ is

$$P(A \mid B) \coloneqq \frac{P(A \cap B)}{P(B)}.$$

The intuition: conditioning on $B$ restricts the sample space from $\Omega$ to $B$. Within $B$, the relevant probability of $A$ is the fraction of $B$'s mass that falls in $A$, renormalised so that $P(B \mid B) = 1$.
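The definition can be checked by direct counting on a small finite sample space. The sketch below is only an illustration under assumed choices: the two-dice sample space, the uniform measure, and the helper names `prob`, `cond_prob`, `sum_is_8`, and `first_is_even` are not from the text above.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered outcomes of two fair six-sided dice,
# each of the 36 outcomes carrying probability 1/36.
omega = list(product(range(1, 7), repeat=2))


def prob(event):
    """P(event) under the uniform measure; `event` is a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))


def cond_prob(a, b):
    """P(A | B) = P(A and B) / P(B), defined only when P(B) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(lambda w: a(w) and b(w)) / p_b


def sum_is_8(w):
    return w[0] + w[1] == 8


def first_is_even(w):
    return w[0] % 2 == 0


p_a = prob(sum_is_8)                              # unconditional probability
p_a_given_b = cond_prob(sum_is_8, first_is_even)  # after restricting to B
p_b_given_b = cond_prob(first_is_even, first_is_even)

print(p_a, p_a_given_b, p_b_given_b)
```

Using `Fraction` keeps every value exact, so the restricted-and-renormalised probability can be compared against a hand count without rounding error.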

Renormalisation interpretation

For fixed $B$ with $P(B) > 0$, the function $Q(A) \coloneqq P(A \mid B)$ is itself a probability measure on $(\Omega, \mathcal{F})$:

  • Non-negativity. $P(A \cap B) \geq 0$ and $P(B) > 0$, so $Q(A) \geq 0$.
  • Normalisation. $Q(\Omega) = P(\Omega \cap B) / P(B) = P(B) / P(B) = 1$.
  • $\sigma$-additivity. Follows directly from that of $P$: if $A_1, A_2, \ldots$ are pairwise disjoint, so are the sets $A_i \cap B$, hence $Q\left(\bigcup_i A_i\right) = \sum_i P(A_i \cap B) / P(B) = \sum_i Q(A_i)$.

So every theorem about probability measures applies to conditional probabilities that share the same conditioning event: you have simply replaced the original measure with a new one concentrated on $B$.
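A small numerical check can make the renormalisation view concrete. The sketch below assumes a single fair die as the sample space; the names `P`, `conditional`, `Q`, `A1`, and `A2` are hypothetical and chosen only for illustration. It checks a few instances of the axioms and the complement rule, which is not a proof.

```python
from fractions import Fraction

# Assumed sample space: one fair six-sided die, uniform measure.
omega = frozenset(range(1, 7))


def P(event):
    """Uniform probability of a set of outcomes."""
    return Fraction(len(event & omega), len(omega))


def conditional(b):
    """Return the set function Q(A) = P(A | B) for a fixed event B with P(B) > 0."""
    p_b = P(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return lambda a: P(a & b) / p_b


B = frozenset({2, 4, 6})   # "the roll is even"
Q = conditional(B)

A1 = frozenset({1, 2})
A2 = frozenset({4, 5})     # disjoint from A1

q_omega = Q(omega)             # normalisation: should equal 1
q_union = Q(A1 | A2)           # additivity: should equal Q(A1) + Q(A2)
q_complement = Q(omega - A1)   # complement rule, inherited from the axioms

print(q_omega, q_union, Q(A1) + Q(A2), q_complement)
```

Because `Q` behaves like any other probability measure, results such as the complement rule carry over without a separate argument.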

Multiplication rule

Rearranging the definition immediately gives the multiplication rule:

$$P(A \cap B) = P(A \mid B) \, P(B). \tag{1}$$

This extends to chains of events. For events $A_1, A_2, \ldots, A_n$ with $P(A_1 \cap \cdots \cap A_{n-1}) > 0$:

$$P(A_1 \cap \cdots \cap A_n) = P(A_1) \, P(A_2 \mid A_1) \, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1}). \tag{2}$$

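Proof. Apply the multiplication rule iteratively: $P(A_1 \cap \cdots \cap A_n) = P(A_n \mid A_1 \cap \cdots \cap A_{n-1}) \cdot P(A_1 \cap \cdots \cap A_{n-1})$, then recurse on the last factor. Every conditional probability involved is well defined: for $k \leq n-1$ we have $A_1 \cap \cdots \cap A_k \supseteq A_1 \cap \cdots \cap A_{n-1}$, so each conditioning event has positive probability.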

Example. Draw cards without replacement from a standard 52-card deck. The probability that the first two draws are both aces is

$$P(\text{ace}_1 \cap \text{ace}_2) = P(\text{ace}_1) \cdot P(\text{ace}_2 \mid \text{ace}_1) = \frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221}.$$
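The two-ace calculation can be cross-checked by brute force. The following sketch multiplies the chain-rule factors and, separately, enumerates every ordered pair of distinct cards. The deck encoding (rank 1 standing for an ace, suits as the letters `S`, `H`, `D`, `C`) and the helper name `chain_rule` are assumptions made for this illustration.

```python
from fractions import Fraction
from itertools import permutations


def chain_rule(factors):
    """Multiply P(A_1), P(A_2 | A_1), ... to get the probability of the intersection."""
    result = Fraction(1)
    for factor in factors:
        result *= factor
    return result


# Chain rule: 4 aces among 52 cards, then 3 aces among the remaining 51.
p_chain = chain_rule([Fraction(4, 52), Fraction(3, 51)])

# Brute force: all ordered draws of two distinct cards, each equally likely.
deck = [(rank, suit) for rank in range(1, 14) for suit in "SHDC"]  # rank 1 = ace
draws = list(permutations(deck, 2))
both_aces = sum(1 for first, second in draws if first[0] == 1 and second[0] == 1)
p_enum = Fraction(both_aces, len(draws))

print(p_chain, p_enum)
```

Enumeration scales poorly as the number of draws grows, which is one reason the chain rule is usually the preferred route for sequential sampling problems.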

Law of total probability

A partition of $\Omega$ is a collection $\{B_i\}_{i \in I}$ of pairwise disjoint events whose union is $\Omega$. If $\{B_1, B_2, \ldots\}$ is a countable partition with each $P(B_i) > 0$, then for any event $A$:

$$P(A) = \sum_{i} P(A \mid B_i) \, P(B_i). \tag{3}$$

Proof. Write $A = \bigcup_i (A \cap B_i)$ as a disjoint union. By $\sigma$-additivity and the multiplication rule:

$$P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i) \, P(B_i).$$

The law of total probability is the workhorse for computing “hard” unconditional probabilities: choose a partition on which $P(A \mid B_i)$ is easy to evaluate, then combine using the prior weights $P(B_i)$.

Example. A factory has two machines. Machine 1 produces 60% of output with a 2% defect rate; Machine 2 produces 40% with a 5% defect rate. Let $D$ be the event that a randomly chosen product is defective, and $M_1$, $M_2$ the events that it came from each machine. Then

$$P(D) = P(D \mid M_1) \, P(M_1) + P(D \mid M_2) \, P(M_2) = 0.02 \times 0.6 + 0.05 \times 0.4 = 0.032.$$

So 3.2% of all products are defective.
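The factory calculation is a weighted sum, which the sketch below reproduces with exact fractions. The function name `total_probability` and the dictionary keys `"M1"` and `"M2"` are hypothetical labels introduced here; the check that the weights sum to one is a simple guard that the supplied events could form a partition, not a full verification.

```python
from fractions import Fraction


def total_probability(priors, likelihoods):
    """Return the sum over i of P(A | B_i) * P(B_i) for a finite partition.

    priors:      maps each partition label to P(B_i)
    likelihoods: maps each partition label to P(A | B_i)
    """
    if sum(priors.values()) != 1:
        raise ValueError("partition weights must sum to 1")
    return sum(likelihoods[label] * priors[label] for label in priors)


# Figures from the factory example: share of output and defect rate per machine.
priors = {"M1": Fraction(60, 100), "M2": Fraction(40, 100)}
defect_rates = {"M1": Fraction(2, 100), "M2": Fraction(5, 100)}

p_defective = total_probability(priors, defect_rates)
print(p_defective, float(p_defective))
```

The same function handles any finite partition, so adding a third machine only means adding one entry to each dictionary and rebalancing the weights.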

Summary

  • Conditional probability: $P(A \mid B) = P(A \cap B) / P(B)$ for $P(B) > 0$; it is the renormalised probability on the restricted sample space $B$, and itself satisfies the three Kolmogorov axioms.
  • Multiplication rule: $P(A \cap B) = P(A \mid B) P(B)$, extending to chains of events via $P(A_1 \cap \cdots \cap A_n) = P(A_1) P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap \cdots \cap A_{n-1})$.
  • Law of total probability: for a countable partition $\{B_i\}$ of $\Omega$ with $P(B_i) > 0$, $P(A) = \sum_i P(A \mid B_i) P(B_i)$.