When you learn that a randomly chosen patient is a smoker, the probability they develop lung cancer rises dramatically from the population baseline. Conditional probability is the mathematical tool for updating a probability when you receive partial information about which outcome occurred.
Definition
Let (Ω,F,P) be a probability space and B∈F an event with P(B)>0. The conditional probability of A given B is
$$P(A \mid B) := \frac{P(A \cap B)}{P(B)}.$$
The intuition: conditioning on B restricts the sample space from Ω to B. Within B, the relevant probability of A is the fraction of B's mass that falls in A, renormalised so that P(B∣B)=1.
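The definition can be sketched on a finite sample space. The following is a minimal illustration, assuming two fair six-sided dice with equally likely outcomes; the dice example and the helper names `prob` and `cond_prob` are hypothetical and not part of the text above.

```python
from fractions import Fraction
from itertools import product

# Assumed finite sample space: ordered outcomes of two fair dice.
omega = set(product(range(1, 7), repeat=2))

def prob(event):
    """P(event) under the uniform measure on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b):
    """P(a | b) = P(a ∩ b) / P(b); only defined when P(b) > 0."""
    p_b = prob(b)
    if p_b == 0:
        raise ValueError("conditioning event has probability zero")
    return prob(a & b) / p_b

# A: the two dice sum to 8.  B: the first die shows an even number.
A = {w for w in omega if w[0] + w[1] == 8}
B = {w for w in omega if w[0] % 2 == 0}

p_a = prob(A)                  # 5 of the 36 outcomes
p_a_given_b = cond_prob(A, B)  # 3 of the 18 outcomes in B
print(p_a, p_a_given_b)
```

Exact fractions are used so that the renormalisation P(B∣B)=1 can be checked without floating-point noise.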
Renormalisation interpretation
For fixed B with P(B)>0, the function Q(A):=P(A∣B) is itself a probability measure on (Ω,F):
- Non-negativity. P(A∩B)≥0 and P(B)>0, so Q(A)≥0.
- Normalisation. Q(Ω)=P(Ω∩B)/P(B)=1.
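- σ-additivity. If A1,A2,… are pairwise disjoint, then so are A1∩B,A2∩B,…, and (⋃iAi)∩B=⋃i(Ai∩B). Applying the σ-additivity of P to this union and dividing by P(B) gives Q(⋃iAi)=∑iQ(Ai).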
So every theorem about probability measures applies to conditional probabilities with the same conditioning event: you have simply replaced the original measure P with a new measure Q concentrated on B, in the sense that Q(B)=1.
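The three properties can be checked numerically on a small example. This is a sketch under the assumption of a uniform measure on two dice rolls, with B taken (hypothetically) to be "the first die is even"; it tests additivity on one particular disjoint family rather than proving it in general.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space and conditioning event (illustrative only).
omega = set(product(range(1, 7), repeat=2))
B = {w for w in omega if w[0] % 2 == 0}

def P(event):
    """Uniform probability on omega."""
    return Fraction(len(event), len(omega))

def Q(event):
    """Q(A) = P(A | B) for the fixed conditioning event B."""
    return P(event & B) / P(B)

# Normalisation: Q assigns total mass 1 to omega.
q_omega = Q(omega)

# Additivity on a disjoint family: split omega by the sum of the dice.
events_by_sum = [{w for w in omega if sum(w) == s} for s in range(2, 13)]
q_total = sum(Q(e) for e in events_by_sum)

# Q is concentrated on B: the complement of B carries no mass.
q_outside = Q(omega - B)
print(q_omega, q_total, q_outside)
```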
Multiplication rule
Rearranging the definition immediately gives the multiplication rule:
$$P(A \cap B) = P(A \mid B)\,P(B). \tag{1}$$
This extends to chains of events. For events A1,A2,…,An with P(A1∩⋯∩An−1)>0:
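$$P(A_1 \cap \dots \cap A_n) = P(A_1)\,P(A_2 \mid A_1)\,P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \dots \cap A_{n-1}). \tag{2}$$
Proof. Apply the multiplication rule iteratively: P(A1∩⋯∩An)=P(An∣A1∩⋯∩An−1)⋅P(A1∩⋯∩An−1), then apply the same step to P(A1∩⋯∩An−1), and so on down to P(A1). Every conditional probability that appears is well defined, because A1∩⋯∩Ak contains A1∩⋯∩An−1 for each k≤n−1, so each conditioning event has positive probability.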
Example. Draw cards without replacement from a standard 52-card deck. The probability that the first two draws are both aces is
$$P(\text{ace}_1 \cap \text{ace}_2) = P(\text{ace}_1) \cdot P(\text{ace}_2 \mid \text{ace}_1) = \frac{4}{52} \cdot \frac{3}{51} = \frac{1}{221}.$$
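The two-ace example above can be reproduced with the chain rule and cross-checked by brute-force enumeration. This is a minimal sketch; the function name `prob_first_k_aces` and the deck encoding (four cards labelled "A", the rest labelled "x") are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import permutations

def prob_first_k_aces(k, aces=4, deck=52):
    """Chain rule: multiply P(j-th draw is an ace | all earlier draws were aces)."""
    p = Fraction(1)
    for j in range(k):
        # After j aces have been drawn, aces - j aces remain among deck - j cards.
        p *= Fraction(aces - j, deck - j)
    return p

two_aces = prob_first_k_aces(2)  # (4/52) * (3/51)

# Cross-check: enumerate all ordered pairs of distinct cards.
cards = ["A"] * 4 + ["x"] * 48
pairs = list(permutations(range(52), 2))
hits = sum(1 for i, j in pairs if cards[i] == "A" and cards[j] == "A")
brute_force = Fraction(hits, len(pairs))
print(two_aces, brute_force)
```

Both computations give the same exact fraction, which is the point of the multiplication rule: the product of conditional probabilities replaces a count over all 52·51 ordered pairs.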
Law of total probability
A partition of Ω is a collection {Bi}i∈I of pairwise disjoint events whose union is Ω. If {B1,B2,…} is a countable partition with each P(Bi)>0, then for any event A:
$$P(A) = \sum_i P(A \mid B_i)\,P(B_i). \tag{3}$$
Proof. Write A=⋃i(A∩Bi) as a disjoint union. By σ-additivity and the multiplication rule:
$$P(A) = \sum_i P(A \cap B_i) = \sum_i P(A \mid B_i)\,P(B_i).$$
The law of total probability is the workhorse for computing “hard” unconditional probabilities: choose a partition on which P(A∣Bi) is easy to evaluate, then combine using the prior weights P(Bi).
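The identity can be verified by enumeration on a small space. The sketch below assumes two fair dice and partitions Ω by the value of the first die; the event A ("the sum is at least 10") is a hypothetical choice for illustration.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: ordered outcomes of two fair dice.
omega = set(product(range(1, 7), repeat=2))

def P(event):
    """Uniform probability on omega."""
    return Fraction(len(event), len(omega))

# A: the two dice sum to at least 10.
A = {w for w in omega if w[0] + w[1] >= 10}

# Partition of omega by the value of the first die: B_1, ..., B_6.
partition = [{w for w in omega if w[0] == k} for k in range(1, 7)]

# Left side: P(A) directly.  Right side: sum of P(A | B_i) * P(B_i).
lhs = P(A)
rhs = sum((P(A & B_i) / P(B_i)) * P(B_i) for B_i in partition)
print(lhs, rhs)
```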
Example. A factory has two machines. Machine 1 produces 60% of output with a 2% defect rate; Machine 2 produces 40% with a 5% defect rate. Let D be the event that a randomly chosen product is defective, and M1, M2 the events that it came from each machine. Then
$$P(D) = P(D \mid M_1)\,P(M_1) + P(D \mid M_2)\,P(M_2) = 0.02 \times 0.6 + 0.05 \times 0.4 = 0.032.$$
So 3.2% of all products are defective.
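The factory calculation can be written as a short function. This is a sketch using the numbers from the example; the dictionary layout and the name `total_probability` are assumptions for illustration.

```python
from fractions import Fraction

# Prior weights P(M_i) and conditional defect rates P(D | M_i) from the example.
prior = {"M1": Fraction(60, 100), "M2": Fraction(40, 100)}
defect_given = {"M1": Fraction(2, 100), "M2": Fraction(5, 100)}

def total_probability(prior, conditional):
    """Return the sum over i of P(A | B_i) * P(B_i) for a partition {B_i}."""
    if sum(prior.values()) != 1:
        raise ValueError("prior weights must sum to 1 for a partition")
    return sum(conditional[b] * prior[b] for b in prior)

p_defect = total_probability(prior, defect_given)
print(p_defect, float(p_defect))
```

The check that the prior weights sum to 1 is a cheap guard against forgetting a cell of the partition, which is the most common way to misapply the law.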
Summary
- Conditional probability: P(A∣B)=P(A∩B)/P(B) for P(B)>0; it is the renormalised probability on the restricted sample space B, and itself satisfies the three Kolmogorov axioms.
- Multiplication rule: P(A∩B)=P(A∣B)P(B), extending to chains of events via P(A1∩⋯∩An)=P(A1)P(A2∣A1)⋯P(An∣A1∩⋯∩An−1).
- Law of total probability: for a countable partition {Bi} of Ω with P(Bi)>0, P(A)=∑iP(A∣Bi)P(Bi).