The Probability Axioms
Now that you have the vocabulary of sample spaces and events, the next step is to assign numbers to events in a coherent way. Kolmogorov’s 1933 axioms do exactly this: they are the minimal conditions that make probability a useful calculus of uncertainty.
The three axioms
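Definition. A probability space is a triple $(\Omega, \mathcal{F}, \mathbb{P})$ where $\Omega$ is the sample space, $\mathcal{F}$ is a $\sigma$-algebra of events on $\Omega$, and $\mathbb{P} : \mathcal{F} \to \mathbb{R}$ is a function satisfying:
- Non-negativity. $\mathbb{P}(A) \ge 0$ for all $A \in \mathcal{F}$.
- Normalisation. $\mathbb{P}(\Omega) = 1$.
- Countable additivity ($\sigma$-additivity). For any sequence of pairwise disjoint events $A_1, A_2, \ldots \in \mathcal{F}$,
  $$\mathbb{P}\Bigl(\bigcup_{n=1}^{\infty} A_n\Bigr) = \sum_{n=1}^{\infty} \mathbb{P}(A_n).$$
Such a function $\mathbb{P}$ is called a probability measure. Notice that a probability space is simply a measure space in which the measure has total mass $1$: $\mathbb{P}(\Omega) = 1$. Every theorem about measures applies directly to probability.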
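On a finite sample space the three axioms can be checked by brute force. The sketch below assumes a fair six-sided die with the power set as its $\sigma$-algebra; the names `omega`, `weights`, `prob` and `power_set` are illustrative choices, not part of the definition. Because only finitely many disjoint non-empty events exist here, countable additivity reduces to additivity on disjoint pairs.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical example: a fair six-sided die.
omega = frozenset(range(1, 7))
weights = {w: Fraction(1, 6) for w in omega}

def prob(event):
    """P(A): the sum of the point masses of the outcomes in A."""
    return sum((weights[w] for w in event), Fraction(0))

def power_set(s):
    """All subsets of s; on a finite set this is the largest sigma-algebra."""
    items = list(s)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(c) for c in subsets]

events = power_set(omega)

# Axiom 1: non-negativity.
non_negative = all(prob(a) >= 0 for a in events)
# Axiom 2: normalisation.
normalised = prob(omega) == 1
# Axiom 3: additivity, checked on every pair of disjoint events.
additive = all(
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if not (a & b)
)

print(non_negative, normalised, additive)
```

Exact `Fraction` arithmetic is used so that the equalities are tested exactly rather than up to floating-point error.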
Immediate consequences
The three axioms alone imply a rich set of properties.
Probability of the impossible event
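Apply $\sigma$-additivity to the disjoint union $\emptyset = \bigcup_{n=1}^{\infty} A_n$ with $A_n = \emptyset$ for all $n$:
$$\mathbb{P}(\emptyset) = \sum_{n=1}^{\infty} \mathbb{P}(\emptyset),$$
which forces $\mathbb{P}(\emptyset) = 0$, since a series of identical non-negative terms converges only when every term is zero.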
Finite additivity
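For two disjoint events $A$ and $B$, set $A_1 = A$, $A_2 = B$, and $A_n = \emptyset$ for $n \ge 3$. Then $\sigma$-additivity, together with $\mathbb{P}(\emptyset) = 0$, gives $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B)$. By induction the same holds for any finite family of pairwise disjoint events.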
Complement rule
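Since $A$ and $A^c$ are disjoint and $A \cup A^c = \Omega$:
$$1 = \mathbb{P}(\Omega) = \mathbb{P}(A) + \mathbb{P}(A^c), \quad\text{so}\quad \mathbb{P}(A^c) = 1 - \mathbb{P}(A).$$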
Monotonicity
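If $A \subseteq B$, write $B = A \cup (B \setminus A)$ as a disjoint union. Then
$$\mathbb{P}(B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A) \ge \mathbb{P}(A).$$
In particular $\mathbb{P}(A) \le \mathbb{P}(\Omega) = 1$ for every event $A$.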
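The elementary consequences so far (empty set, complement rule, monotonicity) can be spot-checked on a small example. This is a minimal sketch assuming the uniform distribution on a six-sided die; the events `even` and `high_even` are hypothetical choices made only for illustration.

```python
from fractions import Fraction

# Hypothetical example: uniform probability on the faces of a die.
omega = frozenset(range(1, 7))

def prob(event):
    """Uniform probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(omega))

even = frozenset({2, 4, 6})
high_even = frozenset({4, 6})  # a subset of `even`

# Impossible event.
p_empty = prob(frozenset())

# Complement rule: P(A^c) = 1 - P(A).
complement_ok = prob(omega - even) == 1 - prob(even)

# Monotonicity: A ⊆ B gives P(B) = P(A) + P(B \ A) >= P(A).
decomposition_ok = prob(even) == prob(high_even) + prob(even - high_even)
monotone_ok = prob(high_even) <= prob(even) <= 1

print(p_empty, complement_ok, decomposition_ok, monotone_ok)
```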
Inclusion–exclusion
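For two events:
$$\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B).$$
Proof. Write $A \cup B = A \cup (B \setminus A)$ disjointly, so $\mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A)$. Write $B = (A \cap B) \cup (B \setminus A)$ disjointly, so $\mathbb{P}(B) = \mathbb{P}(A \cap B) + \mathbb{P}(B \setminus A)$. Subtract the second identity from the first to eliminate $\mathbb{P}(B \setminus A)$.
The generalisation to $n$ events is the inclusion–exclusion principle:
$$\mathbb{P}\Bigl(\bigcup_{i=1}^{n} A_i\Bigr) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \cdots < i_k \le n} \mathbb{P}(A_{i_1} \cap \cdots \cap A_{i_k}).$$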
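The general formula is easy to mis-index, so a direct numerical comparison can help. The sketch below assumes a uniform distribution on $\{1, \ldots, 30\}$ and three hypothetical events (multiples of 2, 3 and 5); it compares the probability of the union computed directly with the alternating sum over all non-empty sub-collections.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example: uniform probability on {1, ..., 30}.
omega = frozenset(range(1, 31))

def prob(event):
    """Uniform probability: favourable outcomes over total outcomes."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events):
    """Alternating sum over all non-empty sub-collections of `events`."""
    total = Fraction(0)
    for k in range(1, len(events) + 1):
        sign = (-1) ** (k + 1)
        for group in combinations(events, k):
            total += sign * prob(frozenset.intersection(*group))
    return total

multiples = [frozenset(w for w in omega if w % d == 0) for d in (2, 3, 5)]

direct = prob(frozenset.union(*multiples))
via_formula = inclusion_exclusion(multiples)

print(direct, via_formula)  # both are 11/15
```

The alternating sum visits $2^n - 1$ intersections, so this check is only practical for small $n$.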
Union bound (Boole’s inequality)
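A simpler upper bound drops the intersection terms. For any finite or countable family of events $A_1, A_2, \ldots$:
$$\mathbb{P}\Bigl(\bigcup_{i} A_i\Bigr) \le \sum_{i} \mathbb{P}(A_i).$$
For two events this follows from inclusion–exclusion by discarding the term $-\mathbb{P}(A \cap B) \le 0$. In general, write the union disjointly as $\bigcup_i B_i$ with $B_i = A_i \setminus (A_1 \cup \cdots \cup A_{i-1}) \subseteq A_i$, then apply additivity and monotonicity. The bound is invaluable in asymptotic arguments: if you can show $\sum_i \mathbb{P}(A_i) \to 0$, you know $\mathbb{P}\bigl(\bigcup_i A_i\bigr) \to 0$ as well.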
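The union bound needs only the individual probabilities, not the dependence between events. The sketch below first compares the bound with an exact union on a hypothetical uniform space $\{1, \ldots, 30\}$, then evaluates it for an assumed family of $n$ events of probability $1/n^2$ each, where it gives $1/n$. The helper `union_bound` and both examples are illustrative assumptions.

```python
from fractions import Fraction

def union_bound(probabilities):
    """Upper bound on the probability of a union, capped at 1."""
    return min(Fraction(1), sum(probabilities, Fraction(0)))

# Exact comparison on a hypothetical uniform space {1, ..., 30}.
omega = range(1, 31)
events = [{w for w in omega if w % d == 0} for d in (4, 6, 10)]
p_union = Fraction(len(set().union(*events)), len(omega))
bound = union_bound(Fraction(len(a), len(omega)) for a in events)
print(p_union, bound)  # 11/30 versus the bound 1/2

def bound_for(n):
    """Bound for a hypothetical family of n events, each of probability 1/n^2."""
    return union_bound([Fraction(1, n * n)] * n)

# The bound equals 1/n, so it tends to 0 whatever the dependence structure.
asymptotic = {n: bound_for(n) for n in (10, 100, 1000)}
print(asymptotic)
```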
Continuity of probability
Probability is continuous with respect to monotone limits of events, just as the Lebesgue integral is continuous with respect to monotone limits of functions.
Continuity from below
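If $A_1 \subseteq A_2 \subseteq \cdots$ is an increasing sequence of events with $A = \bigcup_{n=1}^{\infty} A_n$, then
$$\mathbb{P}(A) = \lim_{n \to \infty} \mathbb{P}(A_n).$$
Proof. Write $A = \bigcup_{n=1}^{\infty} (A_n \setminus A_{n-1})$, with $A_0 = \emptyset$, as a disjoint union of “rings”. By $\sigma$-additivity:
$$\mathbb{P}(A) = \sum_{n=1}^{\infty} \mathbb{P}(A_n \setminus A_{n-1}) = \lim_{N \to \infty} \sum_{n=1}^{N} \mathbb{P}(A_n \setminus A_{n-1}) = \lim_{N \to \infty} \mathbb{P}(A_N).$$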
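The ring decomposition can be followed numerically. As an illustrative assumption (not an example from the text), toss a fair coin until the first head and let $A_n$ be the event that the first head appears within the first $n$ tosses. The rings $A_n \setminus A_{n-1}$ then have probability $2^{-n}$, and their partial sums reproduce $\mathbb{P}(A_n) = 1 - 2^{-n}$, which increases to $1$.

```python
from fractions import Fraction

def p_ring(n):
    """P(A_n minus A_{n-1}): the first head appears exactly on toss n."""
    return Fraction(1, 2 ** n)

def p_A(n):
    """P(A_n): the first head appears within the first n tosses."""
    return 1 - Fraction(1, 2 ** n)

# Partial sums of the ring probabilities, as in the proof.
partial_sums = []
running = Fraction(0)
for n in range(1, 21):
    running += p_ring(n)
    partial_sums.append(running)

# Each partial sum telescopes to P(A_n).
telescopes = all(partial_sums[n - 1] == p_A(n) for n in range(1, 21))

# Distance from the limit P(A) = 1 after 20 terms.
gap = 1 - partial_sums[-1]
print(telescopes, float(gap))
```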
Continuity from above
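If $A_1 \supseteq A_2 \supseteq \cdots$ is a decreasing sequence with $A = \bigcap_{n=1}^{\infty} A_n$, then
$$\mathbb{P}(A) = \lim_{n \to \infty} \mathbb{P}(A_n),$$
provided $\mathbb{P}(A_1) < \infty$ (which here is automatic since $\mathbb{P}(A_1) \le 1$). The proof applies continuity from below to the complements $A_n^c$, which form an increasing sequence with union $A^c$.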
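A small numerical picture of the complement argument, under the assumption of the uniform distribution on $[0, 1)$ with the hypothetical events $A_n = [0, 1/n)$: these decrease to $\{0\}$, which has probability $0$, while the complements $[1/n, 1)$ increase with probability $1 - 1/n$.

```python
from fractions import Fraction

def p_A(n):
    """P(A_n) for A_n = [0, 1/n) under the uniform distribution on [0, 1)."""
    return Fraction(1, n)

def p_complement(n):
    """P(A_n^c) for the increasing complements A_n^c = [1/n, 1)."""
    return 1 - p_A(n)

ns = [1, 10, 100, 1000, 10_000]
decreasing = [p_A(n) for n in ns]
increasing = [p_complement(n) for n in ns]

# P({0}) = 0 for the uniform distribution, the claimed limit of P(A_n).
p_limit = Fraction(0)
gap = decreasing[-1] - p_limit

print([float(p) for p in decreasing])
print([float(p) for p in increasing])
```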
These continuity properties are essential for proving theorems about limits: the Borel–Cantelli lemmas, the strong law of large numbers, and many results in stochastic processes all rely on them.
Probability is measure theory
A probability measure is a measure with total mass $1$. This means the entire machinery of the Lebesgue integral (linearity, monotone convergence, dominated convergence, Fatou’s lemma, Fubini’s theorem) carries over intact. When you write $\mathbb{E}[X] = \int_{\Omega} X \, d\mathbb{P}$, the integral is the Lebesgue integral you already know, applied to the measure $\mathbb{P}$.
The only genuinely new feature is the normalisation $\mathbb{P}(\Omega) = 1$, which enables probabilistic interpretations: $\mathbb{P}(A)$ is the long-run fraction of experiments in which event $A$ occurs (frequentist), or your degree of belief that $A$ will occur (Bayesian). The mathematics is the same in either case.
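On a finite sample space the Lebesgue integral against a probability measure is just a weighted sum, which makes the inherited properties easy to see. The sketch below assumes a fair die; `expectation` and `indicator` are hypothetical helper names. It checks linearity and the fact that the expectation of an indicator is the probability of the event.

```python
from fractions import Fraction

# Hypothetical example: point masses of a fair six-sided die.
weights = {w: Fraction(1, 6) for w in range(1, 7)}

def expectation(X):
    """E[X] as the integral of X against P, a weighted sum on a finite space."""
    return sum((X(w) * p for w, p in weights.items()), Fraction(0))

def indicator(event):
    """Indicator function of an event, as a random variable."""
    return lambda w: 1 if w in event else 0

def face(w):
    return w

def square(w):
    return w * w

e_face = expectation(face)      # 7/2
e_square = expectation(square)  # 91/6

# Linearity of the integral: E[2X + 3Y] = 2 E[X] + 3 E[Y].
linear = expectation(lambda w: 2 * face(w) + 3 * square(w)) == 2 * e_face + 3 * e_square

# The expectation of an indicator is the probability of the event.
p_even = expectation(indicator({2, 4, 6}))  # 1/2

print(e_face, e_square, linear, p_even)
```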
Summary
- A probability space $(\Omega, \mathcal{F}, \mathbb{P})$ consists of a sample space, a $\sigma$-algebra, and a probability measure satisfying the three Kolmogorov axioms: non-negativity, normalisation ($\mathbb{P}(\Omega) = 1$), and $\sigma$-additivity.
- Immediate consequences: $\mathbb{P}(\emptyset) = 0$; $\mathbb{P}(A^c) = 1 - \mathbb{P}(A)$; monotonicity ($A \subseteq B \Rightarrow \mathbb{P}(A) \le \mathbb{P}(B)$); inclusion–exclusion; the union bound.
- Continuity from below and above: $\mathbb{P}$ is continuous with respect to monotone limits of events, which is $\sigma$-additivity applied to nested sequences.
- A probability measure is just a measure with total mass $1$: all Lebesgue integration theorems hold for expectations out of the box.