Random Variables


In a probability model, outcomes live in an abstract sample space $\Omega$ that may carry no arithmetic structure at all. A random variable is the bridge that lets you ask numerical questions, such as “what is the average gain?” or “how often does the count exceed 10?”, by mapping $\Omega$ into $\mathbb{R}$ in a way that is compatible with the probability measure. Getting that compatibility right is exactly the job of measurability.

Formal definition: random variable as a measurable function

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be the real line equipped with its Borel $\sigma$-algebra $\mathcal{B}(\mathbb{R})$, the $\sigma$-algebra generated by all open intervals.

Definition. A random variable is a function

$$X \colon \Omega \to \mathbb{R}$$

that is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable, meaning that for every Borel set $B \in \mathcal{B}(\mathbb{R})$,

$$X^{-1}(B) \coloneqq \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}.$$

Why measurability matters

The probability measure $P$ is only defined on events in $\mathcal{F}$. When you write $P(X \in B)$ you are really writing $P(X^{-1}(B))$. If $X^{-1}(B)$ were not in $\mathcal{F}$, the expression $P(X \in B)$ would be undefined. Measurability is precisely the condition that guarantees the preimage of every Borel set is an event, so every statement of the form “$X$ takes a value in $B$” has a well-defined probability.

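Because $\mathcal{B}(\mathbb{R})$ is generated by the rays $(-\infty, x]$, it is enough to check measurability on those generators. Preimages commute with complements and countable unions, so the collection $\{B \subseteq \mathbb{R} : X^{-1}(B) \in \mathcal{F}\}$ is itself a $\sigma$-algebra; if it contains every ray, it contains the whole $\sigma$-algebra the rays generate:

$$X \text{ is measurable} \iff \{X \leq x\} \in \mathcal{F} \text{ for every } x \in \mathbb{R}.$$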
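On a finite sample space the ray criterion can be checked by brute force. The sketch below is illustrative only: it assumes $\Omega$ is the set of outcomes of two coin tosses and that $\mathcal{F}$ is the $\sigma$-algebra generated by the first toss alone (all unions of the blocks of a two-block partition). The helpers `is_event` and `is_measurable` are hypothetical names introduced here. The indicator of “first toss is heads” passes the test, while the total number of heads fails, because $\{X \leq 0\} = \{TT\}$ is not an event in this coarse $\sigma$-algebra.

```python
from itertools import product

# Illustrative sample space: ordered outcomes of two coin tosses, e.g. ("H", "T").
OMEGA = list(product("HT", repeat=2))

# Illustrative sigma-algebra: generated by the partition "first toss is H" /
# "first toss is T". Its events are exactly the unions of these blocks.
PARTITION = [
    frozenset(w for w in OMEGA if w[0] == "H"),
    frozenset(w for w in OMEGA if w[0] == "T"),
]


def is_event(A):
    """A is a union of blocks iff each block lies wholly inside or outside A."""
    return all(block <= A or not (block & A) for block in PARTITION)


def is_measurable(X):
    """Check that {X <= x} is an event for every real x.

    X takes finitely many values, and {X <= x} only changes when x crosses one
    of them, so it suffices to test x at each value and once below the minimum
    (where the set is empty)."""
    values = sorted({X(w) for w in OMEGA})
    thresholds = [values[0] - 1] + values
    return all(
        is_event(frozenset(w for w in OMEGA if X(w) <= x)) for x in thresholds
    )


def first_is_heads(w):
    return 1 if w[0] == "H" else 0


def number_of_heads(w):
    return sum(1 for toss in w if toss == "H")


print(is_measurable(first_is_heads))   # True
print(is_measurable(number_of_heads))  # False
```

If $\mathcal{F}$ were the full power set of $\Omega$, every function would pass; measurability only has teeth when $\mathcal{F}$ is smaller than the power set.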

The push-forward measure and the distribution of $X$

Measurability lets you push $P$ forward from $\Omega$ to $\mathbb{R}$.

Definition. The distribution (or law) of $X$ is the probability measure $P_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by

$$P_X(B) \coloneqq P(X^{-1}(B)) = P(X \in B), \quad B \in \mathcal{B}(\mathbb{R}).$$

You can verify that $P_X$ is indeed a probability measure: $P_X(\mathbb{R}) = P(\Omega) = 1$, and countable additivity follows from that of $P$ and the fact that preimages preserve set operations (in particular, preimages of disjoint sets are disjoint).
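A small finite example makes the push-forward concrete. Assuming, for illustration only, two fair dice with the uniform measure on the 36 ordered outcomes and $X$ equal to their sum, the sketch below builds $P_X$ by adding up $P(\{\omega\})$ over each preimage $X^{-1}(\{x\})$, then checks the total mass and additivity on disjoint sets. The helper `law` is a hypothetical name; exact `Fraction` arithmetic avoids rounding.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Illustrative probability space: two fair dice, uniform on 36 ordered outcomes.
OMEGA = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in OMEGA}


def X(w):
    """Random variable: the sum of the two dice."""
    return w[0] + w[1]


# Push-forward: P_X({x}) = P(X^{-1}({x})), accumulated over each preimage.
P_X = defaultdict(Fraction)
for w, p in P.items():
    P_X[X(w)] += p


def law(B):
    """P_X(B) for a set B of reals, summing the point masses of P_X inside B."""
    return sum((p for x, p in P_X.items() if x in B), Fraction(0))


print(P_X[7])               # 1/6
print(sum(P_X.values()))    # 1
print(law({2, 3}) == law({2}) + law({3}))  # True (additivity on disjoint sets)
```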

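The distribution $P_X$ encodes everything about $X$ that is probabilistically observable. Two random variables $X$ and $Y$ that share the same distribution, even if they are defined on entirely different probability spaces, are equal in law (written $X \overset{d}{=} Y$): no statement about either variable on its own, such as $P(X \in B)$ or $E[g(X)]$ for a measurable $g$, can tell them apart. Statements that involve both at once, such as $P(X = Y)$, need a common probability space and are not determined by the two laws.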
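Here is a sketch of equality in law across different spaces, under illustrative assumptions: $Y_1$ is the heads indicator of a fair coin and $Y_2$ is the “roll is even” indicator of a fair die. They live on different sample spaces, so there is no common $\omega$ at which to compare them, yet their push-forward distributions coincide. The function `distribution` is a hypothetical helper.

```python
from fractions import Fraction

# Space 1 (illustrative): a fair coin; Y1 = 1 on heads, 0 on tails.
coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}


def Y1(w):
    return 1 if w == "H" else 0


# Space 2 (illustrative): a fair die; Y2 = 1 if the roll is even, 0 otherwise.
die = {k: Fraction(1, 6) for k in range(1, 7)}


def Y2(w):
    return 1 if w % 2 == 0 else 0


def distribution(space, X):
    """Point masses of the push-forward P_X on a finite probability space."""
    law = {}
    for w, p in space.items():
        law[X(w)] = law.get(X(w), 0) + p
    return law


print(distribution(coin, Y1) == distribution(die, Y2))  # True
```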

Cumulative distribution function

The cumulative distribution function (CDF) of $X$ is the real function

$$F_X(x) \coloneqq P(X \leq x) = P_X\bigl((-\infty, x]\bigr), \quad x \in \mathbb{R}.$$

Every CDF satisfies three properties:

  1. Non-decreasing. If $x \leq y$ then $\{X \leq x\} \subseteq \{X \leq y\}$, so $F_X(x) \leq F_X(y)$.
  2. Right-continuous with left-hand limits (càdlàg). $F_X(x) = \lim_{t \downarrow x} F_X(t)$, by continuity from above of $P$, since $\{X \leq t\} \downarrow \{X \leq x\}$ as $t \downarrow x$; left-hand limits exist because $F_X$ is monotone.
  3. Boundary behaviour. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.

Conversely, any function satisfying these three properties is the CDF of some random variable. The CDF uniquely determines the distribution $P_X$: every Borel probability measure on $\mathbb{R}$ corresponds to a unique CDF, and vice versa.
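The converse has a constructive proof via the generalized inverse $F^{-1}(u) \coloneqq \inf\{x \in \mathbb{R} : F(x) \geq u\}$: for $u \in (0, 1)$, right-continuity gives $F^{-1}(u) \leq x \iff u \leq F(x)$, so if $U$ is uniform on $(0, 1)$ then $P(F^{-1}(U) \leq x) = P(U \leq F(x)) = F(x)$. The sketch below checks that equivalence exactly for one staircase CDF, assumed for illustration to put masses $1/4, 1/2, 1/4$ on the points $0, 1, 3$; `F` and `F_inv` are illustrative names.

```python
from bisect import bisect_left
from collections import Counter
from fractions import Fraction

# Illustrative staircase CDF: masses 1/4, 1/2, 1/4 at the points 0, 1, 3.
support = [0, 1, 3]
cumulative = [Fraction(1, 4), Fraction(3, 4), Fraction(1)]  # F at each support point


def F(x):
    """Right-continuous staircase CDF: total mass at support points <= x."""
    total = Fraction(0)
    for xk, ck in zip(support, cumulative):
        if xk <= x:
            total = ck
    return total


def F_inv(u):
    """Generalized inverse inf{x : F(x) >= u} for 0 < u <= 1.

    For a staircase this is the first support point whose cumulative
    probability reaches u; bisect_left finds that index."""
    return support[bisect_left(cumulative, u)]


# u = k/100 for k = 1..100 (u = 1 is safe here because this F reaches 1 at x = 3).
grid_u = [Fraction(k, 100) for k in range(1, 101)]
grid_x = [Fraction(k, 4) for k in range(-4, 17)]  # -1 to 4 in steps of 1/4

# Key equivalence behind the construction: F_inv(u) <= x  iff  u <= F(x).
ok = all((F_inv(u) <= x) == (u <= F(x)) for u in grid_u for x in grid_x)
print(ok)  # True

# Pushing the evenly spaced grid through F_inv reproduces the masses exactly.
counts = Counter(F_inv(u) for u in grid_u)
print(sorted(counts.items()))  # [(0, 25), (1, 50), (3, 25)]
```

The evenly spaced grid acts as a deterministic stand-in for sampling $U$: 25, 50 and 25 of the 100 grid points land on $0$, $1$ and $3$, matching the assumed masses.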

Discrete and absolutely continuous random variables

The Lebesgue decomposition theorem classifies how $P_X$ sits relative to Lebesgue measure $\lambda$ on $\mathbb{R}$. The two most important special cases are:

Discrete random variables

$X$ is discrete if there is a countable set $S = \{x_1, x_2, \ldots\} \subseteq \mathbb{R}$ such that $P_X(S) = 1$. The distribution is then a sum of point masses, and $P_X \perp \lambda$ (singular with respect to Lebesgue measure), because the countable set $S$ carries all of the probability while $\lambda(S) = 0$.

Absolutely continuous random variables

$X$ is absolutely continuous if $P_X \ll \lambda$, i.e. $P_X(B) = 0$ whenever $\lambda(B) = 0$. By the Radon–Nikodym theorem there then exists a non-negative measurable function $f$ such that

$$P_X(B) = \int_B f(t)\, d\lambda(t), \quad B \in \mathcal{B}(\mathbb{R}).$$
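As a concrete instance, assume for illustration that $X$ is uniform on an interval $[a, b]$ with $a < b$. The Radon–Nikodym derivative can then be written down explicitly:

```latex
f(t) = \frac{1}{b-a}\,\mathbf{1}_{[a,b]}(t),
\qquad
P_X(B) = \int_B f(t)\, d\lambda(t) = \frac{\lambda\bigl(B \cap [a,b]\bigr)}{b-a},
\quad B \in \mathcal{B}(\mathbb{R}).
```

In particular $\lambda(B) = 0$ forces $\lambda(B \cap [a,b]) = 0$ and hence $P_X(B) = 0$, which is exactly the condition $P_X \ll \lambda$.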

In practice most distributions met in applications are discrete, absolutely continuous, or a mixture of the two (the mixed type). Singular-continuous distributions (e.g. the Cantor distribution) exist but are rarely encountered in applications.
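A mixed-type CDF shows both behaviours at once. As an illustrative assumption, take $X = 0$ with probability `p` and, otherwise, $X$ exponentially distributed with rate `rate`. Then $F_X$ jumps by `p` at $0$ (the discrete part) and rises continuously afterwards (the absolutely continuous part). `F_mixed` and its parameters are hypothetical names for this sketch.

```python
import math


def F_mixed(x, p=0.5, rate=1.0):
    """CDF of an illustrative two-component mixture: with probability p,
    X = 0 exactly; otherwise X ~ Exponential(rate)."""
    if x < 0:
        return 0.0
    discrete_part = p                                       # point mass at 0
    continuous_part = (1 - p) * (1 - math.exp(-rate * x))   # weighted exponential CDF
    return discrete_part + continuous_part


h = 1e-9
# The jump at 0 equals the point mass P(X = 0) = p.
print(F_mixed(0.0) - F_mixed(-h))        # 0.5
# Right-continuity at 0: F(h) exceeds F(0) only by about (1 - p) * rate * h.
print(F_mixed(h) - F_mixed(0.0) < 1e-8)  # True
```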

Probability mass function for discrete random variables

For a discrete random variable with countable support $\{x_1, x_2, \ldots\}$, the probability mass function (PMF) is

$$p_k \coloneqq P(X = x_k), \quad k = 1, 2, \ldots$$

It satisfies $p_k \geq 0$ for all $k$ and

$$\sum_k p_k = 1.$$

The CDF is a staircase:

$$F_X(x) = \sum_{k : x_k \leq x} p_k.$$

The PMF completely determines the distribution of $X$, since $P_X(B) = \sum_{k : x_k \in B} p_k$ for every Borel set $B$.
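The staircase formula translates directly into code. This sketch assumes, for illustration, a Binomial$(4, 1/2)$ PMF built with `math.comb` (Python 3.8+) and exact `Fraction` arithmetic; `binomial_pmf` and `cdf_from_pmf` are hypothetical helper names. Evaluating the CDF between support points shows the flat steps.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p):
    """PMF of Binomial(n, p) as a dict {k: P(X = k)}."""
    return {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}


def cdf_from_pmf(pmf, x):
    """Staircase CDF: F_X(x) = sum of p_k over support points x_k <= x."""
    return sum((pk for xk, pk in pmf.items() if xk <= x), Fraction(0))


pmf = binomial_pmf(4, Fraction(1, 2))
print(sum(pmf.values()))       # 1
print(cdf_from_pmf(pmf, 1))    # 5/16
print(cdf_from_pmf(pmf, 1.5))  # 5/16 (flat between support points)
```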

Probability density function for absolutely continuous random variables

For an absolutely continuous random variable, the Radon–Nikodym derivative $f \coloneqq dP_X / d\lambda$ is called the probability density function (PDF). It satisfies:

  • $f(x) \geq 0$ for $\lambda$-almost every $x$.
  • $\displaystyle\int_{-\infty}^{+\infty} f(t)\, dt = 1$.

The CDF is recovered by integration:

$$F_X(x) = \int_{-\infty}^{x} f(t)\, dt,$$

and whenever $f$ is continuous at $x$ we have $F_X'(x) = f(x)$.
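The integral formula and the derivative relation can be checked numerically. The sketch assumes an exponential density with rate `RATE` (an illustrative choice whose CDF, $1 - e^{-\mathrm{RATE}\,x}$ for $x \geq 0$, is known in closed form) and approximates $F_X$ with composite Simpson's rule; `f`, `F_numeric` and the step counts are hypothetical choices made for this example.

```python
import math

RATE = 1.0  # illustrative rate of an Exponential(RATE) density


def f(t):
    """Exponential density: RATE * exp(-RATE * t) for t >= 0, else 0."""
    return RATE * math.exp(-RATE * t) if t >= 0 else 0.0


def F_numeric(x, n=1000):
    """Approximate F_X(x) = integral of f over (-inf, x] with Simpson's rule.

    f vanishes on (-inf, 0), so integrating over [0, x] suffices.
    n is the number of subintervals and must be even."""
    if x <= 0:
        return 0.0
    h = x / n
    s = f(0.0) + f(x)
    s += 4 * sum(f((2 * i - 1) * h) for i in range(1, n // 2 + 1))  # odd nodes
    s += 2 * sum(f(2 * i * h) for i in range(1, n // 2))            # even interior nodes
    return s * h / 3


x = 1.5
print(abs(F_numeric(x) - (1 - math.exp(-RATE * x))) < 1e-10)  # True

# F' = f where f is continuous: a central difference of F matches the density.
eps = 1e-4
slope = (F_numeric(x + eps) - F_numeric(x - eps)) / (2 * eps)
print(abs(slope - f(x)) < 1e-6)  # True
```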

The probability of any interval is

$$P(a < X \leq b) = \int_a^b f(t)\, dt.$$

Note that for an absolutely continuous random variable, $P(X = x) = 0$ for every individual point $x$. Positive probability can only be assigned to sets of positive Lebesgue measure, such as intervals, never to single points.
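The point-mass claim follows from the interval formula by squeezing $\{X = x\}$ inside shrinking intervals $(x - h, x]$; the integral tends to $0$ as $h \downarrow 0$ because $F_X$ is continuous. The second line is a worked instance, assuming for illustration an exponential density $f(t) = \theta e^{-\theta t}$ for $t \geq 0$ (and $0$ otherwise) with an illustrative rate $\theta > 0$.

```latex
0 \le P(X = x) \le P(x - h < X \le x) = \int_{x-h}^{x} f(t)\, dt
  \;\xrightarrow[\,h \downarrow 0\,]{}\; 0,
\qquad
P(a < X \le b) = \int_a^b \theta e^{-\theta t}\, dt = e^{-\theta a} - e^{-\theta b}
  \quad (0 \le a \le b).
```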

Expectation

The expectation (or expected value) of $X$ is the Lebesgue integral of $X$ against $P$:

$$E[X] \coloneqq \int_\Omega X(\omega)\, dP(\omega) = \int_{\mathbb{R}} x\, dP_X(x),$$

where the second equality is the change-of-variables formula for push-forward measures. The two specialisations are:

$$E[X] = \sum_k x_k\, p_k \qquad \text{(discrete)},$$

$$E[X] = \int_{-\infty}^{+\infty} x\, f(x)\, dx \qquad \text{(absolutely continuous)},$$

provided the sum or integral converges absolutely (equivalently, $E[|X|] < \infty$).
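Both routes to $E[X]$ can be compared on a two-dice example, assuming for illustration the uniform measure on the 36 ordered outcomes and $X$ equal to the sum. Summing $X(\omega)\,P(\{\omega\})$ over $\Omega$ and summing $x\,P_X(\{x\})$ over the values of $X$ give the same number, which is the change-of-variables formula in its simplest form. The last lines approximate the absolutely continuous formula for an assumed Exponential$(1)$ density, whose mean is $1$, using a hypothetical `simpson` helper.

```python
import math
from fractions import Fraction
from itertools import product

# Illustrative discrete case: two fair dice, X = sum of the faces.
OMEGA = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in OMEGA}


def X(w):
    return w[0] + w[1]


# E[X] as an integral over Omega: sum of X(omega) * P({omega}).
e_omega = sum(X(w) * p for w, p in P.items())

# E[X] via the push-forward: sum of x * P_X({x}) over the values of X.
P_X = {}
for w, p in P.items():
    P_X[X(w)] = P_X.get(X(w), 0) + p
e_law = sum(x * px for x, px in P_X.items())

print(e_omega, e_law)  # 7 7


def simpson(g, a, b, n=10_000):
    """Composite Simpson's rule for the integral of g over [a, b]; n must be even."""
    h = (b - a) / n
    s = g(a) + g(b)
    s += 4 * sum(g(a + (2 * i - 1) * h) for i in range(1, n // 2 + 1))
    s += 2 * sum(g(a + 2 * i * h) for i in range(1, n // 2))
    return s * h / 3


# Absolutely continuous case: integral of x * exp(-x) over [0, 50]; the tail
# beyond 50 contributes about 51 * exp(-50), which is negligible here.
e_cont = simpson(lambda x: x * math.exp(-x), 0.0, 50.0)
print(abs(e_cont - 1.0) < 1e-8)  # True
```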

Summary

  • A random variable $X : \Omega \to \mathbb{R}$ is a measurable function; measurability ensures $P(X \in B)$ is defined for every Borel set $B$.
  • The distribution $P_X(B) = P(X^{-1}(B))$ is the push-forward of $P$ onto $\mathbb{R}$; it carries all probabilistic information about $X$.
  • The CDF $F_X(x) = P(X \leq x)$ is non-decreasing, right-continuous, with limits $0$ and $1$ at $\pm\infty$; it uniquely determines $P_X$.
  • Discrete random variables have a PMF $p_k = P(X = x_k)$ summing to $1$; absolutely continuous random variables have a PDF $f$ integrating to $1$ with $F_X(x) = \int_{-\infty}^x f(t)\, dt$.
  • Expectation is the Lebesgue integral $E[X] = \int x\, dP_X(x)$, specialising to $\sum x_k p_k$ or $\int x f(x)\, dx$.