Random Variables — Project Hematite

In a probability model, outcomes live in an abstract sample space $\Omega$ that may carry no arithmetic structure at all. A random variable is the bridge that lets you ask numerical questions — “what is the average gain?”, “how often does the count exceed 10?” — by mapping $\Omega$ into $\mathbb{R}$ in a way that is compatible with the probability measure. Getting that compatibility right is exactly the job of measurability.

Formal definition: random variable as a measurable function

Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be the real line equipped with its Borel $\sigma$ -algebra $\mathcal{B}(\mathbb{R})$ — the $\sigma$ -algebra generated by all open intervals.

Definition. A random variable is a function

$X \colon \Omega \to \mathbb{R}$

that is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$ -measurable, meaning that for every Borel set $B \in \mathcal{B}(\mathbb{R})$ ,

$X^{-1}(B) \coloneqq \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}.$
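
The definition can be checked by brute force on a finite sample space. The sketch below assumes a toy model that is not part of the text above: a six-sided die where $\mathcal{F}$ records only whether the face is odd or even. The helper names `generated_sigma_algebra` and `is_measurable` are hypothetical. On a finite space, the preimage of any set is a finite union of preimages of single values, so testing each attained value is enough.

```python
from itertools import chain, combinations

def generated_sigma_algebra(partition):
    """Return all unions of blocks of a finite partition (the sigma-algebra it generates)."""
    blocks = [frozenset(b) for b in partition]
    events = set()
    for r in range(len(blocks) + 1):
        for combo in combinations(blocks, r):
            events.add(frozenset(chain.from_iterable(combo)))
    return events

def is_measurable(X, omega, sigma_algebra):
    """Check that the preimage of every attained value is an event."""
    for value in {X(w) for w in omega}:
        preimage = frozenset(w for w in omega if X(w) == value)
        if preimage not in sigma_algebra:
            return False
    return True

omega = range(1, 7)
# Assumed coarse sigma-algebra: only the parity of the face is observable.
F = generated_sigma_algebra([{1, 3, 5}, {2, 4, 6}])

parity = lambda w: w % 2   # depends only on parity
face = lambda w: w         # needs to distinguish individual faces

print(is_measurable(parity, omega, F))  # True
print(is_measurable(face, omega, F))    # False
```

The face value fails because, for example, $\{\omega : X(\omega) = 1\} = \{1\}$ is not one of the four events in this $\mathcal{F}$.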

Why measurability matters

The probability measure $P$ is defined only on events in $\mathcal{F}$. When you write $P(X \in B)$ you are really writing $P(X^{-1}(B))$. If $X^{-1}(B)$ were not in $\mathcal{F}$, the expression $P(X \in B)$ would be undefined. Measurability is precisely the condition that the preimage of every Borel set is an event, so every statement of the form “$X$ takes a value in $B$” has a well-defined probability.

Because $\mathcal{B}(\mathbb{R})$ is generated by the rays $(-\infty, x]$, and the collection of sets $B$ with $X^{-1}(B) \in \mathcal{F}$ is itself a $\sigma$-algebra, it is enough to check measurability on those generators:

$X \text{ is measurable} \iff \{X \leq x\} \in \mathcal{F} \text{ for every } x \in \mathbb{R}.$

The push-forward measure and the distribution of $X$

Measurability lets you push $P$ forward from $\Omega$ to $\mathbb{R}$ .

Definition. The distribution (or law) of $X$ is the probability measure $P_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by

$P_X(B) \coloneqq P(X^{-1}(B)) = P(X \in B), \quad B \in \mathcal{B}(\mathbb{R}).$

You can verify that $P_X$ is indeed a probability measure: $P_X(\mathbb{R}) = P(\Omega) = 1$ , and countable additivity follows from that of $P$ and the fact that preimages preserve set operations.
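
As a concrete illustration of the push-forward, the following sketch computes $P_X$ for the sum of two fair dice. The model (uniform $P$ on the 36 ordered pairs) and the function name `push_forward` are assumptions made for this example only.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Assumed toy model: two fair dice, uniform P on the 36 ordered pairs.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """The random variable: sum of the two faces."""
    return w[0] + w[1]

def push_forward(P, X):
    """Accumulate P over the preimage of each value: P_X({x}) = P(X^{-1}({x}))."""
    law = defaultdict(Fraction)  # Fraction() is zero
    for w, p in P.items():
        law[X(w)] += p
    return dict(law)

P_X = push_forward(P, X)
print(P_X[7])             # 1/6
print(sum(P_X.values()))  # 1
```

The total mass is $1$ because the preimages of the distinct values partition $\Omega$.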

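The distribution $P_X$ encodes everything about $X$ that is probabilistically observable. Two random variables with the same distribution, possibly defined on entirely different probability spaces, are equal in law (written $X \overset{d}{=} Y$): any probabilistic statement about one of them alone holds equally for the other. Statements that involve both variables jointly, such as $P(X = Y)$, depend on their joint distribution and are not determined by the individual laws.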
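
Equality in law can be seen in a small example. The sketch below uses two assumed toy spaces (a fair coin and a fair die) and a hypothetical helper `law`. The two variables live on different sample spaces yet have the same distribution.

```python
from fractions import Fraction

def law(P, X):
    """Distribution of X under P on a finite sample space."""
    out = {}
    for w, p in P.items():
        out[X(w)] = out.get(X(w), Fraction(0)) + p
    return out

# Space 1 (assumed): a fair coin; X indicates heads.
P_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
X = lambda w: 1 if w == "H" else 0

# Space 2 (assumed): a fair die; Y indicates an even face.
P_die = {k: Fraction(1, 6) for k in range(1, 7)}
Y = lambda w: 1 if w % 2 == 0 else 0

print(law(P_coin, X) == law(P_die, Y))  # True
```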

Cumulative distribution function

The cumulative distribution function (CDF) of $X$ is the real function

$F_X(x) \coloneqq P(X \leq x) = P_X\bigl((-\infty, x]\bigr), \quad x \in \mathbb{R}.$

Every CDF satisfies three properties:

Non-decreasing. If $x \leq y$ then $\{X \leq x\} \subseteq \{X \leq y\}$ , so $F_X(x) \leq F_X(y)$ .
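Right-continuous with left-hand limits (càdlàg). $F_X(x) = \lim_{t \downarrow x} F_X(t)$ for every $x$; the left-hand limit $F_X(x^-) = \lim_{t \uparrow x} F_X(t) = P(X < x)$ exists by monotonicity, so the jump at $x$ equals $P(X = x)$.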
Boundary behaviour. $\lim_{x \to -\infty} F_X(x) = 0, \qquad \lim_{x \to +\infty} F_X(x) = 1.$

Conversely, any function satisfying these three properties is the CDF of some random variable. The CDF uniquely determines the distribution $P_X$ : every Borel probability measure on $\mathbb{R}$ corresponds to a unique CDF, and vice versa.
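
The converse is usually shown by the quantile construction: if $U$ is uniform on $(0, 1)$ and $F^{-1}$ is the generalised inverse of $F$, then $F^{-1}(U)$ has CDF $F$. The sketch below tries this for an exponential CDF. The rate `RATE`, the sample size, and the seed are arbitrary choices made for illustration.

```python
import math
import random

RATE = 2.0  # assumed rate parameter, chosen only for illustration

def cdf(x):
    """Exponential CDF: non-decreasing, right-continuous, limits 0 and 1."""
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def quantile(u):
    """Inverse of cdf for u in [0, 1)."""
    return -math.log(1.0 - u) / RATE

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [quantile(rng.random()) for _ in range(20000)]

def empirical_cdf(x):
    """Fraction of samples that are <= x."""
    return sum(s <= x for s in samples) / len(samples)

for x in (0.1, 0.5, 1.0):
    print(x, round(cdf(x), 3), round(empirical_cdf(x), 3))
```

With this many samples the empirical and exact columns should agree to roughly two decimal places.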

Discrete and absolutely continuous random variables

The Lebesgue decomposition theorem splits $P_X$ into a part that is absolutely continuous with respect to Lebesgue measure $\lambda$ on $\mathbb{R}$ and a part that is singular with respect to it. The two most important special cases are:

Discrete random variables

$X$ is discrete if there is a countable set $S = \{x_1, x_2, \ldots\} \subseteq \mathbb{R}$ such that $P_X(S) = 1$ . The distribution is a sum of point masses and $P_X \perp \lambda$ (singular with respect to Lebesgue measure).

Absolutely continuous random variables

$X$ is absolutely continuous if $P_X \ll \lambda$ , i.e. $P_X(B) = 0$ whenever $\lambda(B) = 0$ . By the Radon–Nikodym theorem there then exists a non-negative measurable function $f$ such that

$P_X(B) = \int_B f(t)\, d\lambda(t), \quad B \in \mathcal{B}(\mathbb{R}).$

In practice most real-world distributions are discrete, absolutely continuous, or a mixture of the two (the mixed type). Singular-continuous distributions (e.g. the Cantor distribution) exist but are rarely encountered in applications.

Probability mass function for discrete random variables

For a discrete random variable with countable support $\{x_1, x_2, \ldots\}$ , the probability mass function (PMF) is

$p_k \coloneqq P(X = x_k), \quad k = 1, 2, \ldots$

It satisfies $p_k \geq 0$ for all $k$ and

$\sum_k p_k = 1.$

The CDF is a staircase:

$F_X(x) = \sum_{k : x_k \leq x} p_k.$

The PMF completely determines the distribution of $X$ .
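
The staircase formula translates directly into a cumulative sum. A minimal sketch, assuming a fair six-sided die as the discrete variable:

```python
from bisect import bisect_right
from fractions import Fraction
from itertools import accumulate

# Assumed example: a fair die with support 1..6.
support = [1, 2, 3, 4, 5, 6]
pmf = [Fraction(1, 6)] * 6
cumulative = list(accumulate(pmf))  # running sums p_1, p_1 + p_2, ...

def cdf(x):
    """F_X(x) = sum of p_k over all support points x_k <= x."""
    k = bisect_right(support, x)  # number of support points <= x
    return cumulative[k - 1] if k > 0 else Fraction(0)

print(cdf(0.5), cdf(3), cdf(3.5), cdf(10))  # 0 1/2 1/2 1
print(cdf(3) - cdf(2.999))                  # 1/6
```

The second line shows that the jump of the CDF at a support point recovers the PMF value there.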

Probability density function for absolutely continuous random variables

For an absolutely continuous random variable, the Radon–Nikodym derivative $f \coloneqq dP_X / d\lambda$ is called the probability density function (PDF). It satisfies:

$f(x) \geq 0$ for $\lambda$ -almost every $x$ .
$\displaystyle\int_{-\infty}^{+\infty} f(t)\, dt = 1$ .

The CDF is recovered by integration:

$F_X(x) = \int_{-\infty}^{x} f(t)\, dt,$

and whenever $f$ is continuous at $x$ we have $F_X'(x) = f(x)$ .

The probability of any interval is

$P(a < X \leq b) = \int_a^b f(t)\, dt.$

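Note that for an absolutely continuous random variable, $P(X = x) = 0$ for every individual point $x$, because a single point has Lebesgue measure zero. Positive probability is carried only by sets of positive Lebesgue measure, so open, closed, and half-open intervals with the same endpoints all have the same probability.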
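
The relations between the PDF and the CDF can be checked numerically. The sketch below assumes an exponential density with an arbitrary rate `RATE` and uses a plain midpoint rule (the hypothetical helper `integrate`), with the upper limit $40$ standing in for $+\infty$.

```python
import math

RATE = 2.0  # assumed rate parameter, chosen only for illustration

def pdf(x):
    return RATE * math.exp(-RATE * x) if x >= 0 else 0.0

def cdf(x):
    return 1.0 - math.exp(-RATE * x) if x >= 0 else 0.0

def integrate(f, a, b, n=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = 0.25, 1.0
interval_prob = integrate(pdf, a, b)   # approximates P(a < X <= b)
total_mass = integrate(pdf, 0.0, 40.0)  # approximates the full integral

print(interval_prob, cdf(b) - cdf(a))
print(total_mass)
```

The first line should print two nearly equal numbers, and the second a value very close to $1$.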

Expectation

The expectation (or expected value) of $X$ is the Lebesgue integral of $X$ against $P$ :

$E[X] \coloneqq \int_\Omega X(\omega)\, dP(\omega) = \int_{\mathbb{R}} x\, dP_X(x),$

where the second equality is the change-of-variables formula for push-forward measures. The two specialisations are:

$E[X] = \sum_k x_k\, p_k \qquad \text{(discrete)},$

$E[X] = \int_{-\infty}^{+\infty} x\, f(x)\, dx \qquad \text{(absolutely continuous)},$

provided the sum or integral converges absolutely.
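
The change-of-variables identity can be verified on a finite space, where both integrals reduce to sums. The sketch assumes the two-dice model (uniform $P$ on ordered pairs, $X$ the sum of the faces). The variable names are illustrative.

```python
from fractions import Fraction
from itertools import product

# Assumed toy model: two fair dice, X is the sum of the faces.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}
X = lambda w: w[0] + w[1]

# Integral over Omega: sum of X(w) * P({w}).
e_omega = sum(X(w) * p for w, p in P.items())

# Integral over R against the law: sum of x * P_X({x}).
P_X = {}
for w, p in P.items():
    P_X[X(w)] = P_X.get(X(w), Fraction(0)) + p
e_law = sum(x * p for x, p in P_X.items())

print(e_omega, e_law)  # 7 7
```

Both routes give $7$: the first sums over 36 outcomes, the second over the 11 possible values weighted by the PMF.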

Summary

A random variable $X : \Omega \to \mathbb{R}$ is a measurable function; measurability ensures $P(X \in B)$ is defined for every Borel set $B$ .
The distribution $P_X(B) = P(X^{-1}(B))$ is the push-forward of $P$ onto $\mathbb{R}$ ; it carries all probabilistic information about $X$ .
The CDF $F_X(x) = P(X \leq x)$ is non-decreasing, right-continuous, with limits $0$ and $1$ at $\pm\infty$ ; it uniquely determines $P_X$ .
Discrete random variables have a PMF $p_k = P(X = x_k)$ summing to $1$ ; absolutely continuous random variables have a PDF $f$ integrating to $1$ with $F_X(x) = \int_{-\infty}^x f(t)\, dt$ .
Expectation is the Lebesgue integral $E[X] = \int x\, dP_X(x)$ , specialising to $\sum x_k p_k$ or $\int x f(x)\, dx$ .