Random Variables
In a probability model, outcomes live in an abstract sample space $\Omega$ that may carry no arithmetic structure at all. A random variable is the bridge that lets you ask numerical questions (“what is the average gain?”, “how often does the count exceed 10?”) by mapping $\Omega$ into $\mathbb{R}$ in a way that is compatible with the probability measure. Getting that compatibility right is exactly the job of measurability.
Formal definition: random variable as a measurable function
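Let $(\Omega, \mathcal{F}, P)$ be a probability space and let $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ be the real line equipped with its Borel $\sigma$-algebra, that is, the $\sigma$-algebra generated by all open intervals.
Definition. A random variable is a function
$$X : \Omega \to \mathbb{R}$$
that is $(\mathcal{F}, \mathcal{B}(\mathbb{R}))$-measurable, meaning that for every Borel set $B \in \mathcal{B}(\mathbb{R})$,
$$X^{-1}(B) = \{\omega \in \Omega : X(\omega) \in B\} \in \mathcal{F}.$$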
Why measurability matters
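The probability measure $P$ is only defined on events in $\mathcal{F}$. When you write $P(X \in B)$ you are really writing $P(X^{-1}(B))$. If $X^{-1}(B)$ were not in $\mathcal{F}$, the expression would be undefined. Measurability is exactly the condition that guarantees the preimage of every Borel set is an event, so every statement of the form “$X$ takes a value in $B$” has a well-defined probability.
Because $\mathcal{B}(\mathbb{R})$ is generated by the rays $(-\infty, x]$, it is enough to check measurability on those generators:
$$X \text{ is a random variable} \iff \{\omega \in \Omega : X(\omega) \le x\} \in \mathcal{F} \text{ for every } x \in \mathbb{R}.$$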
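On a finite sample space the ray criterion can be checked by brute force. The sketch below is only an illustration under assumed names: the die-roll space, the parity partition, and the helpers `sigma_algebra_from_partition` and `is_measurable` are hypothetical and not part of the text above. It builds the σ-algebra generated by "odd versus even" and tests two candidate functions against it.

```python
from itertools import combinations

def sigma_algebra_from_partition(blocks):
    """All unions of partition blocks: the sigma-algebra the partition generates."""
    events = set()
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            events.add(frozenset().union(*chosen))
    return events

def is_measurable(X, omega, events):
    """Check that {w : X(w) <= x} is an event for every threshold x.

    On a finite sample space only the attained values of X matter as
    thresholds; any other threshold yields the empty set or one of the
    same preimages.
    """
    for x in sorted(set(X[w] for w in omega)):
        ray_preimage = frozenset(w for w in omega if X[w] <= x)
        if ray_preimage not in events:
            return False
    return True

omega = [1, 2, 3, 4, 5, 6]  # one roll of a die (illustrative choice)
# Assumed information structure: only the parity of the roll is observable.
parity_events = sigma_algebra_from_partition(
    [frozenset({1, 3, 5}), frozenset({2, 4, 6})]
)

is_even = {w: int(w % 2 == 0) for w in omega}  # depends only on parity
face_value = {w: w for w in omega}             # needs the exact face

print(is_measurable(is_even, omega, parity_events))     # True
print(is_measurable(face_value, omega, parity_events))  # False
```

The face value fails because the set of outcomes with value at most 1 is the singleton containing 1, which is not a union of parity blocks, so its probability would be undefined on this σ-algebra.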
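The push-forward measure and the distribution of $X$
Measurability lets you push $P$ forward from $(\Omega, \mathcal{F})$ to $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$.
Definition. The distribution (or law) of $X$ is the probability measure $P_X$ on $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$ defined by
$$P_X(B) = P(X^{-1}(B)) = P(X \in B), \qquad B \in \mathcal{B}(\mathbb{R}).$$
You can verify that $P_X$ is indeed a probability measure: $P_X(\mathbb{R}) = P(\Omega) = 1$, and countable additivity follows from that of $P$ and the fact that preimages preserve set operations.
The distribution encodes everything about $X$ that is probabilistically observable. Two random variables $X$ and $Y$, possibly defined on entirely different probability spaces, that share the same distribution are equal in law (written $X \overset{d}{=} Y$) and are indistinguishable by any probabilistic statement about either variable on its own.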
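To make the push-forward concrete, here is a minimal sketch on finite sample spaces. The helper `pushforward` and the two example spaces (a die and a coin) are assumptions chosen for illustration. It computes the law of a random variable by summing the probability of every outcome that maps to each value, and shows two variables on different spaces that are equal in law.

```python
from collections import defaultdict
from fractions import Fraction

def pushforward(P, X):
    """Law of X as a dict value -> probability, for P given on a finite space."""
    law = defaultdict(Fraction)
    for outcome, prob in P.items():
        law[X(outcome)] += prob  # mass of the preimage of {X(outcome)}
    return dict(law)

# Space 1: a fair die; X indicates an odd face.
P_die = {w: Fraction(1, 6) for w in range(1, 7)}
law_X = pushforward(P_die, lambda w: w % 2)

# Space 2: a fair coin; Y indicates heads.
P_coin = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
law_Y = pushforward(P_coin, lambda w: 1 if w == "H" else 0)

print(law_X == law_Y)       # True: equal in law
print(sum(law_X.values()))  # 1
```

The two sample spaces share no outcomes, yet every probability computed from one variable alone agrees with the other.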
Cumulative distribution function
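The cumulative distribution function (CDF) of $X$ is the real function
$$F_X(x) = P_X((-\infty, x]) = P(X \le x), \qquad x \in \mathbb{R}.$$
Every CDF satisfies three properties:
- Non-decreasing. If $x \le y$ then $\{X \le x\} \subseteq \{X \le y\}$, so $F_X(x) \le F_X(y)$.
- Right-continuous with left-hand limits (càdlàg). $\lim_{y \downarrow x} F_X(y) = F_X(x)$, and the left limit $F_X(x^-) = \lim_{y \uparrow x} F_X(y)$ exists.
- Boundary behaviour. $\lim_{x \to -\infty} F_X(x) = 0$ and $\lim_{x \to +\infty} F_X(x) = 1$.
Conversely, any function $F : \mathbb{R} \to [0, 1]$ satisfying these three properties is the CDF of some random variable. The CDF uniquely determines the distribution $P_X$: every Borel probability measure on $\mathbb{R}$ corresponds to a unique CDF, and vice versa.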
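One standard way to see the converse is the quantile transform: feed a uniform variable on the unit interval through the generalised inverse of the candidate CDF. The sketch below is not taken from the text above. It assumes an exponential CDF with a hypothetical rate `RATE`, and checks with a seeded simulation that the constructed samples have the intended CDF.

```python
import math
import random

RATE = 2.0  # hypothetical rate parameter, chosen only for illustration

def cdf(x):
    """CDF of an exponential distribution with rate RATE."""
    return 1.0 - math.exp(-RATE * x) if x > 0 else 0.0

def quantile(u):
    """Generalised inverse inf{x : cdf(x) >= u}, in closed form for 0 <= u < 1."""
    return -math.log(1.0 - u) / RATE

rng = random.Random(0)  # fixed seed so the run is reproducible
samples = [quantile(rng.random()) for _ in range(100_000)]

# The empirical fraction of samples <= x should be close to cdf(x).
for x in (0.1, 0.5, 1.0):
    empirical = sum(s <= x for s in samples) / len(samples)
    print(x, round(empirical, 3), round(cdf(x), 3))
```

For a CDF without a closed-form inverse, the same idea works with a numerical search for the smallest point at which the CDF reaches the uniform draw.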
Discrete and absolutely continuous random variables
The Lebesgue decomposition theorem classifies how $P_X$ sits relative to Lebesgue measure $\lambda$ on $\mathbb{R}$. The two most important special cases are:
Discrete random variables
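$X$ is discrete if there is a countable set $S \subset \mathbb{R}$ such that $P_X(S) = 1$. The distribution is a sum of point masses, $P_X = \sum_{x \in S} P(X = x)\,\delta_x$, and $P_X \perp \lambda$ (singular with respect to Lebesgue measure).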
Absolutely continuous random variables
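$X$ is absolutely continuous if $P_X \ll \lambda$, i.e. $P_X(B) = 0$ whenever $\lambda(B) = 0$. By the Radon–Nikodym theorem there then exists a non-negative measurable function $f_X$ such that
$$P_X(B) = \int_B f_X(x)\,dx \qquad \text{for every } B \in \mathcal{B}(\mathbb{R}).$$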
In practice most real-world distributions are either discrete, absolutely continuous, or a mixture of the two (the mixed type). Singular-continuous distributions (e.g. the Cantor distribution) exist but are rarely encountered in applications.
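A small example of the mixed type, offered as an assumption-laden sketch rather than something from the text above: take $U$ uniform on $[0, 1]$ and set $X = \max(U - 1/2, 0)$. Half of the probability sits as an atom at $0$, and the other half is spread with a density over $(0, 1/2)$. The CDF below is written out by hand, and the jump sizes reveal the atom.

```python
def cdf(x):
    """CDF of X = max(U - 1/2, 0) for U uniform on [0, 1] (illustrative example)."""
    if x < 0:
        return 0.0
    if x >= 0.5:
        return 1.0
    return 0.5 + x

eps = 1e-9
jump_at_zero = cdf(0.0) - cdf(-eps)             # size of the atom P(X = 0)
jump_at_quarter = cdf(0.25) - cdf(0.25 - eps)   # no atom here, only density

print(jump_at_zero)
print(jump_at_quarter)
```

A jump in the CDF marks a point mass, while the stretches where the CDF rises continuously correspond to the absolutely continuous part.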
Probability mass function for discrete random variables
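For a discrete random variable $X$ with countable support $S$, the probability mass function (PMF) is
$$p_X(x) = P(X = x) = P_X(\{x\}), \qquad x \in \mathbb{R}.$$
It satisfies $p_X(x) \ge 0$ for all $x$ and
$$\sum_{x \in S} p_X(x) = 1.$$
The CDF is a staircase:
$$F_X(x) = \sum_{s \in S,\ s \le x} p_X(s).$$
The PMF completely determines the distribution of $X$.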
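As a hedged illustration of the PMF and its staircase CDF, the sketch below uses the sum of two fair dice (an example chosen here, not one from the text above) and exact rational arithmetic so that the normalisation can be checked without rounding.

```python
from fractions import Fraction
from itertools import product

# PMF of the sum of two fair dice, built by counting equally likely outcomes.
pmf = {}
for a, b in product(range(1, 7), repeat=2):
    pmf[a + b] = pmf.get(a + b, Fraction(0)) + Fraction(1, 36)

def cdf(x):
    """Staircase CDF: add up the masses at support points s <= x."""
    return sum(p for s, p in pmf.items() if s <= x)

print(sum(pmf.values()))                     # 1
print(cdf(1.5), cdf(7), cdf(7.9), cdf(12))   # 0 7/12 7/12 1
```

The CDF is flat between support points (the values at 7 and 7.9 agree) and jumps by exactly the mass at each support point.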
Probability density function for absolutely continuous random variables
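For an absolutely continuous random variable, the Radon–Nikodym derivative $f_X = dP_X/d\lambda$ is called the probability density function (PDF). It satisfies:
- $f_X(x) \ge 0$ for $\lambda$-almost every $x$.
- $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
The CDF is recovered by integration:
$$F_X(x) = \int_{-\infty}^{x} f_X(t)\,dt,$$
and whenever $f_X$ is continuous at $x$ we have $F_X'(x) = f_X(x)$.
The probability of any interval is
$$P(a < X \le b) = F_X(b) - F_X(a) = \int_a^b f_X(x)\,dx.$$
Note that for an absolutely continuous random variable, $P(X = x) = 0$ for every individual point $x$. Probability concentrates on intervals, not points.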
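The relation between density, CDF, and interval probabilities can be checked numerically. The sketch below assumes a standard normal density as the example and uses a plain midpoint rule (the helper `integrate` is hypothetical and written here for illustration). It compares the integral of the density over an interval with the difference of CDF values.

```python
import math

def pdf(x):
    """Standard normal density (illustrative choice of distribution)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def integrate(f, a, b, n=10_000):
    """Composite midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def normal_cdf(x):
    """Closed form via the error function, used only as a reference value."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

a, b = -1.0, 2.0
by_integration = integrate(pdf, a, b)
by_cdf = normal_cdf(b) - normal_cdf(a)
print(by_integration, by_cdf)

# A single point is an interval of length zero, so it carries no probability.
print(integrate(pdf, 1.0, 1.0))
```

The two printed interval probabilities should agree to several decimal places, with the remaining gap coming from the quadrature error of the midpoint rule.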
Expectation
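The expectation (or expected value) of $X$ is the Lebesgue integral of $X$ against $P$:
$$E[X] = \int_\Omega X(\omega)\,dP(\omega) = \int_{\mathbb{R}} x\,dP_X(x),$$
where the second equality is the change-of-variables formula for push-forward measures. The two specialisations are:
$$E[X] = \sum_{x \in S} x\,p_X(x) \quad \text{(discrete)}, \qquad E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx \quad \text{(absolutely continuous)},$$
provided the sum or integral converges absolutely.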
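The change-of-variables identity can be verified directly on a finite sample space. In the sketch below, the space (two fair dice) and the variable (the larger of the two faces) are assumptions chosen for illustration. The expectation is computed once by summing over outcomes and once by summing over values weighted by the PMF.

```python
from fractions import Fraction
from itertools import product

# Sample space: ordered pairs from two fair dice, each with probability 1/36.
omega = list(product(range(1, 7), repeat=2))
P = {w: Fraction(1, 36) for w in omega}

def X(w):
    """The larger of the two faces."""
    return max(w)

# Left-hand side: integrate X over the sample space against P.
e_omega = sum(X(w) * P[w] for w in omega)

# Right-hand side: build the law of X, then sum value times mass.
pmf = {}
for w in omega:
    pmf[X(w)] = pmf.get(X(w), Fraction(0)) + P[w]
e_law = sum(x * p for x, p in pmf.items())

print(e_omega, e_law)  # 161/36 161/36
```

The second computation never refers to individual outcomes, which is the practical content of the formula: once the law is known, the underlying sample space is no longer needed to compute the expectation.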
Summary
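- A random variable is a measurable function $X : \Omega \to \mathbb{R}$; measurability ensures $P(X \in B)$ is defined for every Borel set $B$.
- The distribution $P_X$ is the push-forward of $P$ onto $(\mathbb{R}, \mathcal{B}(\mathbb{R}))$; it carries all probabilistic information about $X$.
- The CDF $F_X(x) = P(X \le x)$ is non-decreasing, right-continuous, with limits $0$ and $1$ at $-\infty$ and $+\infty$; it uniquely determines $P_X$.
- Discrete random variables have a PMF summing to $1$; absolutely continuous random variables have a PDF integrating to $1$ with $P(X = x) = 0$ for every point $x$.
- Expectation is the Lebesgue integral $E[X] = \int_\Omega X\,dP$, specialising to $\sum_{x \in S} x\,p_X(x)$ or $\int_{-\infty}^{\infty} x\,f_X(x)\,dx$.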