The expectation (or expected value, or mean) of a random variable is its probability-weighted average. For a fair die roll you might compute $\frac{1}{6}(1+2+3+4+5+6) = 3.5$; that is the expectation of a uniform discrete variable. The measure-theoretic definition unifies this with the continuous case and gives a framework in which the usual algebraic rules are provable theorems.
Definition
Let $(\Omega, \mathcal{F}, P)$ be a probability space and $X : \Omega \to \mathbb{R}$ a random variable. The expectation of $X$ is its Lebesgue integral against $P$:
$$E[X] := \int_\Omega X(\omega)\, dP(\omega),$$
provided the integral exists, i.e. $\int_\Omega |X|\, dP < \infty$; $X$ is then called integrable. By the change-of-variables formula for push-forward measures, this equals the integral against the distribution $P_X = P \circ X^{-1}$:
$$E[X] = \int_{\mathbb{R}} x\, dP_X(x) = \int_{\mathbb{R}} x\, dF_X(x),$$
where $F_X$ is the CDF of $X$ and the last integral is a Lebesgue–Stieltjes integral.
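On a finite sample space both integrals reduce to finite sums, which makes the change of variables easy to check by hand. The sketch below assumes a toy space of two fair coin flips with X counting heads; the names `omega`, `P`, `X`, and `P_X` are illustrative choices, not part of the definition.

```python
from fractions import Fraction
from itertools import product

# Finite sample space: two fair coin flips, each outcome has probability 1/4.
omega = list(product("HT", repeat=2))
P = {w: Fraction(1, 4) for w in omega}


def X(w):
    """Random variable: number of heads in the outcome w."""
    return w.count("H")


# Expectation as an integral over Omega: sum of X(w) * P({w}).
e_omega = sum(X(w) * P[w] for w in omega)

# Push-forward distribution on the real line: P_X({x}) = P(X = x).
P_X = {}
for w in omega:
    P_X[X(w)] = P_X.get(X(w), Fraction(0)) + P[w]

# Expectation as an integral against P_X: sum of x * P_X({x}).
e_dist = sum(x * p for x, p in P_X.items())

print(e_omega, e_dist)  # 1 1
```

Both sums give 1: the first runs over outcomes, the second over values of X, which is the finite version of integrating against the distribution instead of against P.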
Discrete case
If $X$ is discrete with support $\{x_1, x_2, \dots\}$ and $P(X = x_k) = p_k$, then
$$E[X] = \sum_k x_k p_k,$$
provided $\sum_k |x_k| p_k < \infty$.
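When the support is countably infinite, the sum is a series and the expectation is its limit. As a rough illustration, assume a geometric-type distribution with P(X = k) = (1/2)^k for k = 1, 2, … (a choice made for this example only); the helper `partial_expectation` is hypothetical and just accumulates the first terms exactly.

```python
from fractions import Fraction


def partial_expectation(n_terms):
    """Sum of k * P(X = k) for k = 1..n_terms, where P(X = k) = (1/2)**k."""
    return sum(k * Fraction(1, 2) ** k for k in range(1, n_terms + 1))


# All terms are non-negative, so convergence and absolute convergence coincide.
for n in (5, 10, 20, 60):
    print(n, float(partial_expectation(n)))
```

The partial sums increase towards 2, the expectation of this distribution.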
Absolutely continuous case
If $X$ has probability density function $f_X$, then
$$E[X] = \int_{-\infty}^{+\infty} x f_X(x)\, dx,$$
provided $\int_{-\infty}^{+\infty} |x| f_X(x)\, dx < \infty$.
Both formulas are specialisations of the single Lebesgue integral $\int_{\mathbb{R}} x\, dP_X(x)$: the discrete case integrates against a sum of weighted point masses, the continuous case against a measure that is absolutely continuous with respect to Lebesgue measure.
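For a density supported on a bounded interval, the integral of x times the density can be approximated numerically. The following is a minimal sketch using a midpoint rule; the function `expectation_from_density` and the triangular density f(x) = 2x on [0, 1] are assumptions made for illustration, and the exact answer in this case is 2/3.

```python
def expectation_from_density(f, a, b, n=10_000):
    """Midpoint-rule approximation of the integral of x * f(x) over [a, b]."""
    h = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h  # midpoint of the i-th subinterval
        total += x * f(x)
    return total * h


def triangular_density(x):
    """Hypothetical density f(x) = 2x on [0, 1]; it integrates to 1."""
    return 2.0 * x


mean = expectation_from_density(triangular_density, 0.0, 1.0)
print(mean)  # close to 2/3
```

A midpoint rule is only one of many quadrature choices; it is used here because it needs nothing beyond the standard library.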
Linearity of expectation
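Theorem. For any integrable random variables $X, Y$ and constants $a, b \in \mathbb{R}$, the combination $aX + bY$ is integrable and
$$E[aX + bY] = a E[X] + b E[Y]. \tag{1}$$
Proof. Since $|aX + bY| \le |a||X| + |b||Y|$, the combination is integrable. The identity is then linearity of the Lebesgue integral: $\int_\Omega (aX + bY)\, dP = a \int_\Omega X\, dP + b \int_\Omega Y\, dP$.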
Linearity holds without any assumption of independence. This is one of the most powerful tools in probability: you can always decompose a complicated random variable into simpler parts and add expectations.
Example. For $S_n = X_1 + X_2 + \dots + X_n$ where each $X_i$ has the same mean $\mu$:
$$E[S_n] = E[X_1] + E[X_2] + \dots + E[X_n] = n\mu,$$
regardless of whether $X_1, \dots, X_n$ are independent, correlated, or not even identically distributed.
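To see that independence really plays no role, here is a small exact check with two strongly dependent variables. The setup is an assumption for illustration: X is a fair die roll and Y = 7 − X is the opposite face, so Y is completely determined by X.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # each face of a fair die


def X(w):
    """Value shown on the die."""
    return w


def Y(w):
    """Opposite face: fully determined by X, hence not independent of it."""
    return 7 - w


e_x = sum(X(w) * p for w in faces)
e_y = sum(Y(w) * p for w in faces)
e_sum = sum((X(w) + Y(w)) * p for w in faces)
e_prod = sum(X(w) * Y(w) * p for w in faces)

print(e_x, e_y, e_sum)    # 7/2 7/2 7
print(e_prod, e_x * e_y)  # 28/3 49/4
```

The expectation of the sum equals the sum of the expectations, while the expectation of the product does not factor, which is what one would expect for dependent variables.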
Law of the unconscious statistician (LOTUS)
Computing $E[g(X)]$ for a function $g : \mathbb{R} \to \mathbb{R}$ does not require knowing the distribution of $g(X)$ explicitly.
Theorem (LOTUS). If $X$ is a random variable with distribution $P_X$ and $g : \mathbb{R} \to \mathbb{R}$ is measurable with $g(X)$ integrable, then
$$E[g(X)] = \int_{\mathbb{R}} g(x)\, dP_X(x).$$
In the discrete case this is $\sum_k g(x_k) p_k$, and in the absolutely continuous case $\int_{-\infty}^{+\infty} g(x) f_X(x)\, dx$.
Proof sketch. $g(X) = g \circ X$ is the composition of measurable maps $\Omega \xrightarrow{X} \mathbb{R} \xrightarrow{g} \mathbb{R}$, so it is a random variable. Its expectation is $\int_\Omega g(X(\omega))\, dP(\omega)$. Applying the push-forward change-of-variables formula converts the integral over $\Omega$ to an integral over $\mathbb{R}$ against $P_X$.
Example. For $X \sim \mathrm{Uniform}(0,1)$ with density $f_X(x) = 1$ on $[0,1]$:
$$E[X^2] = \int_0^1 x^2 \cdot 1\, dx = \frac{1}{3}.$$
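A discrete check makes the point of LOTUS concrete: the same number comes out whether or not the distribution of g(X) is worked out first. The sketch assumes X uniform on {−2, −1, 0, 1, 2} and g(x) = x²; the names `p_x`, `p_y`, and `g` are illustrative.

```python
from collections import defaultdict
from fractions import Fraction

# Assumed distribution of X: uniform on five points.
p_x = {x: Fraction(1, 5) for x in (-2, -1, 0, 1, 2)}


def g(x):
    """Measurable function applied to X."""
    return x * x


# LOTUS: weight g(x) by the distribution of X directly.
lotus = sum(g(x) * p for x, p in p_x.items())

# Long way round: first build the distribution of Y = g(X), then average y.
p_y = defaultdict(Fraction)
for x, p in p_x.items():
    p_y[g(x)] += p
direct = sum(y * p for y, p in p_y.items())

print(lotus, direct)  # 2 2
```

Here g is not injective, so building the distribution of g(X) requires merging the masses at x and −x; LOTUS skips that step.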
Expectation of non-negative random variables
For a non-negative random variable $X \ge 0$, the expectation always exists (possibly as $+\infty$) and satisfies
$$E[X] = \int_0^\infty P(X > t)\, dt. \tag{2}$$
This layer-cake formula rewrites the integral $\int_{\mathbb{R}} x\, dP_X(x)$ as an integral over $[0, \infty)$ of survival probabilities. It is especially useful when $P(X > t)$ has a simple form.
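As a worked illustration of a simple survival function, assume $X$ is exponential with rate $\lambda > 0$ (an example chosen here, not taken from the text above), so that $P(X > t) = e^{-\lambda t}$ for $t \ge 0$. The layer-cake formula then gives

$$E[X] = \int_0^\infty e^{-\lambda t}\, dt = \left[ -\frac{1}{\lambda} e^{-\lambda t} \right]_0^\infty = \frac{1}{\lambda},$$

which agrees with the density-based computation $\int_0^\infty x\, \lambda e^{-\lambda x}\, dx = \frac{1}{\lambda}$ but avoids the integration by parts.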