Expectation tells you where the distribution is centred. Variance tells you how spread out it is around that centre — how much a typical observation deviates from the mean. It is the simplest second-order property of a distribution, and it underlies everything from the standard error of an estimator to the volatility of a financial instrument.
Definition
Let X be a random variable with finite expectation μ:=E[X]. The variance of X is
Var(X):=E[(X−μ)2].
It is the expected squared deviation of X from its mean. Since (X−μ)2≥0, the variance is always non-negative: Var(X)≥0.
The standard deviation is
σX:=Var(X),
which has the same physical units as X itself. Variance is the more algebraically convenient quantity, but standard deviation is what you report in practice.
The computational formula
Expanding the square and applying linearity of expectation gives a formula that avoids computing μ first:
Variance bounds the probability of large deviations from the mean. Chebyshev’s inequality states: for any k>0,
P(∣X−μ∣≥kσ)≤k21.(3)
More generally, for any ε>0:
P(∣X−μ∣≥ε)≤ε2Var(X).
Proof. By Markov’s inequality applied to the non-negative random variable (X−μ)2:
P((X−μ)2≥ε2)≤ε2E[(X−μ)2]=ε2Var(X).
Chebyshev’s inequality is weak (it holds for any distribution) but universally applicable. It is the key tool in proving the weak law of large numbers: if X1,X2,… are i.i.d.\ with mean μ and finite variance, then Xn=n1∑k=1nXk converges in probability to μ.
Quick proof.E[Xn]=μ and Var(Xn)=nVar(X1)→0 by independence and the scaling rule. Chebyshev gives P(∣Xn−μ∣≥ε)≤nε2Var(X1)→0.
Why variance rather than mean absolute deviation?
Variance squares the deviation. An alternative spread measure is the mean absolute deviationE[∣X−μ∣]. Both capture spread, but variance has three practical advantages:
Algebra. The variance of a sum of independent variables is the sum of variances (as above). The analogous result for mean absolute deviation fails.
Smoothness. The function x↦x2 is everywhere differentiable; x↦∣x∣ is not differentiable at 0. Variance appears naturally in calculus-based derivations (ordinary least squares, Fisher information, etc.).
Completeness. Variance extends to the covariance matrix for multivariate distributions, which mean absolute deviation cannot.
The cost is interpretability: σ2 has units of (units of X)2, which is why standard deviation σ is always reported alongside variance.
Summary
Var(X):=E[(X−E[X])2]≥0 measures average squared spread around the mean.
Computational formula: Var(X)=E[X2]−(E[X])2.
Scaling rule: Var(aX+b)=a2Var(X); shifts do not affect variance.
Independence additivity: Var(X+Y)=Var(X)+Var(Y) when X,Y are independent.
Chebyshev’s inequality: P(∣X−μ∣≥ε)≤Var(X)/ε2 — a universal (though weak) tail bound.
Variance is algebraically natural (additive under independence, smooth); standard deviation σ=Var(X) is what you report because it matches the units of X.