Moments

Essential
Tags: Probability, Expectation

The mean and variance give you the centre and spread of a distribution. But two very different distributions can share the same mean and variance — a symmetric bell curve and a sharply skewed one, for instance. Moments are a systematic way to extract more detailed shape information, one order at a time.

Raw moments

The $k$-th raw moment (or $k$-th moment about the origin) of a random variable $X$ is

$$\mu'_k \coloneqq E[X^k], \quad k = 0, 1, 2, \ldots,$$

provided the expectation is finite. The zeroth moment is always $\mu'_0 = E[1] = 1$. The first moment is $\mu'_1 = E[X] = \mu$, the mean.
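
As a small illustration of the definition, the sketch below computes exact raw moments of a discrete distribution. The fair six-sided die and the helper name `raw_moment` are assumptions chosen for this example; they do not come from the text above.

```python
from fractions import Fraction

def raw_moment(pmf, k):
    """k-th raw moment E[X^k] of a discrete pmf given as {value: probability}."""
    return sum(p * Fraction(x) ** k for x, p in pmf.items())

# Fair six-sided die (illustrative choice): each face has probability 1/6.
die = {face: Fraction(1, 6) for face in range(1, 7)}

moments = [raw_moment(die, k) for k in range(4)]
print(moments)  # the zeroth moment is 1 and the first moment is the mean 7/2
```

Exact rational arithmetic is used so that the results can be compared with hand calculations without rounding error.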

Central moments

The $k$-th central moment is the $k$-th moment about the mean:

$$\mu_k \coloneqq E\bigl[(X - \mu)^k\bigr], \quad k = 0, 1, 2, \ldots$$

The first three central moments (orders 0, 1 and 2) are:

  • $\mu_0 = 1$.
  • $\mu_1 = E[X - \mu] = 0$ (the mean of the centred variable is zero).
  • $\mu_2 = E[(X - \mu)^2] = \operatorname{Var}(X)$, the variance.

Central moments are translation-invariant: replacing $X$ by $X + c$ shifts the mean by the same $c$, so $X - \mu$, and hence every $\mu_k$, is unchanged. This makes them the natural measures of shape.
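
Translation invariance can be checked directly on a small example. The sketch below uses a toy three-point distribution and a helper `central_moment`; both are assumptions made for illustration only.

```python
from fractions import Fraction

def central_moment(pmf, k):
    """k-th central moment E[(X - mu)^k] of a discrete pmf {value: probability}."""
    mu = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mu) ** k for x, p in pmf.items())

# A deliberately asymmetric toy distribution (illustrative values only).
pmf = {0: Fraction(1, 2), 1: Fraction(1, 3), 4: Fraction(1, 6)}
shifted = {x + 10: p for x, p in pmf.items()}  # distribution of X + c with c = 10

original_moments = [central_moment(pmf, k) for k in range(5)]
shifted_moments = [central_moment(shifted, k) for k in range(5)]
print(original_moments == shifted_moments)  # the shift leaves every central moment unchanged
```

Here the mean is 1, so the first few central moments are 1, 0, 2 and 4 for both the original and the shifted distribution.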

Converting between raw and central moments

The binomial theorem gives the relationship. Expanding $(X - \mu)^k$:

$$\mu_k = \sum_{j=0}^{k} \binom{k}{j} \mu'_j \, (-\mu)^{k-j}.$$

The first few conversions:

$$
\begin{aligned}
\mu_2 &= \mu'_2 - (\mu'_1)^2, \\
\mu_3 &= \mu'_3 - 3\mu'_2 \mu'_1 + 2(\mu'_1)^3, \\
\mu_4 &= \mu'_4 - 4\mu'_3 \mu'_1 + 6\mu'_2 (\mu'_1)^2 - 3(\mu'_1)^4.
\end{aligned}
$$

These formulas are useful when the raw moments $E[X^k]$ are easier to compute than $E[(X - \mu)^k]$ directly.
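
The general conversion formula translates almost line for line into code. The following is a minimal sketch; the function name `central_from_raw` is hypothetical, and the test input uses the standard fact that $\operatorname{Exp}(1)$ has raw moments $\mu'_k = k!$.

```python
from fractions import Fraction
from math import comb, factorial

def central_from_raw(raw):
    """Convert raw moments [mu'_0, ..., mu'_n] into central moments [mu_0, ..., mu_n]
    via mu_k = sum_j C(k, j) * mu'_j * (-mu)^(k - j), where mu = mu'_1."""
    mu = raw[1]
    return [
        sum(comb(k, j) * raw[j] * (-mu) ** (k - j) for j in range(k + 1))
        for k in range(len(raw))
    ]

# Raw moments of Exp(1): mu'_k = k! for k = 0, ..., 4.
raw_exp = [Fraction(factorial(k)) for k in range(5)]
central_exp = central_from_raw(raw_exp)
print(central_exp)  # central moments 1, 0, 1, 2, 9
```

The values agree with the explicit formulas: for instance $\mu_4 = 24 - 4 \cdot 6 \cdot 1 + 6 \cdot 2 \cdot 1 - 3 = 9$.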

Standardised moments: skewness and kurtosis

To make central moments dimensionless and scale-invariant, divide $\mu_k$ by $\sigma^k$, where $\sigma = \sqrt{\mu_2}$ is the standard deviation.

Skewness

The skewness is the standardised third central moment:

$$\gamma_1 \coloneqq \frac{\mu_3}{\sigma^3} = \frac{E[(X-\mu)^3]}{(E[(X-\mu)^2])^{3/2}}.$$

  • $\gamma_1 = 0$ for symmetric distributions with a finite third moment (the third central moment vanishes by symmetry). The converse fails: an asymmetric distribution can also have $\gamma_1 = 0$.
  • $\gamma_1 > 0$ indicates a right-skewed (positively skewed) distribution: the right tail is longer, with occasional very large values that typically pull the mean above the median.
  • $\gamma_1 < 0$ indicates a left-skewed distribution, with the longer tail on the left.

Example. The exponential distribution $\operatorname{Exp}(\lambda)$ has mean $1/\lambda$, variance $1/\lambda^2$, and $E[(X-1/\lambda)^3] = 2/\lambda^3$, so $\gamma_1 = (2/\lambda^3) / (1/\lambda^2)^{3/2} = 2 > 0$. It is right-skewed, which matches the long right tail visible in its density.
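
The exponential example can also be checked by simulation. The sketch below uses the plug-in (uncorrected) sample skewness; the seed, the rate and the sample size are arbitrary choices for illustration, and the estimate is only expected to land near 2, not to equal it.

```python
import random

def sample_skewness(xs):
    """Plug-in estimate of gamma_1 = mu_3 / sigma^3 (no small-sample bias correction)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)  # fixed seed so the run is reproducible
rate = 3.0              # arbitrary rate; the skewness should not depend on it
samples = [rng.expovariate(rate) for _ in range(200_000)]
estimate = sample_skewness(samples)
print(f"estimated skewness: {estimate:.3f} (theory: 2)")
```

Changing `rate` rescales the samples but should leave the estimate close to 2, which reflects the scale invariance of standardised moments.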

Kurtosis and excess kurtosis

The kurtosis is the standardised fourth central moment:

$$\gamma_2 \coloneqq \frac{\mu_4}{\sigma^4} = \frac{E[(X-\mu)^4]}{(E[(X-\mu)^2])^2}.$$

For every normal distribution, $\gamma_2 = 3$. The excess kurtosis (also called kurtosis in many statistics packages) is

$$\kappa \coloneqq \gamma_2 - 3.$$

  • $\kappa = 0$ (mesokurtic): tails behave like a normal distribution. The normal is the reference.
  • $\kappa > 0$ (leptokurtic): heavier tails than normal, so extreme values are more probable. The $t$-distribution with $\nu > 4$ degrees of freedom is leptokurtic, with $\kappa = 6/(\nu - 4)$. The Cauchy distribution (the $t$-distribution with $\nu = 1$) has even heavier tails, but its moments do not exist, so its kurtosis is undefined.
  • $\kappa < 0$ (platykurtic): lighter tails, so extreme values are less likely than in a normal. The uniform distribution has $\kappa = -6/5$.

Kurtosis measures tail heaviness, not “peakedness” as is sometimes stated — the two properties are not equivalent.
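
To make the sign conventions concrete, here is a small sketch that computes the excess kurtosis of two bounded discrete distributions exactly. The fair coin and fair die, and the helper name `excess_kurtosis`, are illustrative assumptions; both distributions have no tails beyond their support, so negative values are expected.

```python
from fractions import Fraction

def excess_kurtosis(pmf):
    """Excess kurtosis mu_4 / mu_2^2 - 3 of a discrete pmf {value: probability}."""
    mu = sum(p * x for x, p in pmf.items())
    m2 = sum(p * (x - mu) ** 2 for x, p in pmf.items())
    m4 = sum(p * (x - mu) ** 4 for x, p in pmf.items())
    return m4 / m2 ** 2 - 3

coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
die = {face: Fraction(1, 6) for face in range(1, 7)}
print(excess_kurtosis(coin), excess_kurtosis(die))  # -2 and -222/175, both platykurtic
```

The fair coin attains $\kappa = -2$, the smallest value any distribution can have, since $\mu_4 \geq \mu_2^2$ always holds.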

Do moments determine the distribution?

A natural question is whether the sequence of moments $(\mu'_1, \mu'_2, \mu'_3, \ldots)$ uniquely determines the distribution.

When yes: the moment problem. If all moments exist and the Carleman condition holds,

$$\sum_{k=1}^{\infty} (\mu'_{2k})^{-1/(2k)} = +\infty,$$

then the moments uniquely determine the distribution. The normal, Poisson, binomial, and exponential distributions all satisfy this condition.

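When no. The log-normal distribution is the canonical counterexample: there exist infinitely many distinct distributions with the same moment sequence as a given log-normal. Its moments grow very fast ($\mu'_k = e^{k^2/2}$ for the standard log-normal), so the Carleman series converges and the condition fails. The Carleman condition is sufficient but not necessary, so its failure alone does not prove non-uniqueness; for the log-normal, non-uniqueness is established by exhibiting explicit distributions with identical moments.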

In practice this means: when fitting a model via moments (method of moments), you should check that the moment problem has a unique solution for your distribution class.
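
The contrast between the two cases shows up in the partial sums of the Carleman series. The sketch below compares the standard normal, whose even moments are $\mu'_{2k} = (2k-1)!!$, with the standard log-normal, whose even moments are $\mu'_{2k} = e^{2k^2}$. The function names are hypothetical, and finite partial sums can only suggest divergence or convergence, not prove it.

```python
import math

def carleman_term_normal(k):
    """Term (mu'_{2k})^(-1/(2k)) for the standard normal, where mu'_{2k} = (2k - 1)!!."""
    # log (2k - 1)!! = log (2k)! - k * log 2 - log k!
    log_moment = math.lgamma(2 * k + 1) - k * math.log(2) - math.lgamma(k + 1)
    return math.exp(-log_moment / (2 * k))

def carleman_term_lognormal(k):
    """Same term for the standard log-normal, where mu'_{2k} = exp((2k)^2 / 2)."""
    return math.exp(-k)  # (e^{2 k^2})^(-1/(2k)) = e^{-k}

def partial_sum(term, n):
    """Sum of the first n terms of the Carleman series."""
    return sum(term(k) for k in range(1, n + 1))

for n in (10, 100, 1000, 10000):
    print(n, partial_sum(carleman_term_normal, n), partial_sum(carleman_term_lognormal, n))
```

The normal column keeps growing (roughly like $\sqrt{n}$), while the log-normal column settles at the geometric-series limit $1/(e - 1)$.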

Existence of moments

Not all distributions have all moments. The Cauchy distribution has undefined mean and undefined variance: its density decays as $|x|^{-2}$ in the tails, which is too slow for $\int |x| \, f(x) \, dx$ to converge. In general, the $k$-th moment exists when the density decays at least as fast as $|x|^{-(k+1+\varepsilon)}$ for some $\varepsilon > 0$.
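
The divergence for the Cauchy distribution can be seen numerically by truncating the integral to $[-T, T]$. The sketch below compares a midpoint-rule approximation with the closed form $\frac{1}{\pi} \ln(1 + T^2)$ of the truncated integral; the function names and the step count are assumptions made for this illustration.

```python
import math

def cauchy_pdf(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (math.pi * (1.0 + x * x))

def truncated_abs_mean(T, steps=200_000):
    """Midpoint-rule approximation of the integral of |x| f(x) over [-T, T]."""
    h = T / steps
    # The integrand is even, so integrate over [0, T] and double the result.
    return 2.0 * h * sum((i + 0.5) * h * cauchy_pdf((i + 0.5) * h) for i in range(steps))

for T in (10.0, 100.0, 1000.0):
    closed_form = math.log(1.0 + T * T) / math.pi  # exact value of the truncated integral
    print(T, truncated_abs_mean(T), closed_form)
```

The truncated value grows like $\frac{2}{\pi} \ln T$ and never levels off, which is the numerical signature of an undefined mean.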

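A useful hierarchy: if the $k$-th absolute moment $E[|X|^k]$ is finite, all moments of order $j < k$ are also finite. This follows from Jensen’s inequality applied to the concave function $t \mapsto t^{j/k}$ on $[0, \infty)$, which gives $E[|X|^j] = E\bigl[(|X|^k)^{j/k}\bigr] \leq \bigl(E[|X|^k]\bigr)^{j/k}$.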

Summary

  • The $k$-th raw moment is $\mu'_k = E[X^k]$; the $k$-th central moment is $\mu_k = E[(X - E[X])^k]$.
  • Mean = $\mu'_1$; Variance = $\mu_2 = \mu'_2 - (\mu'_1)^2$.
  • Skewness $\gamma_1 = \mu_3 / \sigma^3$ measures asymmetry; $\gamma_1 > 0$ is right-skewed.
  • Excess kurtosis $\kappa = \mu_4/\sigma^4 - 3$ measures tail heaviness relative to the normal; $\kappa > 0$ means heavier tails.
  • Moments uniquely determine the distribution when the Carleman condition holds; the log-normal shows this can fail when moments grow very fast.
  • The Cauchy distribution has no finite moments of order $k \geq 1$: the tails must decay fast enough for $E[|X|^k]$ to converge.