Moments
The mean and variance give you the centre and spread of a distribution. But two very different distributions can share the same mean and variance — a symmetric bell curve and a sharply skewed one, for instance. Moments are a systematic way to extract more detailed shape information, one order at a time.
Raw moments
The $k$-th raw moment (or $k$-th moment about the origin) of a random variable $X$ is
$$\mu_k' = E[X^k],$$
provided the expectation is finite. The zeroth moment is always $\mu_0' = 1$. The first moment is $\mu_1' = E[X] = \mu$, the mean.
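As a small illustration of the definition, the sketch below computes raw moments exactly for a discrete distribution. The helper name `raw_moment`, the `{value: probability}` representation, and the fair-die example are assumptions made for this sketch, not part of the text above.

```python
from fractions import Fraction

def raw_moment(pmf, k):
    """k-th raw moment E[X^k] of a discrete distribution.

    `pmf` maps each value x to its probability P(X = x).
    """
    return sum(p * x**k for x, p in pmf.items())

# Illustrative example: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

zeroth = raw_moment(die, 0)   # total probability
mean = raw_moment(die, 1)     # first raw moment
second = raw_moment(die, 2)   # E[X^2]
```

Using `Fraction` keeps the arithmetic exact, so the zeroth moment comes out as exactly 1 rather than a rounded float.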
Central moments
The $k$-th central moment is the $k$-th moment about the mean $\mu = E[X]$:
$$\mu_k = E[(X - \mu)^k].$$
The first three central moments are:
- $\mu_0 = 1$.
- $\mu_1 = 0$ (the mean of the centred variable is zero).
- $\mu_2 = \sigma^2$, the variance.
Central moments are translation-invariant: replacing $X$ by $X + c$ leaves every central moment $\mu_k$ ($k \geq 1$) unchanged. This makes them the natural measures of shape.
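Translation invariance is easy to check numerically. The following is a minimal sketch for a discrete distribution. The helpers `central_moment` and `shift` and the die example are hypothetical names chosen for illustration.

```python
from fractions import Fraction

def central_moment(pmf, k):
    """k-th central moment E[(X - mean)^k] of a discrete distribution."""
    mean = sum(p * x for x, p in pmf.items())
    return sum(p * (x - mean)**k for x, p in pmf.items())

def shift(pmf, c):
    """Distribution of X + c, given the distribution of X."""
    return {x + c: p for x, p in pmf.items()}

# Illustrative example: a fair six-sided die and the same die shifted by 10.
die = {x: Fraction(1, 6) for x in range(1, 7)}
shifted = shift(die, 10)

# Compare central moments of order 0..4 before and after the shift.
unchanged = all(
    central_moment(die, k) == central_moment(shifted, k) for k in range(5)
)
```

The shift moves the mean by the same constant, so every difference `x - mean` is the same before and after, and so is every central moment.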
Converting between raw and central moments
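The binomial theorem gives the relationship. Expanding $(X - \mu)^k$ and taking expectations term by term:
$$\mu_k = E[(X - \mu)^k] = \sum_{j=0}^{k} \binom{k}{j} (-\mu)^{k-j} E[X^j].$$
The first few conversions:
$$\mu_2 = E[X^2] - \mu^2,$$
$$\mu_3 = E[X^3] - 3\mu E[X^2] + 2\mu^3,$$
$$\mu_4 = E[X^4] - 4\mu E[X^3] + 6\mu^2 E[X^2] - 3\mu^4.$$
These formulas are useful when computing moments from the raw expectations $E[X^k]$ is easier than from $E[(X - \mu)^k]$ directly.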
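The binomial conversion can be sketched in a few lines. The function name `central_from_raw` and the list layout `raw[j] = E[X^j]` are assumptions for this example. The input uses the raw moments $E[X^j] = j!$ of an exponential distribution with rate 1 as illustrative data.

```python
from fractions import Fraction
from math import comb

def central_from_raw(raw, k):
    """k-th central moment from raw moments via the binomial expansion.

    `raw[j]` must hold E[X^j] for j = 0..k (so raw[0] == 1, raw[1] is the mean).
    """
    mu = raw[1]
    return sum(comb(k, j) * (-mu)**(k - j) * raw[j] for j in range(k + 1))

# Illustrative input: Exp(1) has raw moments E[X^j] = j!.
raw = [Fraction(1), Fraction(1), Fraction(2), Fraction(6), Fraction(24)]

variance = central_from_raw(raw, 2)
third = central_from_raw(raw, 3)
fourth = central_from_raw(raw, 4)
```

The same function covers every order, so the explicit formulas for orders 2, 3, and 4 are special cases of it.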
Standardised moments: skewness and kurtosis
To make central moments dimensionless and scale-invariant, divide the $k$-th central moment by $\sigma^k$, where $\sigma$ is the standard deviation.
Skewness
The skewness is the standardised third central moment:
$$\gamma_1 = \frac{E[(X - \mu)^3]}{\sigma^3}.$$
- $\gamma_1 = 0$ for symmetric distributions (the third central moment vanishes by symmetry).
- $\gamma_1 > 0$ indicates a right-skewed (positively skewed) distribution: the right tail is longer, with occasional very large values that typically pull the mean above the median.
- $\gamma_1 < 0$ indicates a left-skewed distribution.
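Example. The exponential distribution $\mathrm{Exp}(\lambda)$ has mean $1/\lambda$, variance $1/\lambda^2$, and third central moment $2/\lambda^3$, so its skewness is $(2/\lambda^3)/(1/\lambda^3) = 2$ for every rate $\lambda$. It is right-skewed, which matches the long right tail visible in its density.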
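For readers who want the intermediate steps, the exponential example can be worked out from its raw moments. The derivation below assumes the standard result $E[X^k] = k!/\lambda^k$ for the exponential distribution with rate $\lambda$, which the text above does not state.

```latex
\begin{align*}
E[X^k] &= \frac{k!}{\lambda^k}, \qquad \mu = \frac{1}{\lambda}, \qquad \sigma^2 = \frac{1}{\lambda^2}, \\
E[(X-\mu)^3] &= E[X^3] - 3\mu\,E[X^2] + 2\mu^3
  = \frac{6}{\lambda^3} - \frac{3}{\lambda}\cdot\frac{2}{\lambda^2} + \frac{2}{\lambda^3}
  = \frac{2}{\lambda^3}, \\
\frac{E[(X-\mu)^3]}{\sigma^3} &= \frac{2/\lambda^3}{(1/\lambda^2)^{3/2}} = 2 .
\end{align*}
```

The rate $\lambda$ cancels, as expected for a standardised (scale-invariant) quantity.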
Kurtosis and excess kurtosis
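The kurtosis is the standardised fourth central moment:
$$\kappa = \frac{E[(X - \mu)^4]}{\sigma^4}.$$
For any normal distribution, $\kappa = 3$. The excess kurtosis (also called kurtosis in many statistics packages) is
$$\gamma_2 = \kappa - 3.$$
- $\gamma_2 = 0$ (mesokurtic): tails behave like a normal distribution. The normal is the reference.
- $\gamma_2 > 0$ (leptokurtic): heavier tails than normal, so extreme values are more probable. The $t$-distribution with $\nu > 4$ degrees of freedom is leptokurtic, with $\gamma_2 = 6/(\nu - 4)$. The Cauchy distribution has even heavier tails, but its kurtosis is undefined because its moments do not exist.
- $\gamma_2 < 0$ (platykurtic): lighter tails, so extreme values are less likely than in a normal. The uniform distribution has $\gamma_2 = -6/5$.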
Kurtosis measures tail heaviness, not “peakedness” as is sometimes stated — the two properties are not equivalent.
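The standardised moments can also be estimated from data. The sketch below uses the plain population-style estimators (dividing by $n$, with no bias correction). Statistics packages often apply corrections, so their output may differ slightly on small samples. The function names are hypothetical.

```python
def standardised_moment(data, k):
    """Sample k-th central moment divided by (sample variance)^(k/2).

    Uses 1/n normalisation throughout (no bias correction).
    """
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / n
    if var == 0:
        raise ValueError("standardised moments are undefined for constant data")
    m_k = sum((x - mean) ** k for x in data) / n
    return m_k / var ** (k / 2)

def skewness(data):
    return standardised_moment(data, 3)

def excess_kurtosis(data):
    return standardised_moment(data, 4) - 3

# Illustrative data: a symmetric two-point sample and a sample with one large value.
two_point = [-1.0, 1.0]
right_tailed = [0, 0, 0, 10]
```

The symmetric two-point sample has zero skewness and strongly negative excess kurtosis (no tails at all), while the sample with one large value has positive skewness.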
Do moments determine the distribution?
A natural question is whether the sequence of moments uniquely determines the distribution.
When yes: the moment problem. If all moments exist and the Carleman condition holds,
$$\sum_{k=1}^{\infty} \left( E[X^{2k}] \right)^{-1/(2k)} = \infty,$$
then the moments uniquely determine the distribution. The normal, Poisson, binomial, and exponential distributions all satisfy this condition.
When no. The log-normal distribution is the canonical counterexample: there exist infinitely many distinct distributions with the same moment sequence as a given log-normal. The Carleman condition fails for the log-normal because its moments grow too fast (for the standard log-normal, $E[X^k] = e^{k^2/2}$).
In practice this means: when fitting a model via moments (method of moments), you should check that the moment problem has a unique solution for your distribution class.
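The contrast between the exponential and log-normal cases shows up in partial sums of the Carleman series. This is only a numerical illustration: a finite partial sum cannot prove divergence or convergence. The helper names are hypothetical, and the moment formulas used are $E[X^{2k}] = (2k)!$ for Exp(1) and $E[X^{2k}] = e^{2k^2}$ for the standard log-normal.

```python
from math import exp, lgamma

def carleman_partial_sum(log_even_moment, n_terms):
    """Partial sum of sum_k E[X^(2k)]^(-1/(2k)) for k = 1..n_terms.

    `log_even_moment(k)` must return log E[X^(2k)]; working with logs
    avoids overflow for fast-growing moments.
    """
    return sum(exp(-log_even_moment(k) / (2 * k)) for k in range(1, n_terms + 1))

def log_m_exponential(k):
    # Exp(1): E[X^(2k)] = (2k)!, and lgamma(n + 1) = log(n!)
    return lgamma(2 * k + 1)

def log_m_lognormal(k):
    # Standard log-normal: E[X^(2k)] = exp((2k)^2 / 2) = exp(2 k^2)
    return 2.0 * k * k

for n in (10, 100, 1000, 10000):
    print(n, carleman_partial_sum(log_m_exponential, n),
          carleman_partial_sum(log_m_lognormal, n))
```

For the log-normal each term equals $e^{-k}$, a geometric series with finite sum $1/(e - 1)$, whereas the exponential terms shrink only like a constant times $1/k$, so the partial sums keep growing roughly logarithmically.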
Existence of moments
Not all distributions have all moments. The Cauchy distribution has undefined mean and undefined variance: its density decays as $1/x^2$ in the tails, which is too slow for $E[|X|]$ to converge. In general, the $k$-th moment exists when the density tails decay at least as fast as $|x|^{-(k+1+\varepsilon)}$ for some $\varepsilon > 0$.
A useful hierarchy: if the $k$-th absolute moment $E[|X|^k]$ is finite, all moments of order $j < k$ are also finite, by Jensen’s inequality applied to the concave function $t \mapsto t^{j/k}$ on $[0, \infty)$, which gives $E[|X|^j] \leq (E[|X|^k])^{j/k}$.
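The failure of the Cauchy mean to exist can be made concrete by truncating the integral of $|x| f(x)$ to $[-T, T]$ and letting $T$ grow. The sketch below assumes the standard Cauchy density $f(x) = 1/(\pi(1 + x^2))$ and compares a midpoint-rule estimate with the closed form $\ln(1 + T^2)/\pi$. The function names are chosen for illustration.

```python
from math import log, pi

def cauchy_density(x):
    """Density of the standard Cauchy distribution."""
    return 1.0 / (pi * (1.0 + x * x))

def truncated_abs_mean(T, n=200_000):
    """Midpoint-rule estimate of the integral of |x| f(x) over [-T, T]."""
    h = T / n
    half = h * sum(
        (i + 0.5) * h * cauchy_density((i + 0.5) * h) for i in range(n)
    )
    return 2.0 * half  # the integrand is even, so double the [0, T] part

def closed_form(T):
    """Exact value of the same truncated integral: ln(1 + T^2) / pi."""
    return log(1.0 + T * T) / pi

# The truncated integral keeps growing (logarithmically) as T increases.
for T in (1e1, 1e3, 1e6, 1e9):
    print(T, closed_form(T))
```

Because the truncated integral grows like $(2/\pi)\ln T$, it has no finite limit, which is exactly what it means for $E[|X|]$ to diverge.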
Summary
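- The $k$-th raw moment is $\mu_k' = E[X^k]$; the $k$-th central moment is $\mu_k = E[(X - \mu)^k]$.
- Mean = $\mu = E[X]$; Variance = $\sigma^2 = E[X^2] - \mu^2$.
- Skewness $\gamma_1$ measures asymmetry; $\gamma_1 > 0$ is right-skewed.
- Excess kurtosis $\gamma_2$ measures tail heaviness relative to the normal; $\gamma_2 > 0$ means heavier tails.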
- Moments uniquely determine the distribution when the Carleman condition holds; the log-normal shows this can fail when moments grow very fast.
- The Cauchy distribution has no finite moments of order $k \geq 1$: tail decay must be fast enough for $E[|X|^k]$ to converge.