Moment Generating Function

Tags: Probability, Expectation

A generating function packages infinitely many numbers into a single analytic object. The moment generating function (MGF) packages all the moments of a random variable into one power series. Because many useful distributions have simple MGFs, and because the MGF of a sum of independent variables is the product of their MGFs, the MGF is a powerful tool for computing distributions of sums and proving limit theorems.

Definition

The moment generating function of a random variable $X$ is

$$M_X(t) \coloneqq E\bigl[e^{tX}\bigr], \quad t \in \mathbb{R},$$

defined wherever the expectation is finite. We say the MGF exists when this holds for all $t$ in an open interval around $0$. For a discrete distribution with values $x_k$ and probabilities $p_k$:

$$M_X(t) = \sum_k e^{t x_k} p_k.$$

For an absolutely continuous distribution:

$$M_X(t) = \int_{-\infty}^{+\infty} e^{tx} f_X(x) \, dx.$$

Both are the two-sided Laplace transform of the distribution, evaluated at $s = -t$.
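The discrete formula translates directly into code. The following is a minimal sketch, not a library routine: the helper name `mgf_discrete` and the test values `p = 0.3`, `t = 0.5` are assumptions made for illustration. It evaluates the sum for a Bernoulli variable and compares it with the closed form $(1-p) + pe^t$.

```python
import math

def mgf_discrete(values, probs, t):
    """Return E[exp(t*X)] for a discrete X with P(X = values[k]) = probs[k]."""
    if abs(sum(probs) - 1.0) > 1e-12:
        raise ValueError("probabilities must sum to 1")
    return sum(p * math.exp(t * x) for x, p in zip(values, probs))

# Bernoulli(p): X takes the value 0 with probability 1 - p and 1 with probability p.
p, t = 0.3, 0.5
direct = mgf_discrete([0, 1], [1 - p, p], t)   # sum over the support
closed_form = (1 - p) + p * math.exp(t)        # textbook closed form

print(direct, closed_form)  # the two numbers should agree up to rounding
```

For a distribution with finite support the sum is finite for every real $t$, so no domain check on `t` is needed in this sketch.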

Recovering moments from the MGF

Expand $e^{tX}$ as a power series:

$$e^{tX} = \sum_{k=0}^{\infty} \frac{(tX)^k}{k!} = \sum_{k=0}^{\infty} \frac{X^k}{k!} t^k.$$

Taking expectations (justified by dominated convergence when $M_X(t)$ is finite near $0$):

$$M_X(t) = \sum_{k=0}^{\infty} \frac{E[X^k]}{k!} t^k = \sum_{k=0}^{\infty} \frac{\mu'_k}{k!} t^k. \tag{1}$$

This shows that $M_X(t)$ is a power series in $t$ whose coefficients encode the raw moments. Differentiating $k$ times and evaluating at $t = 0$:

$$M_X^{(k)}(0) = E[X^k] = \mu'_k. \tag{2}$$

So the $k$-th moment is the $k$-th derivative of the MGF at zero. This is the defining property: the MGF generates the moments.
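Property (2) can be checked numerically. The sketch below approximates the first two derivatives of the exponential MGF $\lambda/(\lambda - t)$ at $0$ with central differences. The function names, the rate `lam = 2.0`, and the step size `h` are illustrative assumptions, and finite differences are only an approximation to the exact derivatives.

```python
def mgf_exponential(t, lam=2.0):
    """MGF of Exp(lam); the expectation is finite only for t < lam."""
    if t >= lam:
        raise ValueError("the MGF of Exp(lam) is finite only for t < lam")
    return lam / (lam - t)

def first_two_derivatives_at_zero(mgf, h=1e-4):
    """Central-difference estimates of M'(0) and M''(0)."""
    d1 = (mgf(h) - mgf(-h)) / (2 * h)
    d2 = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / (h * h)
    return d1, d2

# M'(0) estimates E[X] and M''(0) estimates E[X^2].
mean, second_moment = first_two_derivatives_at_zero(mgf_exponential)
variance = second_moment - mean ** 2

print(mean, second_moment, variance)
```

With $\lambda = 2$ the exact values are $E[X] = 1/\lambda = 0.5$, $E[X^2] = 2/\lambda^2 = 0.5$ and $\operatorname{Var}(X) = 1/\lambda^2 = 0.25$, so the printed estimates should be close to these.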

MGFs of standard distributions

| Distribution | $M_X(t)$ | Domain |
|---|---|---|
| $\operatorname{Bernoulli}(p)$ | $(1-p) + pe^t$ | $t \in \mathbb{R}$ |
| $\operatorname{Bin}(n, p)$ | $(1-p+pe^t)^n$ | $t \in \mathbb{R}$ |
| $\operatorname{Poisson}(\lambda)$ | $\exp(\lambda(e^t - 1))$ | $t \in \mathbb{R}$ |
| $\operatorname{Exp}(\lambda)$ | $\dfrac{\lambda}{\lambda - t}$ | $t < \lambda$ |
| $\operatorname{Gamma}(\alpha, \lambda)$ | $\left(\dfrac{\lambda}{\lambda - t}\right)^\alpha$ | $t < \lambda$ |
| $\operatorname{N}(\mu, \sigma^2)$ | $\exp\!\left(\mu t + \tfrac{\sigma^2 t^2}{2}\right)$ | $t \in \mathbb{R}$ |

Derivation for $\operatorname{N}(0,1)$. With density $f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$:

$$M_X(t) = \int_{-\infty}^{+\infty} e^{tx} \cdot \frac{e^{-x^2/2}}{\sqrt{2\pi}} \, dx = \int_{-\infty}^{+\infty} \frac{e^{-(x-t)^2/2 + t^2/2}}{\sqrt{2\pi}} \, dx = e^{t^2/2} \int_{-\infty}^{+\infty} \frac{e^{-(x-t)^2/2}}{\sqrt{2\pi}} \, dx = e^{t^2/2},$$

by completing the square, $tx - x^2/2 = -(x-t)^2/2 + t^2/2$, and recognising the remaining integral as that of the $\operatorname{N}(t, 1)$ density, which equals $1$.
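The general normal entry in the table follows from this special case by a change of location and scale. As a short sketch, write $X = \mu + \sigma Z$ with $Z \sim \operatorname{N}(0,1)$ (the symbol $Z$ is introduced here only for illustration):

$$M_X(t) = E\bigl[e^{t(\mu + \sigma Z)}\bigr] = e^{\mu t} \, E\bigl[e^{(\sigma t) Z}\bigr] = e^{\mu t} \, M_Z(\sigma t) = \exp\!\left(\mu t + \tfrac{\sigma^2 t^2}{2}\right).$$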

The multiplicative property for independent variables

Theorem. If $X$ and $Y$ are independent random variables with MGFs $M_X$ and $M_Y$, both finite on an open interval containing $0$, then for every $t$ in that interval the MGF of $X + Y$ is

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t). \tag{3}$$

Proof. By independence, $e^{tX}$ and $e^{tY}$ are also independent (they are measurable functions of $X$ and $Y$ respectively), so:

$$M_{X+Y}(t) = E\bigl[e^{t(X+Y)}\bigr] = E\bigl[e^{tX} e^{tY}\bigr] = E\bigl[e^{tX}\bigr] \cdot E\bigl[e^{tY}\bigr] = M_X(t) \cdot M_Y(t).$$

Applications. Property (3) makes it easy to identify the distribution of a sum of independent variables by comparing MGFs:

  • $X \sim \operatorname{Bin}(m,p)$, $Y \sim \operatorname{Bin}(n,p)$ independent: $M_{X+Y}(t) = (1-p+pe^t)^{m+n}$, so $X + Y \sim \operatorname{Bin}(m+n, p)$.
  • $X \sim \operatorname{Poisson}(\lambda)$, $Y \sim \operatorname{Poisson}(\mu)$ independent: $M_{X+Y}(t) = e^{(\lambda+\mu)(e^t-1)}$, so $X + Y \sim \operatorname{Poisson}(\lambda + \mu)$.
  • $X \sim \operatorname{N}(\mu_1, \sigma_1^2)$, $Y \sim \operatorname{N}(\mu_2, \sigma_2^2)$ independent: $M_{X+Y}(t) = e^{(\mu_1+\mu_2)t + (\sigma_1^2+\sigma_2^2)t^2/2}$, so $X + Y \sim \operatorname{N}(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$.
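The binomial case in the list above can be cross-checked without MGFs by convolving the two probability mass functions directly. This is a minimal sketch under stated assumptions: the helper names `binomial_pmf` and `convolve` and the parameters `m = 3`, `n = 5`, `p = 0.4` are chosen for illustration only.

```python
from math import comb

def binomial_pmf(n, p):
    """Return [P(X = 0), ..., P(X = n)] for X ~ Bin(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(pmf_x, pmf_y):
    """PMF of X + Y for independent X, Y supported on 0, 1, 2, ..."""
    out = [0.0] * (len(pmf_x) + len(pmf_y) - 1)
    for i, px in enumerate(pmf_x):
        for j, py in enumerate(pmf_y):
            out[i + j] += px * py  # P(X = i) * P(Y = j) contributes to the sum i + j
    return out

m, n, p = 3, 5, 0.4
pmf_sum = convolve(binomial_pmf(m, p), binomial_pmf(n, p))
pmf_target = binomial_pmf(m + n, p)

# Largest pointwise difference between the two PMFs; it should be at rounding level.
max_gap = max(abs(a - b) for a, b in zip(pmf_sum, pmf_target))
print(max_gap)
```

The convolution costs on the order of $mn$ multiplications, whereas the MGF argument identifies the distribution of the sum in one line.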

Uniqueness: the MGF determines the distribution

Theorem. If $M_X(t)$ is finite for all $t$ in some open interval $(-\delta, \delta)$ with $\delta > 0$, then $M_X$ uniquely determines the distribution of $X$.

More precisely: if $M_X(t) = M_Y(t)$ for all $t \in (-\delta, \delta)$, then $P_X = P_Y$ (the distributions are identical).

This is why you can safely identify distributions by their MGFs: the equality $M_{X+Y} = M_{\operatorname{N}(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)}$ in the calculation above really does imply that $X + Y$ is normal.

The existence caveat. The MGF need not exist: the expectation $E[e^{tX}]$ may be $+\infty$ for $t \neq 0$. The Cauchy distribution has no MGF, since $E[e^{tX}] = +\infty$ for every $t \neq 0$. The log-normal distribution has an MGF that is $+\infty$ for every $t > 0$, even though all of its moments are finite. When the MGF is not finite in a neighbourhood of $0$, the moment sequence may not determine the distribution (the log-normal is the classic example). In such cases one turns to the characteristic function $\varphi_X(t) = E[e^{itX}]$ (with $i = \sqrt{-1}$), which always exists and always determines the distribution, making it the more general tool for theoretical work.
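A short sketch of why the log-normal fails, taking $X = e^Z$ with $Z \sim \operatorname{N}(0,1)$ (the standard log-normal, chosen here for illustration). Every moment is finite, because the normal MGF gives

$$E[X^k] = E\bigl[e^{kZ}\bigr] = M_Z(k) = e^{k^2/2}.$$

For any $t > 0$, however,

$$E\bigl[e^{tX}\bigr] = \int_{-\infty}^{+\infty} \frac{1}{\sqrt{2\pi}} \exp\!\left(t e^{z} - \tfrac{z^2}{2}\right) dz = +\infty,$$

since the exponent $t e^{z} - z^2/2$ tends to $+\infty$ as $z \to +\infty$, so the integrand does not even stay bounded.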

Cumulants

The cumulant generating function is the logarithm of the MGF:

$$K_X(t) \coloneqq \ln M_X(t) = \ln E[e^{tX}].$$

Its derivatives at $0$ are the cumulants $\kappa_k \coloneqq K_X^{(k)}(0)$. The first two cumulants are the mean and variance:

$$\kappa_1 = E[X] = \mu, \qquad \kappa_2 = \operatorname{Var}(X) = \sigma^2.$$

For independent $X, Y$: $K_{X+Y}(t) = K_X(t) + K_Y(t)$ (take logarithms in (3)), so cumulants add under independence, just like variances. In particular, the sum of $n$ independent copies of $X$ has $k$-th cumulant $n\kappa_k$.
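The identification of the first two cumulants follows from the chain rule. As a brief sketch, differentiate $K_X = \ln M_X$ twice:

$$K_X'(t) = \frac{M_X'(t)}{M_X(t)}, \qquad K_X''(t) = \frac{M_X''(t)}{M_X(t)} - \left(\frac{M_X'(t)}{M_X(t)}\right)^2.$$

Setting $t = 0$ and using $M_X(0) = 1$, $M_X'(0) = E[X]$, $M_X''(0) = E[X^2]$ gives

$$\kappa_1 = E[X], \qquad \kappa_2 = E[X^2] - (E[X])^2 = \operatorname{Var}(X).$$

As a worked example, for $X \sim \operatorname{Poisson}(\lambda)$ the cumulant generating function is $K_X(t) = \lambda(e^t - 1)$, so $K_X^{(k)}(t) = \lambda e^t$ and every cumulant equals $\lambda$: $\kappa_k = \lambda$ for all $k \geq 1$.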

Summary

  • The MGF $M_X(t) = E[e^{tX}]$ encodes all moments as derivatives at $0$: $M_X^{(k)}(0) = E[X^k]$.
  • Standard MGFs: Binomial $(1-p+pe^t)^n$; Poisson $e^{\lambda(e^t-1)}$; Exponential $\lambda/(\lambda-t)$; Normal $e^{\mu t + \sigma^2 t^2/2}$.
  • Multiplicative property: $M_{X+Y} = M_X \cdot M_Y$ for independent $X, Y$; this identifies distributions of sums.
  • Uniqueness: when $M_X$ is finite near $0$, it uniquely determines the distribution of $X$.
  • The MGF may not exist (e.g. Cauchy, log-normal); in such cases the characteristic function $E[e^{itX}]$ always exists and always determines the distribution.
  • The cumulant generating function $\ln M_X(t)$ has derivatives at $0$ equal to the cumulants, which add under independence.