Convolution of Distributions

Essential
Last updated: Tags: Probability, Random Variables, Distributions

When you add two independent random variables, what distribution does the sum have? The answer is the convolution of their distributions — a mathematical operation that combines two measures by sliding one over the other. Convolution explains why Binomials add, Poissons add, and Normals add, and it is the bridge between the algebra of distributions and the algebra of moment generating functions.

The convolution formula

Let XX and YY be independent random variables and let Z=X+YZ = X + Y.

Absolutely continuous case

If XX and YY have densities fXf_X and fYf_Y, then ZZ has density

fZ(z)=(fXfY)(z)+fX(x)fY(zx)dx.(1)f_Z(z) = (f_X * f_Y)(z) \coloneqq \int_{-\infty}^{+\infty} f_X(x) \, f_Y(z - x) \, dx. \tag{1}

Derivation. The joint density of (X,Y)(X, Y) factors as fX,Y(x,y)=fX(x)fY(y)f_{X,Y}(x, y) = f_X(x) f_Y(y) by independence. The event {Zz}\{Z \leq z\} is {X+Yz}\{X + Y \leq z\}. Integrating over this region:

FZ(z)=P(X+Yz)=+zxfX(x)fY(y)dydx=+fX(x)FY(zx)dx.F_Z(z) = P(X + Y \leq z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-x} f_X(x) f_Y(y) \, dy \, dx = \int_{-\infty}^{+\infty} f_X(x) F_Y(z - x) \, dx.

Differentiating with respect to zz gives formula (1)(1).

Discrete case

If XX and YY take values in Z\mathbb{Z} (or a common countable set) with PMFs pXp_X and pYp_Y, then Z=X+YZ = X + Y has PMF

pZ(n)=(pXpY)(n)k=+pX(k)pY(nk).(2)p_Z(n) = (p_X * p_Y)(n) \coloneqq \sum_{k=-\infty}^{+\infty} p_X(k) \, p_Y(n - k). \tag{2}

This is the discrete analogue of (1)(1): instead of integrating, you sum over all ways to split nn into kk (from XX) and nkn - k (from YY).

Convolution is commutative and associative

The convolution operation on measures (or functions) satisfies:

fXfY=fYfX(commutative)f_X * f_Y = f_Y * f_X \qquad \text{(commutative)} (fXfY)fZ=fX(fYfZ)(associative).(f_X * f_Y) * f_Z = f_X * (f_Y * f_Z) \qquad \text{(associative)}.

Commutativity is clear from the symmetry X+Y=Y+XX + Y = Y + X. Associativity means the distribution of a sum of three (or more) independent variables can be computed by convolving any two first.

Key examples

Binomial sums

If XBin(m,p)X \sim \operatorname{Bin}(m, p) and YBin(n,p)Y \sim \operatorname{Bin}(n, p) are independent, then X+YBin(m+n,p)X + Y \sim \operatorname{Bin}(m+n, p).

Proof via convolution. The PMF of Bin(m,p)\operatorname{Bin}(m, p) is pX(k)=(mk)pk(1p)mkp_X(k) = \binom{m}{k} p^k (1-p)^{m-k}. Convolving:

pZ(r)=k=0r(mk)pk(1p)mk(nrk)prk(1p)n(rk)=pr(1p)m+nrk=0r(mk)(nrk).p_Z(r) = \sum_{k=0}^{r} \binom{m}{k} p^k (1-p)^{m-k} \cdot \binom{n}{r-k} p^{r-k} (1-p)^{n-(r-k)} = p^r (1-p)^{m+n-r} \sum_{k=0}^r \binom{m}{k}\binom{n}{r-k}.

The Vandermonde convolution identity k=0r(mk)(nrk)=(m+nr)\sum_{k=0}^r \binom{m}{k}\binom{n}{r-k} = \binom{m+n}{r} gives pZ(r)=(m+nr)pr(1p)m+nrp_Z(r) = \binom{m+n}{r} p^r (1-p)^{m+n-r}, the Bin(m+n,p)\operatorname{Bin}(m+n,p) PMF.

This also follows immediately from the interpretation: Bin(m,p)\operatorname{Bin}(m, p) is the count of successes in mm independent Bernoulli trials, and combining mm and nn independent trials gives m+nm+n trials.

Poisson sums

If XPoisson(λ)X \sim \operatorname{Poisson}(\lambda) and YPoisson(μ)Y \sim \operatorname{Poisson}(\mu) are independent, then X+YPoisson(λ+μ)X + Y \sim \operatorname{Poisson}(\lambda + \mu).

Proof via convolution.

pZ(n)=k=0neλλkk!eμμnk(nk)!=e(λ+μ)n!k=0n(nk)λkμnk=e(λ+μ)(λ+μ)nn!,p_Z(n) = \sum_{k=0}^{n} \frac{e^{-\lambda}\lambda^k}{k!} \cdot \frac{e^{-\mu}\mu^{n-k}}{(n-k)!} = \frac{e^{-(\lambda+\mu)}}{n!} \sum_{k=0}^{n} \binom{n}{k} \lambda^k \mu^{n-k} = \frac{e^{-(\lambda+\mu)}(\lambda+\mu)^n}{n!},

which is Poisson(λ+μ)\operatorname{Poisson}(\lambda + \mu).

Normal sums

If XN(μ1,σ12)X \sim \operatorname{N}(\mu_1, \sigma_1^2) and YN(μ2,σ22)Y \sim \operatorname{N}(\mu_2, \sigma_2^2) are independent, then X+YN(μ1+μ2,σ12+σ22)X + Y \sim \operatorname{N}(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2).

The density-level convolution integral can be done by completing the square, but it is more elegant via MGFs (see below). The key point is that the normal family is closed under convolution — a sum of independent normals is normal, with means and variances adding separately.

Gamma sums

If XGamma(α,λ)X \sim \operatorname{Gamma}(\alpha, \lambda) and YGamma(β,λ)Y \sim \operatorname{Gamma}(\beta, \lambda) are independent (same rate λ\lambda), then X+YGamma(α+β,λ)X + Y \sim \operatorname{Gamma}(\alpha + \beta, \lambda).

This follows from the representation of Gamma(α,λ)\operatorname{Gamma}(\alpha, \lambda) as a sum of α\alpha independent Exp(λ)\operatorname{Exp}(\lambda) variables (for integer α\alpha), and extends to non-integer α\alpha by the MGF argument.

Connection to moment generating functions

The multiplicative property of MGFs is the algebraic reflection of convolution. For independent XX and YY:

MX+Y(t)=MX(t)MY(t).M_{X+Y}(t) = M_X(t) \cdot M_Y(t).

This holds because: density-level convolution (fXfY)(z)(f_X * f_Y)(z) corresponds exactly to the product f^X(t)f^Y(t)\hat{f}_X(t) \cdot \hat{f}_Y(t) of their Laplace transforms (with s=ts = -t). The MGF is the (two-sided) Laplace transform of the distribution, so multiplication of MGFs corresponds to convolution of distributions.

This connection is why the MGF is the sharpest tool for computing distributions of sums. Instead of evaluating the convolution integral directly, you:

  1. Compute MX(t)M_X(t) and MY(t)M_Y(t).
  2. Multiply: MZ(t)=MX(t)MY(t)M_Z(t) = M_X(t) \cdot M_Y(t).
  3. Identify: if MZM_Z matches a known MGF, you know the distribution of ZZ.

The three examples above (Binomial, Poisson, Normal) all follow immediately from this approach, since each distribution’s MGF is known in closed form.

Convolution of more than two variables

For nn independent variables X1,,XnX_1, \ldots, X_n, the distribution of Sn=X1++XnS_n = X_1 + \cdots + X_n is the nn-fold convolution fX1fX2fXnf_{X_1} * f_{X_2} * \cdots * f_{X_n}. Its MGF is k=1nMXk(t)\prod_{k=1}^n M_{X_k}(t). When all XkX_k are identically distributed with MGF M(t)M(t), the MGF of SnS_n is M(t)nM(t)^n.

The central limit theorem is the asymptotic statement about this nn-fold convolution: after standardisation, M(t)nM(t)^n converges (under mild conditions) to et2/2e^{t^2/2}, the MGF of the standard normal.

Summary

  • For independent X,YX, Y, the distribution of Z=X+YZ = X + Y is the convolution of their distributions: fZ=fXfYf_Z = f_X * f_Y (density integral) or pZ(n)=kpX(k)pY(nk)p_Z(n) = \sum_k p_X(k) p_Y(n-k) (PMF sum).
  • Closure results: Bin(m,p)+Bin(n,p)=Bin(m+n,p)\operatorname{Bin}(m,p) + \operatorname{Bin}(n,p) = \operatorname{Bin}(m+n,p); Poisson(λ)+Poisson(μ)=Poisson(λ+μ)\operatorname{Poisson}(\lambda) + \operatorname{Poisson}(\mu) = \operatorname{Poisson}(\lambda+\mu); N(μ1,σ12)+N(μ2,σ22)=N(μ1+μ2,σ12+σ22)\operatorname{N}(\mu_1,\sigma_1^2) + \operatorname{N}(\mu_2,\sigma_2^2) = \operatorname{N}(\mu_1+\mu_2,\sigma_1^2+\sigma_2^2); Gamma(α,λ)+Gamma(β,λ)=Gamma(α+β,λ)\operatorname{Gamma}(\alpha,\lambda) + \operatorname{Gamma}(\beta,\lambda) = \operatorname{Gamma}(\alpha+\beta,\lambda).
  • Convolution corresponds to multiplication of MGFs: MX+Y=MXMYM_{X+Y} = M_X \cdot M_Y, since MGFs are Laplace transforms of distributions.
  • The MGF approach — compute, multiply, identify — is usually faster than evaluating the convolution integral directly.
  • The nn-fold convolution of i.i.d.\ variables converges (after standardisation) to the normal distribution by the central limit theorem.