When you add two independent random variables, what distribution does the sum have? The answer is the convolution of their distributions — a mathematical operation that combines two measures by sliding one over the other. Convolution explains why Binomials add, Poissons add, and Normals add, and it is the bridge between the algebra of distributions and the algebra of moment generating functions.
Let X and Y be independent random variables and let Z=X+Y.
Absolutely continuous case
If X and Y have densities $f_X$ and $f_Y$, then Z has density
$$f_Z(z) = (f_X * f_Y)(z) := \int_{-\infty}^{+\infty} f_X(x)\, f_Y(z-x)\, dx. \tag{1}$$
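Derivation. The joint density of $(X,Y)$ factors as $f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ by independence. The event $\{Z \le z\}$ is $\{X+Y \le z\}$. Integrating over this region:
$$F_Z(z) = P(X+Y \le z) = \int_{-\infty}^{+\infty} \int_{-\infty}^{z-x} f_X(x)\, f_Y(y)\, dy\, dx = \int_{-\infty}^{+\infty} f_X(x)\, F_Y(z-x)\, dx.$$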
Differentiating with respect to z gives formula (1).
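As a rough numerical check of formula (1), the sketch below approximates the convolution integral with a midpoint rule for two independent Exp(1) variables, whose sum is known to have density $z e^{-z}$ (the Gamma(2, 1) density). The helper names `convolve_density` and `exp_density`, the evaluation point and the step count are illustrative choices, not part of the text above.

```python
import math

def convolve_density(f_x, f_y, z, lo, hi, steps=20_000):
    """Midpoint-rule approximation of the integral of f_x(x) * f_y(z - x) over [lo, hi]."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += f_x(x) * f_y(z - x)
    return total * h

def exp_density(x, rate=1.0):
    """Density of Exp(rate); zero on the negative half-line."""
    return rate * math.exp(-rate * x) if x >= 0 else 0.0

z = 1.5
# Both densities vanish for negative arguments, so the integrand is zero outside [0, z].
approx = convolve_density(exp_density, exp_density, z, lo=0.0, hi=z)
exact = z * math.exp(-z)  # Gamma(2, 1) density at z
print(approx, exact)
```

Restricting the integration range to [0, z] is only valid here because both variables are non-negative; for densities supported on the whole line the range would have to cover the effective support of the integrand.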
Discrete case
If X and Y take values in $\mathbb{Z}$ (or a common countable set) with PMFs $p_X$ and $p_Y$, then $Z = X + Y$ has PMF
$$p_Z(n) = (p_X * p_Y)(n) := \sum_{k=-\infty}^{+\infty} p_X(k)\, p_Y(n-k). \tag{2}$$
This is the discrete analogue of (1): instead of integrating, you sum over all ways to split n into k (from X) and n−k (from Y).
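Formula (2) can be sketched in a few lines for PMFs with finite support. The example below, with the hypothetical helper `convolve_pmf`, computes the distribution of the total of two fair dice using exact fractions; the dice are an illustrative choice.

```python
from fractions import Fraction

def convolve_pmf(p_x, p_y):
    """PMF of X + Y for independent X and Y, each given as a {value: probability} dict."""
    p_z = {}
    for k, pk in p_x.items():
        for j, pj in p_y.items():
            # Each pair (k, j) contributes to the outcome k + j, as in formula (2).
            p_z[k + j] = p_z.get(k + j, 0) + pk * pj
    return p_z

die = {face: Fraction(1, 6) for face in range(1, 7)}
two_dice = convolve_pmf(die, die)
print(two_dice[7])  # 1/6
```

Looping over pairs (k, j) and accumulating at k + j is the same sum as (2) with n − k renamed to j; it avoids having to handle indices outside the support.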
Convolution is commutative and associative
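The convolution operation on measures (or functions) satisfies, for any three densities $f_X$, $f_Y$ and $f_W$:
$$f_X * f_Y = f_Y * f_X \quad \text{(commutative)}$$
$$(f_X * f_Y) * f_W = f_X * (f_Y * f_W) \quad \text{(associative)}.$$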
Commutativity is clear from the symmetry X+Y=Y+X. Associativity means the distribution of a sum of three (or more) independent variables can be computed by convolving any two first.
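Both properties can be checked exactly on small examples. The sketch below uses three arbitrary PMFs on {0, 1, 2, ...}, stored as lists of fractions; the helper `convolve` and the particular PMFs are illustrative assumptions.

```python
from fractions import Fraction

def convolve(p, q):
    """Convolve two PMFs on {0, 1, 2, ...} stored as lists (index = value)."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

p = [Fraction(1, 2), Fraction(1, 2)]
q = [Fraction(1, 3), Fraction(1, 3), Fraction(1, 3)]
r = [Fraction(1, 4), Fraction(3, 4)]

commutes = convolve(p, q) == convolve(q, p)
associates = convolve(convolve(p, q), r) == convolve(p, convolve(q, r))
print(commutes, associates)  # True True
```

Exact rational arithmetic is used so that the equality tests are not affected by floating-point rounding.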
Key examples
Binomial sums
If X∼Bin(m,p) and Y∼Bin(n,p) are independent, then X+Y∼Bin(m+n,p).
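Proof via convolution. The PMF of Bin(m,p) is $p_X(k) = \binom{m}{k} p^k (1-p)^{m-k}$. Convolving:
$$p_Z(r) = \sum_{k=0}^{r} \binom{m}{k} p^k (1-p)^{m-k} \cdot \binom{n}{r-k} p^{r-k} (1-p)^{n-(r-k)} = p^r (1-p)^{m+n-r} \sum_{k=0}^{r} \binom{m}{k} \binom{n}{r-k}.$$
The Vandermonde convolution identity $\sum_{k=0}^{r} \binom{m}{k} \binom{n}{r-k} = \binom{m+n}{r}$ gives $p_Z(r) = \binom{m+n}{r} p^r (1-p)^{m+n-r}$, the Bin(m+n,p) PMF.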
This also follows immediately from the interpretation: Bin(m,p) is the count of successes in m independent Bernoulli trials, and combining m and n independent trials gives m+n trials.
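The Binomial closure result can also be confirmed exactly for particular parameters. In the sketch below the values m = 4, n = 6, p = 3/10 and the helper names are arbitrary illustrative choices.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """PMF of Bin(n, p) as a list indexed by the number of successes."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def convolve(a, b):
    """Convolve two PMFs on {0, 1, 2, ...} stored as lists (index = value)."""
    out = [Fraction(0)] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

p = Fraction(3, 10)
m, n = 4, 6
convolved = convolve(binomial_pmf(m, p), binomial_pmf(n, p))
direct = binomial_pmf(m + n, p)
print(convolved == direct)  # True
```

Because the arithmetic is exact, the equality is a term-by-term instance of the Vandermonde identity rather than an approximate agreement.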
Poisson sums
If X∼Poisson(λ) and Y∼Poisson(μ) are independent, then X+Y∼Poisson(λ+μ).
Proof via convolution.
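$$p_Z(n) = \sum_{k=0}^{n} \frac{e^{-\lambda} \lambda^k}{k!} \cdot \frac{e^{-\mu} \mu^{n-k}}{(n-k)!} = \frac{e^{-(\lambda+\mu)}}{n!} \sum_{k=0}^{n} \binom{n}{k} \lambda^k \mu^{n-k} = \frac{e^{-(\lambda+\mu)} (\lambda+\mu)^n}{n!},$$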
which is Poisson(λ+μ).
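A quick floating-point check of the same identity, with rates 2.0 and 3.5 chosen arbitrarily for illustration:

```python
import math

def poisson_pmf(k, rate):
    """P(N = k) for N ~ Poisson(rate)."""
    return math.exp(-rate) * rate**k / math.factorial(k)

lam, mu = 2.0, 3.5
max_gap = 0.0
for n in range(15):
    # Finite convolution sum: both variables are non-negative, so k runs from 0 to n.
    conv = sum(poisson_pmf(k, lam) * poisson_pmf(n - k, mu) for k in range(n + 1))
    direct = poisson_pmf(n, lam + mu)
    max_gap = max(max_gap, abs(conv - direct))
print(max_gap)
```

The printed gap should be at the level of floating-point rounding error.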
Normal sums
If $X \sim N(\mu_1, \sigma_1^2)$ and $Y \sim N(\mu_2, \sigma_2^2)$ are independent, then $X + Y \sim N(\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2)$.
The density-level convolution integral can be done by completing the square, but it is more elegant via MGFs (see below). The key point is that the normal family is closed under convolution — a sum of independent normals is normal, with means and variances adding separately.
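Without completing the square by hand, the closure claim can at least be sanity-checked numerically. The sketch below evaluates the convolution integral with a midpoint rule and compares it with the normal density with added means and variances; the parameter values, the truncation of the integral to [-40, 40] and the helper names are all illustrative assumptions.

```python
import math

def normal_pdf(x, mean, sd):
    """Density of N(mean, sd^2) at x."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def convolved_density(z, mean1, sd1, mean2, sd2, lo=-40.0, hi=40.0, steps=8000):
    """Midpoint-rule approximation of the convolution of two normal densities at z."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * h
        total += normal_pdf(x, mean1, sd1) * normal_pdf(z - x, mean2, sd2)
    return total * h

mean1, sd1 = 1.0, 2.0
mean2, sd2 = -0.5, 1.5
z = 0.8
numeric = convolved_density(z, mean1, sd1, mean2, sd2)
closed_form = normal_pdf(z, mean1 + mean2, math.sqrt(sd1**2 + sd2**2))
print(numeric, closed_form)
```

The truncation range is wide relative to the standard deviations used here; for much larger variances it would need to be widened.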
Gamma sums
If X∼Gamma(α,λ) and Y∼Gamma(β,λ) are independent (same rate λ), then X+Y∼Gamma(α+β,λ).
This follows from the representation of Gamma(α,λ) as a sum of α independent Exp(λ) variables (for integer α), and extends to non-integer α by the MGF argument.
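The MGF argument for general shape parameters can be written out as a short worked equation. It uses the standard MGF of the Gamma distribution in the rate parameterisation, which is finite for $t < \lambda$, together with the fact that the MGF of a sum of independent variables is the product of their MGFs.

```latex
\begin{align*}
M_X(t) &= \left(\frac{\lambda}{\lambda - t}\right)^{\alpha}, \qquad
M_Y(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\beta}, \qquad t < \lambda, \\
M_{X+Y}(t) &= M_X(t)\, M_Y(t) = \left(\frac{\lambda}{\lambda - t}\right)^{\alpha + \beta},
\end{align*}
```

The last expression is the MGF of Gamma(α+β, λ). The common rate λ is essential: with different rates the product is not of this form.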
Connection to moment generating functions
The multiplicative property of MGFs is the algebraic reflection of convolution. For independent X and Y:
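$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t).$$
This holds because the convolution of densities $(f_X * f_Y)(z)$ corresponds exactly to the product $\hat{f}_X(s) \cdot \hat{f}_Y(s)$ of their two-sided Laplace transforms, and the MGF $M_X(t) = E[e^{tX}] = \int e^{tx} f_X(x)\, dx$ is that transform evaluated at $s = -t$. Multiplication of MGFs therefore corresponds to convolution of distributions. Equivalently, independence gives $E[e^{t(X+Y)}] = E[e^{tX}]\, E[e^{tY}]$ directly, for every t at which both MGFs are finite.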
This connection is why the MGF is the sharpest tool for computing distributions of sums. Instead of evaluating the convolution integral directly, you:
- Compute MX(t) and MY(t).
- Multiply: MZ(t)=MX(t)⋅MY(t).
- Identify: if MZ matches a known MGF, you know the distribution of Z.
The Binomial, Poisson, Normal and Gamma results above all follow from this approach, since each distribution's MGF is known in closed form and an MGF that is finite in a neighbourhood of 0 determines the distribution uniquely.
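The compute, multiply, identify recipe can be sketched for three of the families, using their standard closed-form MGFs:

```latex
\begin{align*}
\text{Binomial:}\quad & (1 - p + p e^{t})^{m} \cdot (1 - p + p e^{t})^{n} = (1 - p + p e^{t})^{m+n}, \\
\text{Poisson:}\quad & e^{\lambda (e^{t} - 1)} \cdot e^{\mu (e^{t} - 1)} = e^{(\lambda + \mu)(e^{t} - 1)}, \\
\text{Normal:}\quad & e^{\mu_1 t + \sigma_1^2 t^2 / 2} \cdot e^{\mu_2 t + \sigma_2^2 t^2 / 2}
  = e^{(\mu_1 + \mu_2) t + (\sigma_1^2 + \sigma_2^2) t^2 / 2}.
\end{align*}
```

In each line the right-hand side is recognisably the MGF of the same family with the parameters added, which identifies the distribution of the sum.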
Convolution of more than two variables
For n independent variables $X_1, \dots, X_n$, the distribution of $S_n = X_1 + \cdots + X_n$ is the n-fold convolution $f_{X_1} * f_{X_2} * \cdots * f_{X_n}$. Its MGF is $\prod_{k=1}^{n} M_{X_k}(t)$. When all $X_k$ are identically distributed with MGF $M(t)$, the MGF of $S_n$ is $M(t)^n$.
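The central limit theorem is the asymptotic statement about this n-fold convolution. If the $X_k$ are i.i.d. with mean $\mu$ and finite variance $\sigma^2 > 0$, the standardised sum $(S_n - n\mu)/(\sigma\sqrt{n})$ has MGF $e^{-t\sqrt{n}\,\mu/\sigma}\, M\!\left(t/(\sigma\sqrt{n})\right)^n$, which converges (when $M$ is finite near 0) to $e^{t^2/2}$, the MGF of the standard normal.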
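The convergence can be observed directly by building the n-fold convolution. The sketch below does this for sums of fair dice and measures the largest gap, over the attainable values of the sum, between the exact CDF and the normal approximation with a continuity correction. The choice of dice, the helper names and the continuity correction are illustrative assumptions, not part of the theorem.

```python
import math

def convolve(p, q):
    """Convolve two PMFs stored as lists (index = offset from the smallest value)."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

def normal_cdf(x):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def cdf_gap(n):
    """Largest gap, at attainable sums, between the CDF of n dice and the normal CDF."""
    die = [1 / 6] * 6  # index i represents face i + 1
    pmf = die
    for _ in range(n - 1):
        pmf = convolve(pmf, die)  # after the loop: n-fold convolution
    mean, sd = 3.5 * n, math.sqrt(n * 35 / 12)  # a fair die has variance 35/12
    gap, cumulative = 0.0, 0.0
    for i, prob in enumerate(pmf):
        cumulative += prob
        value = i + n  # index i represents the sum i + n
        # Continuity correction: compare P(S <= value) with the normal CDF at value + 0.5.
        z = (value + 0.5 - mean) / sd
        gap = max(gap, abs(cumulative - normal_cdf(z)))
    return gap

for n in (1, 5, 50):
    print(n, cdf_gap(n))
```

The gap for 50 dice should come out well below the gap for a single die, in line with the limit statement.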
Summary
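- For independent $X, Y$, the distribution of $Z = X + Y$ is the convolution of their distributions: $f_Z = f_X * f_Y$ (density integral) or $p_Z(n) = \sum_k p_X(k)\, p_Y(n-k)$ (PMF sum).
- Closure results: $\text{Bin}(m,p) + \text{Bin}(n,p) = \text{Bin}(m+n,p)$; $\text{Poisson}(\lambda) + \text{Poisson}(\mu) = \text{Poisson}(\lambda+\mu)$; $N(\mu_1, \sigma_1^2) + N(\mu_2, \sigma_2^2) = N(\mu_1+\mu_2, \sigma_1^2+\sigma_2^2)$; $\text{Gamma}(\alpha,\lambda) + \text{Gamma}(\beta,\lambda) = \text{Gamma}(\alpha+\beta,\lambda)$.
- Convolution corresponds to multiplication of MGFs: $M_{X+Y} = M_X \cdot M_Y$, since MGFs are Laplace transforms of distributions.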
- The MGF approach — compute, multiply, identify — is usually faster than evaluating the convolution integral directly.
- The n-fold convolution of i.i.d. variables with finite variance converges (after standardisation) to the normal distribution by the central limit theorem.