Relationships Between Common Distributions

Tags: Probability, Random Variables, Distributions

The seven distributions studied in this series — Bernoulli, Binomial, Geometric, Poisson, Exponential, Gamma, and Normal — were not invented independently. They are members of a single family, connected by structural containment and asymptotic limits. Understanding these connections turns a collection of formulas into a coherent picture.

Structural (exact) relationships

Structural relationships hold for every value of the parameters, not just in some limiting regime.

Bernoulli is Binomial($1$, $p$)

A Bernoulli($p$) trial is the simplest possible case of the Binomial: it is a Binomial with $n = 1$. If $X \sim \operatorname{Bernoulli}(p)$, then $X \sim \operatorname{Bin}(1, p)$: both are defined by $P(X = 1) = p$, $P(X = 0) = 1-p$.

Binomial as a sum of Bernoulli indicators

More generally, $\operatorname{Bin}(n, p)$ is built directly from Bernoulli building blocks. If $X_1, X_2, \ldots, X_n$ are independent with each $X_i \sim \operatorname{Bernoulli}(p)$, then

$$X \coloneqq X_1 + X_2 + \cdots + X_n \sim \operatorname{Bin}(n, p).$$

This is the very definition of the Binomial distribution, and it makes the mean $np$ and variance $np(1-p)$ immediate via linearity and independence.
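The sum representation can be checked numerically. The sketch below is illustrative only: the helper names `bernoulli_sum_pmf` and `binomial_pmf` and the values `n = 10`, `p = 0.3` are assumptions chosen for the example. It builds the PMF of the sum by adding one Bernoulli trial at a time and compares it with the closed-form Binomial PMF.

```python
from math import comb

def bernoulli_sum_pmf(n, p):
    """PMF of X_1 + ... + X_n for i.i.d. Bernoulli(p), built one trial at a time."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # new trial fails: count stays at k
            nxt[k + 1] += mass * p     # new trial succeeds: count moves to k + 1
        pmf = nxt
    return pmf

def binomial_pmf(n, p):
    """Closed-form Bin(n, p) PMF as a list indexed by k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

n, p = 10, 0.3  # illustrative values
conv = bernoulli_sum_pmf(n, p)
closed = binomial_pmf(n, p)

max_gap = max(abs(a - b) for a, b in zip(conv, closed))
mean = sum(k * q for k, q in enumerate(conv))
var = sum((k - mean) ** 2 * q for k, q in enumerate(conv))

print("largest PMF gap:", max_gap)
print("mean:", mean, "vs np =", n * p)
print("variance:", var, "vs np(1-p) =", n * p * (1 - p))
```

The two PMFs should agree up to floating-point rounding, and the mean and variance computed from the convolved PMF should match $np$ and $np(1-p)$.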

Geometric as repeated Bernoulli trials

The Geometric distribution arises when you repeat independent $\operatorname{Bernoulli}(p)$ trials and ask: how many trials until the first success? The geometric form of the PMF $P(X = k) = (1-p)^{k-1}p$ for $k = 1, 2, 3, \ldots$ is a direct consequence of the independence of successive Bernoulli trials: the first $k-1$ trials must fail, each with probability $1-p$, and the $k$-th must succeed.

Exponential as the continuous analogue of Geometric

The Exponential distribution and the Geometric distribution are the only distributions on their respective domains ($[0, \infty)$ and $\{1, 2, 3, \ldots\}$) with the memorylessness property:

$$P(X > s + t \mid X > s) = P(X > t),$$

where $s, t \ge 0$ are real for the Exponential and non-negative integers for the Geometric.

The Geometric models waiting times in discrete time (number of trials); the Exponential models waiting times in continuous time (elapsed duration). The two are structurally parallel: the Exponential is the continuous-time limit of the Geometric as the trial duration shrinks to zero while $p \to 0$ proportionally.
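Both claims can be illustrated with the survival functions $P(X > k) = (1-p)^k$ and $P(X > t) = e^{-\lambda t}$. The following is a minimal sketch under stated assumptions: the function names, the parameter values, and the specific scaling (trials of duration `dt`, each succeeding with probability `lam * dt`) are choices made for this example, one concrete way of reading "$p \to 0$ proportionally".

```python
from math import exp

def geometric_survival(k, p):
    """P(X > k) for Geometric(p) on {1, 2, ...}: the first k trials all fail."""
    return (1 - p) ** k

def exponential_survival(t, lam):
    """P(X > t) for Exp(lam)."""
    return exp(-lam * t)

# Memorylessness: P(X > s + t | X > s) = P(X > s + t) / P(X > s) should equal P(X > t).
p, s, t = 0.2, 3, 5
geo_cond = geometric_survival(s + t, p) / geometric_survival(s, p)

lam, u, v = 1.5, 0.7, 2.0
exp_cond = exponential_survival(u + v, lam) / exponential_survival(u, lam)

def scaled_geometric_survival(time, lam, dt):
    """P(no success by `time`) when trials last dt and succeed with probability lam * dt."""
    trials = round(time / dt)
    return geometric_survival(trials, lam * dt)

# As dt shrinks, the scaled Geometric survival approaches the Exponential survival.
gaps = [
    abs(scaled_geometric_survival(v, lam, dt) - exponential_survival(v, lam))
    for dt in (0.1, 0.01, 0.001)
]

print("geometric:", geo_cond, "vs", geometric_survival(t, p))
print("exponential:", exp_cond, "vs", exponential_survival(v, lam))
print("gaps for dt = 0.1, 0.01, 0.001:", gaps)
```

The gaps should shrink as `dt` decreases, reflecting $(1 - \lambda\,\Delta t)^{t/\Delta t} \to e^{-\lambda t}$.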

Exponential is Gamma($1$, $\lambda$)

The Gamma distribution with shape $\alpha > 0$ and rate $\lambda > 0$ has density

$$f(x) = \frac{\lambda^\alpha x^{\alpha-1} e^{-\lambda x}}{\Gamma(\alpha)}, \qquad x > 0.$$

Setting $\alpha = 1$ and using $\Gamma(1) = 1$ gives $f(x) = \lambda e^{-\lambda x}$, which is exactly $\operatorname{Exp}(\lambda)$. The Exponential is therefore the special case $\operatorname{Gamma}(1, \lambda)$.
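As a quick sanity check, the two densities can be evaluated side by side. This is only a sketch: `gamma_pdf` and `exponential_pdf` are hypothetical helper names, and the rate and evaluation points are arbitrary.

```python
from math import exp, gamma

def gamma_pdf(x, alpha, lam):
    """Gamma density with shape alpha and rate lam, for x > 0."""
    return lam**alpha * x ** (alpha - 1) * exp(-lam * x) / gamma(alpha)

def exponential_pdf(x, lam):
    """Exp(lam) density, for x > 0."""
    return lam * exp(-lam * x)

lam = 2.0                     # arbitrary rate
xs = [0.1, 0.5, 1.0, 3.0]     # arbitrary evaluation points
max_gap = max(abs(gamma_pdf(x, 1.0, lam) - exponential_pdf(x, lam)) for x in xs)
print("largest density gap at alpha = 1:", max_gap)
```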

Gamma($\alpha$, $\lambda$) as a sum of Exponentials

For integer $\alpha$, the relationship goes further. If $X_1, X_2, \ldots, X_\alpha$ are independent with each $X_i \sim \operatorname{Exp}(\lambda)$, then

$$X_1 + X_2 + \cdots + X_\alpha \sim \operatorname{Gamma}(\alpha, \lambda).$$

This can be verified by multiplying moment generating functions: for $t < \lambda$, the MGF of $\operatorname{Exp}(\lambda)$ is $(\lambda/(\lambda - t))^1$, so the sum of $\alpha$ independent copies has MGF $(\lambda/(\lambda - t))^\alpha$, which is the MGF of $\operatorname{Gamma}(\alpha, \lambda)$.

Intuition. In a Poisson process with rate $\lambda$, the $\alpha$-th event arrives after exactly $\alpha$ independent Exponential waiting times. The Gamma distribution captures the total waiting time until the $\alpha$-th arrival.
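A small simulation makes the waiting-time picture concrete. The sketch below assumes illustrative values (`alpha = 5`, `lam = 2.0`, a fixed seed) and a hypothetical helper `sum_of_exponentials`. It checks only the first two moments against the Gamma values $\alpha/\lambda$ and $\alpha/\lambda^2$, which is consistent with the claim but not a full distributional test.

```python
import random

def sum_of_exponentials(alpha, lam, rng):
    """Total of alpha independent Exp(lam) waiting times."""
    return sum(rng.expovariate(lam) for _ in range(alpha))

rng = random.Random(0)                       # fixed seed so the run is repeatable
alpha, lam, n_samples = 5, 2.0, 100_000      # illustrative values
samples = [sum_of_exponentials(alpha, lam, rng) for _ in range(n_samples)]

sample_mean = sum(samples) / n_samples
sample_var = sum((x - sample_mean) ** 2 for x in samples) / n_samples

print("sample mean:", sample_mean, "vs alpha/lam =", alpha / lam)
print("sample variance:", sample_var, "vs alpha/lam^2 =", alpha / lam**2)
```

The sample moments should land close to the theoretical ones, with the usual Monte Carlo error of order $1/\sqrt{\text{n\_samples}}$.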

Limiting relationships

Limiting relationships describe how one distribution approximates another as a parameter grows large.

Binomial($n$, $\lambda/n$) $\to$ Poisson($\lambda$) as $n \to \infty$

The Poisson limit theorem (or law of rare events) states: if $p = \lambda/n$ and $n \to \infty$ with $\lambda$ fixed, then for each $k = 0, 1, 2, \ldots$,

$$\binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n-k} \;\longrightarrow\; \frac{\lambda^k e^{-\lambda}}{k!} = P(\operatorname{Poisson}(\lambda) = k).$$

Sketch. Write $\binom{n}{k}(\lambda/n)^k = \lambda^k \cdot \binom{n}{k}/n^k$, where $\binom{n}{k}/n^k \to 1/k!$ for fixed $k$. Split the remaining factor as $(1 - \lambda/n)^{n-k} = (1 - \lambda/n)^n (1 - \lambda/n)^{-k}$, where $(1 - \lambda/n)^n \to e^{-\lambda}$ and $(1 - \lambda/n)^{-k} \to 1$. Combining gives the Poisson PMF.

Interpretation. When many independent trials each have a very small success probability, but the expected total number of successes $np = \lambda$ stays fixed, the count of successes is approximately Poisson. This is the regime of rare but possible events.
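The convergence can be observed directly by comparing the two PMFs for growing $n$. This is a minimal sketch with assumed values ($\lambda = 3$, $k$ from 0 to 15, $n \in \{20, 200, 2000\}$); the helper names are illustrative.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(Bin(n, p) = k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(Poisson(lam) = k)."""
    return lam**k * exp(-lam) / factorial(k)

lam = 3.0           # fixed expected count (illustrative)
ks = range(16)      # compare the PMFs at k = 0..15

gaps = {}
for n in (20, 200, 2000):
    gaps[n] = max(abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for k in ks)
    print(f"n = {n:5d}: largest PMF gap = {gaps[n]:.6f}")
```

The largest pointwise gap should shrink as $n$ grows with $np = \lambda$ held fixed.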

Poisson is infinitely divisible

The Poisson distribution has a natural additive structure. If $X \sim \operatorname{Poisson}(\lambda_1)$ and $Y \sim \operatorname{Poisson}(\lambda_2)$ are independent, then

$$X + Y \sim \operatorname{Poisson}(\lambda_1 + \lambda_2).$$

This follows directly from the MGF: $M_X(t) = e^{\lambda_1(e^t - 1)}$ and $M_Y(t) = e^{\lambda_2(e^t - 1)}$, so by independence $M_{X+Y}(t) = M_X(t)\,M_Y(t) = e^{(\lambda_1 + \lambda_2)(e^t - 1)}$, which is the MGF of $\operatorname{Poisson}(\lambda_1 + \lambda_2)$.

Conversely, any $\operatorname{Poisson}(\lambda)$ variable has the same distribution as a sum of $n$ independent $\operatorname{Poisson}(\lambda/n)$ variables, for any $n \ge 1$. This infinite divisibility mirrors the fact that a Poisson process can always be split into finer and finer independent sub-processes.
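The additive property can also be verified without MGFs, by convolving the two PMFs directly. The sketch below uses assumed rates `lam1 = 1.2` and `lam2 = 2.5` and a hypothetical helper `convolve_at`.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(Poisson(lam) = k)."""
    return lam**k * exp(-lam) / factorial(k)

def convolve_at(k, lam1, lam2):
    """P(X + Y = k) for independent X ~ Poisson(lam1) and Y ~ Poisson(lam2)."""
    return sum(poisson_pmf(j, lam1) * poisson_pmf(k - j, lam2) for j in range(k + 1))

lam1, lam2 = 1.2, 2.5   # illustrative rates
max_gap = max(
    abs(convolve_at(k, lam1, lam2) - poisson_pmf(k, lam1 + lam2)) for k in range(30)
)
print("largest gap between convolution and Poisson(lam1 + lam2):", max_gap)
```

Applying the same identity repeatedly with equal rates $\lambda/n$ gives the decomposition into $n$ independent pieces.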

Central Limit Theorem: normalised Binomial $\to$ Normal

By the Central Limit Theorem, the standardised Binomial converges to the standard Normal. If $X \sim \operatorname{Bin}(n, p)$ with $p \in (0, 1)$ fixed, then $E[X] = np$ and $\operatorname{Var}(X) = np(1-p)$, so

$$\frac{X - np}{\sqrt{np(1-p)}} \;\xrightarrow{d}\; N(0,1) \quad \text{as } n \to \infty.$$

This is a direct application of the CLT: $X$ is the sum of $n$ i.i.d. $\operatorname{Bernoulli}(p)$ variables, each with mean $p$ and variance $p(1-p)$.
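Convergence in distribution can be made visible by comparing CDFs. The following sketch rests on assumptions made for the example: $p = 0.3$, $n \in \{10, 100, 1000\}$, no continuity correction, and the hypothetical helper `max_cdf_gap`. It measures the largest gap between the CDF of the standardised Binomial, evaluated at its jump points, and the standard Normal CDF.

```python
from math import comb, erf, sqrt

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def max_cdf_gap(n, p):
    """Largest |P(X <= k) - Phi((k - np) / sd)| over k = 0..n, for X ~ Bin(n, p)."""
    mu, sigma = n * p, sqrt(n * p * (1 - p))
    cdf, worst = 0.0, 0.0
    for k in range(n + 1):
        cdf += comb(n, k) * p**k * (1 - p) ** (n - k)   # running Binomial CDF
        worst = max(worst, abs(cdf - normal_cdf((k - mu) / sigma)))
    return worst

gaps = {n: max_cdf_gap(n, 0.3) for n in (10, 100, 1000)}
for n, gap in gaps.items():
    print(f"n = {n:5d}: largest CDF gap = {gap:.4f}")
```

The gap should decrease as $n$ grows. Without a continuity correction it shrinks slowly, roughly on the order of $1/\sqrt{n}$, because the Binomial CDF still jumps at each integer.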

Gamma($\alpha$, $\lambda$) $\to$ Normal as $\alpha \to \infty$

For integer $\alpha$, $\operatorname{Gamma}(\alpha, \lambda)$ is the sum of $\alpha$ independent $\operatorname{Exp}(\lambda)$ variables (each with mean $1/\lambda$ and variance $1/\lambda^2$), so the CLT applies directly. If $X_\alpha \sim \operatorname{Gamma}(\alpha, \lambda)$, which has mean $\alpha/\lambda$ and variance $\alpha/\lambda^2$, then

$$\frac{X_\alpha - \alpha/\lambda}{\sqrt{\alpha}/\lambda} \;\xrightarrow{d}\; N(0,1) \quad \text{as } \alpha \to \infty.$$

For large $\alpha$, the Gamma distribution is well approximated by $N(\alpha/\lambda,\, \alpha/\lambda^2)$.
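The quality of the Normal approximation can be gauged numerically for integer $\alpha$, where the Gamma CDF has the closed form $1 - \sum_{k=0}^{\alpha-1} e^{-\lambda x}(\lambda x)^k / k!$ (the probability that a rate-$\lambda$ Poisson process has at least $\alpha$ arrivals by time $x$). This is a sketch under assumed values ($\lambda = 2$, $\alpha \in \{4, 40, 400\}$, a grid of standardised points in $[-4, 4]$); the helper names are illustrative.

```python
from math import erf, exp, lgamma, log, sqrt

def normal_cdf(z):
    """Standard Normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def erlang_cdf(x, alpha, lam):
    """CDF of Gamma(alpha, lam) for integer alpha.

    Each term e^{-y} y^k / k! is computed in log space to avoid overflow
    when alpha is large.
    """
    if x <= 0:
        return 0.0
    y = lam * x
    tail = sum(exp(-y + k * log(y) - lgamma(k + 1)) for k in range(alpha))
    return 1.0 - tail

def max_cdf_gap(alpha, lam, grid=201):
    """Largest gap between the standardised Gamma CDF and Phi on a grid in [-4, 4]."""
    mu, sigma = alpha / lam, sqrt(alpha) / lam
    worst = 0.0
    for i in range(grid):
        z = -4 + 8 * i / (grid - 1)
        worst = max(worst, abs(erlang_cdf(mu + sigma * z, alpha, lam) - normal_cdf(z)))
    return worst

gaps = {alpha: max_cdf_gap(alpha, 2.0) for alpha in (4, 40, 400)}
for alpha, gap in gaps.items():
    print(f"alpha = {alpha:4d}: largest CDF gap = {gap:.4f}")
```

The gap should shrink as $\alpha$ grows, in line with the skewness $2/\sqrt{\alpha}$ of the Gamma distribution tending to zero.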

A map of the family

All seven distributions form a directed graph of relationships. Reading it as a graph with edges labelled “is a special case of”, “is a sum of”, or “converges to”:

  • Bernoulli $\to$ Binomial (structural): $\operatorname{Bin}(n, p)$ is the sum of $n$ i.i.d. $\operatorname{Bernoulli}(p)$ variables; $\operatorname{Bernoulli}(p) = \operatorname{Bin}(1, p)$.
  • Binomial $\to$ Poisson (limiting): $\operatorname{Bin}(n, \lambda/n) \to \operatorname{Poisson}(\lambda)$ as $n \to \infty$.
  • Binomial $\to$ Normal (limiting via CLT): the standardised $\operatorname{Bin}(n, p)$ converges to $N(0,1)$.
  • Bernoulli $\to$ Geometric (structural): the Geometric counts repeated Bernoulli trials until the first success.
  • Geometric $\to$ Exponential (continuous analogue / limit): the Exponential is the continuous-time version of the Geometric, sharing the memorylessness property.
  • Exponential $\to$ Gamma (structural): $\operatorname{Gamma}(\alpha, \lambda)$ is the sum of $\alpha$ i.i.d. $\operatorname{Exp}(\lambda)$ variables; $\operatorname{Exp}(\lambda) = \operatorname{Gamma}(1, \lambda)$.
  • Gamma $\to$ Normal (limiting via CLT): the standardised $\operatorname{Gamma}(\alpha, \lambda)$ converges to $N(0,1)$ as $\alpha \to \infty$.

Two separate paths lead from Bernoulli to Normal: the direct path through Binomial and CLT, and the path through Geometric, Exponential, Gamma, and CLT. Both converge at the same fixed point — the Normal distribution, which is the universal attractor of standardised sums.
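The map can be written down as a small adjacency structure, and the paths from Bernoulli to Normal enumerated mechanically. This is only an illustrative encoding: the dictionary `EDGES`, the edge labels, and the helper `all_paths` are assumptions made for this sketch, not a standard representation.

```python
# Edges of the map: source -> list of (target, kind of relationship).
EDGES = {
    "Bernoulli": [("Binomial", "structural"), ("Geometric", "structural")],
    "Binomial": [("Poisson", "limiting"), ("Normal", "limiting via CLT")],
    "Geometric": [("Exponential", "continuous analogue / limit")],
    "Exponential": [("Gamma", "structural")],
    "Gamma": [("Normal", "limiting via CLT")],
    "Poisson": [],
    "Normal": [],
}

def all_paths(graph, start, goal, prefix=()):
    """Depth-first enumeration of all simple paths from start to goal."""
    path = prefix + (start,)
    if start == goal:
        return [list(path)]
    found = []
    for target, _kind in graph[start]:
        if target not in path:  # never revisit a node on the current path
            found.extend(all_paths(graph, target, goal, path))
    return found

paths = all_paths(EDGES, "Bernoulli", "Normal")
for path in paths:
    print(" -> ".join(path))
```

With the edges listed above, the search finds the short path through Binomial and the longer one through Geometric, Exponential, and Gamma.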

Summary

  • Bernoulli is $\operatorname{Bin}(1, p)$; $\operatorname{Bin}(n, p)$ is the sum of $n$ i.i.d. Bernoulli variables (structural).
  • Geometric models first-success time in repeated Bernoulli trials (structural); it is the discrete analogue of the Exponential via shared memorylessness.
  • Exponential is $\operatorname{Gamma}(1, \lambda)$; $\operatorname{Gamma}(\alpha, \lambda)$ is the sum of $\alpha$ i.i.d. $\operatorname{Exp}(\lambda)$ variables for integer $\alpha$ (structural).
  • $\operatorname{Bin}(n, \lambda/n) \to \operatorname{Poisson}(\lambda)$ as $n \to \infty$ (Poisson limit theorem).
  • Poisson is closed under independent sums: the sum of independent $\operatorname{Poisson}(\lambda_1)$ and $\operatorname{Poisson}(\lambda_2)$ variables is $\operatorname{Poisson}(\lambda_1 + \lambda_2)$. It is therefore infinitely divisible: $\operatorname{Poisson}(\lambda)$ is the sum of $n$ independent $\operatorname{Poisson}(\lambda/n)$ variables for any $n$.
  • Standardised Binomial and Gamma both converge to $N(0,1)$ by the CLT, since each (the Gamma for integer $\alpha$) is a sum of i.i.d. finite-variance variables.
  • The Normal distribution is the universal limiting distribution for standardised sums — the fixed point reached by two separate paths through the family.