Binomial Distribution

Tags: Probability, Random Variables, Distributions

When an experiment consists of repeating the same binary trial many times — flipping a coin, testing manufactured parts, sampling survey responses — the natural question is: how many successes occur? The Binomial distribution answers exactly this question.

Setup

Fix an integer $n \ge 1$ and a probability $p \in [0, 1]$. Perform $n$ independent Bernoulli($p$) trials and let $X$ count the total number of successes. Formally, let $X_1, X_2, \ldots, X_n$ be independent with each $X_i \sim \text{Bernoulli}(p)$; then

$$X \coloneqq X_1 + X_2 + \cdots + X_n.$$

We write $X \sim \text{Binomial}(n, p)$, or $X \sim \text{Bin}(n, p)$.
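A minimal Python sketch of the definition: draw $n$ independent Bernoulli($p$) indicators and add them up. The function `sample_binomial` is a hypothetical helper written for illustration; a library routine such as NumPy's `Generator.binomial` would normally be used to sample directly.

```python
import random


def sample_binomial(n: int, p: float, rng: random.Random) -> int:
    """Draw one Bin(n, p) value by summing n independent Bernoulli(p) trials."""
    if n < 1 or not 0.0 <= p <= 1.0:
        raise ValueError("need n >= 1 and 0 <= p <= 1")
    # Each trial is a Bernoulli(p) indicator: 1 with probability p, else 0.
    # rng.random() is uniform on [0, 1), so p = 0 never succeeds and p = 1 always does.
    return sum(1 if rng.random() < p else 0 for _ in range(n))


rng = random.Random(0)  # fixed seed so the run is reproducible
draws = [sample_binomial(10, 0.3, rng) for _ in range(5)]
print(draws)  # five counts, each in {0, 1, ..., 10}
```

Passing the generator in explicitly keeps the sketch reproducible and avoids hidden global state.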

PMF derivation

$X$ takes values in $\{0, 1, \ldots, n\}$. To find $P(X = k)$, count all ways the $n$ trials can yield exactly $k$ successes.

Combinatorial argument. Any particular sequence of $k$ successes and $n - k$ failures occurs with probability $p^k (1-p)^{n-k}$, by independence. The number of such sequences, i.e. the number of ways to choose which $k$ of the $n$ positions are successes, is $\binom{n}{k}$. Summing this common probability over all such sequences gives the probability mass function (PMF):

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}, \qquad k = 0, 1, \ldots, n.$$

This is a valid PMF because $\sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = (p + (1-p))^n = 1$ by the binomial theorem.
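Both the counting argument and the normalization can be checked mechanically for small $n$. The Python sketch below enumerates all $2^n$ success/failure sequences, adds up the probabilities of those with exactly $k$ successes, and compares the result with the closed-form PMF. It uses exact `Fraction` arithmetic so the equality tests are exact rather than approximate; the helper names `binomial_pmf` and `pmf_by_enumeration` are illustrative, not from any library.

```python
from fractions import Fraction
from itertools import product
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Bin(n, p), straight from the closed-form PMF."""
    if not 0 <= k <= n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pmf_by_enumeration(k: int, n: int, p: Fraction) -> Fraction:
    """Add up the probability of every length-n outcome sequence with exactly k successes."""
    total = Fraction(0)
    for seq in product((0, 1), repeat=n):  # all 2**n success/failure sequences
        if sum(seq) == k:
            # By independence, each such sequence has probability p^k (1-p)^(n-k).
            total += p**k * (1 - p) ** (n - k)
    return total


n, p = 6, Fraction(1, 3)
for k in range(n + 1):
    # The counting argument: C(n, k) sequences, each with the same probability.
    assert binomial_pmf(k, n, p) == pmf_by_enumeration(k, n, p)

# Normalization: the PMF sums to (p + (1 - p))^n = 1 by the binomial theorem.
assert sum(binomial_pmf(k, n, p) for k in range(n + 1)) == 1
print(binomial_pmf(2, n, p))  # 80/243
```

Enumeration costs $2^n$ steps, so it only serves as a check for small $n$; the closed form is what one would use in practice.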

Mean

Linearity of expectation lets us avoid any direct computation from the PMF. Using the representation $X = \sum_{i=1}^n X_i$ and the fact that $E[X_i] = p$ for each Bernoulli indicator:

$$E[X] = \sum_{i=1}^n E[X_i] = np.$$

No independence is required — linearity holds unconditionally.
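As a sanity check on the linearity shortcut, the Python sketch below computes $E[X] = \sum_k k\,P(X = k)$ the long way, directly from the PMF, and compares it with $np$. Exact `Fraction` arithmetic is assumed so the comparison is exact; `mean_from_pmf` is an illustrative helper name.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def mean_from_pmf(n: int, p: Fraction) -> Fraction:
    """E[X] = sum over k of k * P(X = k)."""
    return sum(k * binomial_pmf(k, n, p) for k in range(n + 1))


for n in (1, 4, 15):
    for p in (Fraction(1, 3), Fraction(3, 4)):
        assert mean_from_pmf(n, p) == n * p  # matches the linearity result
print(mean_from_pmf(10, Fraction(1, 5)))  # 2
```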

Variance

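Independence of the $X_i$ is needed here. Because the indicators are independent, every covariance $\text{Cov}(X_i, X_j)$ with $i \ne j$ vanishes, so their variances add. Each indicator satisfies $X_i^2 = X_i$, hence $\text{Var}(X_i) = E[X_i^2] - E[X_i]^2 = p - p^2 = p(1-p)$. Therefore

$$\text{Var}(X) = \sum_{i=1}^n \text{Var}(X_i) = n \cdot p(1-p) = np(1-p).$$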
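The same kind of check works for the variance: compute $E[X^2] - E[X]^2$ by summing over the PMF and compare it with $np(1-p)$. This Python sketch again assumes exact `Fraction` inputs; `variance_from_pmf` is a hypothetical helper name.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def variance_from_pmf(n: int, p: Fraction) -> Fraction:
    """Var(X) = E[X^2] - E[X]^2, with both moments summed over the PMF."""
    pmf = [binomial_pmf(k, n, p) for k in range(n + 1)]
    mean = sum(k * w for k, w in enumerate(pmf))
    second_moment = sum(k * k * w for k, w in enumerate(pmf))
    return second_moment - mean**2


for n in (1, 4, 15):
    for p in (Fraction(1, 3), Fraction(3, 4)):
        assert variance_from_pmf(n, p) == n * p * (1 - p)
print(variance_from_pmf(10, Fraction(1, 5)))  # 8/5
```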

Additive property

Theorem. If $X \sim \text{Bin}(m, p)$ and $Y \sim \text{Bin}(n, p)$ are independent, then

$$X + Y \sim \text{Bin}(m + n,\, p).$$

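Proof via MGFs. Each Bernoulli($p$) indicator has MGF $E[e^{tX_i}] = (1-p) + pe^t$, and the MGF of a sum of independent random variables is the product of their MGFs. Multiplying $m$ such factors gives the MGF of $X \sim \text{Bin}(m, p)$:

$$M_X(t) = \bigl((1-p) + pe^t\bigr)^m.$$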

Similarly, $M_Y(t) = \bigl((1-p) + pe^t\bigr)^n$. Because $X$ and $Y$ are independent, the MGF of their sum factors:

$$M_{X+Y}(t) = M_X(t) \cdot M_Y(t) = \bigl((1-p) + pe^t\bigr)^{m+n},$$

which is the MGF of $\text{Bin}(m+n, p)$. An MGF that is finite on an open interval around $t = 0$ uniquely determines the distribution, and this one is finite for every $t$, so the result follows. $\square$
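The key identity $M_{X+Y}(t) = M_X(t) \cdot M_Y(t)$ can be spot-checked numerically. The Python sketch below computes the MGF of $\text{Bin}(m+n, p)$ term by term from its PMF and compares it with the product of the two closed-form MGFs at a few values of $t$. It uses floating point, so the comparison is up to a tolerance; the helper names and the chosen $m$, $n$, $p$ are illustrative.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for X ~ Bin(n, p), in floating point."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def mgf_from_pmf(t: float, n: int, p: float) -> float:
    """E[e^{tX}] summed term by term over the support {0, ..., n}."""
    return sum(math.exp(t * k) * binomial_pmf(k, n, p) for k in range(n + 1))


def mgf_closed_form(t: float, n: int, p: float) -> float:
    """((1 - p) + p e^t)^n, the product of n Bernoulli MGFs."""
    return ((1 - p) + p * math.exp(t)) ** n


m, n, p = 4, 7, 0.35
for t in (-1.0, 0.0, 0.5, 1.2):
    lhs = mgf_from_pmf(t, m + n, p)  # MGF of Bin(m + n, p) from its PMF
    rhs = mgf_closed_form(t, m, p) * mgf_closed_form(t, n, p)  # M_X(t) * M_Y(t)
    assert math.isclose(lhs, rhs, rel_tol=1e-9)
```

A numerical check at finitely many $t$ is of course not a proof; it only guards against algebra slips in the closed forms.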

Intuition. Running $m$ independent Bernoulli trials followed by $n$ more independent Bernoulli trials, all with the same $p$, is indistinguishable from running $m + n$ trials in one go.
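The additive property can also be checked directly by convolution instead of MGFs: for independent $X$ and $Y$, $P(X + Y = s) = \sum_j P(X = j)\,P(Y = s - j)$, which should reproduce the $\text{Bin}(m+n, p)$ PMF term by term. A small exact-arithmetic Python sketch, with illustrative helper names:

```python
from fractions import Fraction
from math import comb


def binomial_pmf(k: int, n: int, p: Fraction) -> Fraction:
    """P(X = k) for X ~ Bin(n, p); zero outside the support {0, ..., n}."""
    if not 0 <= k <= n:
        return Fraction(0)
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def pmf_of_sum(s: int, m: int, n: int, p: Fraction) -> Fraction:
    """P(X + Y = s) for independent X ~ Bin(m, p) and Y ~ Bin(n, p), by convolution."""
    return sum(binomial_pmf(j, m, p) * binomial_pmf(s - j, n, p) for j in range(m + 1))


m, n, p = 3, 5, Fraction(2, 7)
for s in range(m + n + 1):
    assert pmf_of_sum(s, m, n, p) == binomial_pmf(s, m + n, p)
```

Underneath, this equality is Vandermonde's identity $\sum_j \binom{m}{j}\binom{n}{s-j} = \binom{m+n}{s}$, since every term carries the same factor $p^s (1-p)^{m+n-s}$.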

Summary

  • $X \sim \text{Bin}(n, p)$ counts successes in $n$ independent Bernoulli($p$) trials.
  • PMF: $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$ for $k = 0, 1, \ldots, n$.
  • Mean: $E[X] = np$ (by linearity of expectation).
  • Variance: $\text{Var}(X) = np(1-p)$ (by independence of indicators).
  • MGF: $M(t) = \bigl((1-p) + pe^t\bigr)^n$.
  • Additive: the sum of independent $\text{Bin}(m, p)$ and $\text{Bin}(n, p)$ is $\text{Bin}(m+n, p)$.