Geometric Distribution

Essential
Last updated: Tags: Probability, Random Variables, Distributions

Prerequisites

How many times do you flip a coin before it comes up heads? That simple question — waiting for the first success in a sequence of independent trials — is exactly what the Geometric distribution answers.

Setup: repeated Bernoulli trials

Recall that a Bernoulli trial is a single experiment with two outcomes: success (probability pp) and failure (probability 1p1-p), where 0<p10 < p \leq 1. Now repeat such a trial independently, over and over, until the first success appears.

Let XX be the total number of trials performed (including the final successful one). Then XX takes values in {1,2,3,}\{1, 2, 3, \ldots\}: it equals 1 if the very first trial succeeds, 2 if the first fails and the second succeeds, and so on.

We write XGeom(p)X \sim \operatorname{Geom}(p).

PMF derivation

For X=kX = k to occur, the first k1k-1 trials must all be failures and the kk-th must be a success. Because the trials are independent:

P(X=k)=(1p)k1k1 failuresp1 success,k=1,2,3,P(X = k) = \underbrace{(1-p)^{k-1}}_{\text{$k-1$ failures}} \cdot \underbrace{p}_{\text{1 success}}, \qquad k = 1, 2, 3, \ldots

This is the probability mass function (PMF) of the Geometric distribution.

Verification: the PMF sums to 1

Sum over all possible values of kk:

k=1(1p)k1p=pj=0(1p)j.\sum_{k=1}^{\infty} (1-p)^{k-1} p = p \sum_{j=0}^{\infty} (1-p)^{j}.

The series j=0rj=11r\sum_{j=0}^{\infty} r^j = \frac{1}{1-r} for r<1|r| < 1. With r=1p[0,1)r = 1-p \in [0, 1):

p11(1p)=p1p=1.p \cdot \frac{1}{1-(1-p)} = p \cdot \frac{1}{p} = 1. \checkmark

Mean: E[X]=1/pE[X] = 1/p

Starting from the definition:

E[X]=k=1k(1p)k1p.E[X] = \sum_{k=1}^{\infty} k \, (1-p)^{k-1} p.

Factor out pp and recognise the sum as the derivative of a geometric series. For r<1|r| < 1:

k=1krk1=ddrk=1rk=ddrr1r=1(1r)2.\sum_{k=1}^{\infty} k \, r^{k-1} = \frac{d}{dr} \sum_{k=1}^{\infty} r^k = \frac{d}{dr} \frac{r}{1-r} = \frac{1}{(1-r)^2}.

Substituting r=1pr = 1-p:

E[X]=p1(1(1p))2=p1p2=1p.E[X] = p \cdot \frac{1}{(1-(1-p))^2} = p \cdot \frac{1}{p^2} = \frac{1}{p}.

Intuitively, if each trial succeeds with probability pp, you expect to need 1/p1/p trials on average.

Variance: Var(X)=(1p)/p2\operatorname{Var}(X) = (1-p)/p^2

Use the identity Var(X)=E[X2](E[X])2\operatorname{Var}(X) = E[X^2] - (E[X])^2. To find E[X2]E[X^2], compute E[X(X1)]E[X(X-1)] first (it involves one fewer power of kk to handle):

E[X(X1)]=k=2k(k1)(1p)k1p=p(1p)k=2k(k1)(1p)k2.E[X(X-1)] = \sum_{k=2}^{\infty} k(k-1)(1-p)^{k-1}p = p(1-p)\sum_{k=2}^{\infty} k(k-1)(1-p)^{k-2}.

The sum k=2k(k1)rk2=d2dr2r1r=2(1r)3\sum_{k=2}^{\infty} k(k-1)r^{k-2} = \frac{d^2}{dr^2}\frac{r}{1-r} = \frac{2}{(1-r)^3}, so with r=1pr = 1-p:

E[X(X1)]=p(1p)2p3=2(1p)p2.E[X(X-1)] = p(1-p) \cdot \frac{2}{p^3} = \frac{2(1-p)}{p^2}.

Therefore E[X2]=E[X(X1)]+E[X]=2(1p)p2+1pE[X^2] = E[X(X-1)] + E[X] = \frac{2(1-p)}{p^2} + \frac{1}{p}, and:

Var(X)=2(1p)p2+1p1p2=2(1p)+p1p2=1pp2.\operatorname{Var}(X) = \frac{2(1-p)}{p^2} + \frac{1}{p} - \frac{1}{p^2} = \frac{2(1-p) + p - 1}{p^2} = \frac{1-p}{p^2}.

Memorylessness

The most striking property of the Geometric distribution is that it has no memory of the past. Formally, for any positive integers mm and nn:

P(X>m+nX>m)=P(X>n).P(X > m + n \mid X > m) = P(X > n).

Proof. First compute the survival function. Since P(X=k)=(1p)k1pP(X = k) = (1-p)^{k-1}p:

P(X>m)=k=m+1(1p)k1p=(1p)m.P(X > m) = \sum_{k=m+1}^{\infty}(1-p)^{k-1}p = (1-p)^m.

Then by the definition of conditional probability:

P(X>m+nX>m)=P(X>m+n)P(X>m)=(1p)m+n(1p)m=(1p)n=P(X>n).P(X > m+n \mid X > m) = \frac{P(X > m+n)}{P(X > m)} = \frac{(1-p)^{m+n}}{(1-p)^m} = (1-p)^n = P(X > n). \qquad \square

In words: if you have already seen mm failures, the number of additional trials you need is distributed exactly the same as if you were starting fresh. The past failures give you no useful information about when the first success will arrive.

Uniqueness of memorylessness

The Geometric distribution is the only discrete distribution on {1,2,3,}\{1, 2, 3, \ldots\} that satisfies the memorylessness property. This is the discrete analogue of the fact that the Exponential distribution is the unique memoryless continuous distribution on the positive reals.

Summary

  • XGeom(p)X \sim \operatorname{Geom}(p) counts the number of independent Bernoulli(p)\operatorname{Bernoulli}(p) trials up to and including the first success.
  • PMF: P(X=k)=(1p)k1pP(X = k) = (1-p)^{k-1}p for k=1,2,3,k = 1, 2, 3, \ldots
  • Mean: E[X]=1/pE[X] = 1/p.
  • Variance: Var(X)=(1p)/p2\operatorname{Var}(X) = (1-p)/p^2.
  • Memorylessness: P(X>m+nX>m)=P(X>n)P(X > m+n \mid X > m) = P(X > n); the Geometric is the unique discrete distribution on N+\mathbb{N}^+ with this property.