Independence of Random Variables

Tags: Probability, Random Variables, Independence

Independence is the cleanest possible relationship between two random variables: knowing the value of one tells you nothing about the other. The formal definition translates this intuition into a factorisation condition on the joint distribution, and from it the entire theory of independent sums, product expectations, and variance additivity follows.

Definition

Two random variables $X$ and $Y$ are independent if for every pair of Borel sets $B_1, B_2 \in \mathcal{B}(\mathbb{R})$:

$$P(X \in B_1,\, Y \in B_2) = P(X \in B_1) \cdot P(Y \in B_2).$$

Equivalently, the joint distribution $P_{(X,Y)}$ is the product measure $P_X \otimes P_Y$.

The rectangles $B_1 \times B_2$ form a $\pi$-system that generates $\mathcal{B}(\mathbb{R}^2)$, and two probability measures that agree on such a system agree on the whole $\sigma$-algebra. The product condition on rectangles therefore determines the entire joint law.

In terms of the CDF

An equivalent characterisation: $X$ and $Y$ are independent if and only if

$$F_{X,Y}(x, y) = F_X(x) \cdot F_Y(y) \quad \text{for all } (x, y) \in \mathbb{R}^2. \tag{1}$$

This is often the most convenient form to check.

Discrete case

For jointly discrete $(X, Y)$, independence is equivalent to the joint PMF factorising:

$$p_{X,Y}(x, y) = p_X(x) \cdot p_Y(y) \quad \text{for all } (x, y).$$
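The factorisation condition can be checked mechanically on a finite joint PMF. The sketch below is only an illustration: the helper names `marginals` and `is_independent` and the two example tables are hypothetical, and the joint PMF is assumed to be given as a dictionary mapping `(x, y)` to an exact `Fraction`.

```python
from fractions import Fraction
from itertools import product


def marginals(joint):
    """Return the marginal PMFs (p_X, p_Y) of a joint PMF given as {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y


def is_independent(joint):
    """Check p_{X,Y}(x, y) == p_X(x) * p_Y(y) on the full product of the supports."""
    p_x, p_y = marginals(joint)
    return all(
        joint.get((x, y), 0) == p_x[x] * p_y[y]
        for x, y in product(p_x, p_y)
    )


# Two fair coin flips: the joint PMF is uniform on {0, 1}^2.
fair_coins = {(x, y): Fraction(1, 4) for x in (0, 1) for y in (0, 1)}

# Y is a copy of X: all mass sits on the diagonal.
copied_coin = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(fair_coins))   # True
print(is_independent(copied_coin))  # False
```

The check runs over the full product of the two supports, not only over pairs with positive joint mass, because a pair with joint probability zero but positive marginals is exactly where factorisation fails in the second table.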

Absolutely continuous case

For jointly absolutely continuous $(X, Y)$, independence is equivalent to the joint PDF factorising:

$$f_{X,Y}(x, y) = f_X(x) \cdot f_Y(y) \quad \text{for almost every } (x, y). \tag{2}$$

Example. The uniform distribution on the unit square has $f_{X,Y}(x,y) = 1$ on $[0,1]^2$, and the marginals are both $f_X(x) = 1$, $f_Y(y) = 1$ on $[0,1]$. Since $1 = 1 \cdot 1$, the PDF factorises and $X, Y$ are independent.

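Contrast this with the uniform distribution on the unit disk $\{x^2 + y^2 \leq 1\}$: the density is $1/\pi$ inside the disk and $0$ outside. The marginal of $X$ has density $f_X(x) = \frac{2}{\pi}\sqrt{1-x^2}$ for $x \in [-1,1]$, and by symmetry $f_Y$ has the same form. The product $f_X(x)\, f_Y(y)$ is therefore strictly positive on the whole open square $(-1,1)^2$, whereas $f_{X,Y}$ vanishes on the part of that square lying outside the disk, a set of positive area. So the joint density does not factorise, and $X$ and $Y$ are not independent.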
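As a worked check of the disk example, the marginal can be obtained by integrating the joint density over the vertical chord at $x$, and the two sides of (2) can then be compared at the origin:

$$f_X(x) = \int_{-\sqrt{1-x^2}}^{\sqrt{1-x^2}} \frac{1}{\pi}\, dy = \frac{2}{\pi}\sqrt{1-x^2}, \qquad x \in [-1,1],$$

$$f_X(0)\, f_Y(0) = \frac{2}{\pi} \cdot \frac{2}{\pi} = \frac{4}{\pi^2} \approx 0.405 \;\neq\; \frac{1}{\pi} \approx 0.318 = f_{X,Y}(0,0).$$

Both sides are continuous near the origin, so the mismatch persists on a neighbourhood of positive area and cannot be dismissed as an almost-everywhere exception.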

Independence implies factorisation of expectations

Theorem. If $X$ and $Y$ are independent and $g, h : \mathbb{R} \to \mathbb{R}$ are bounded measurable functions, then

$$E[g(X)\, h(Y)] = E[g(X)] \cdot E[h(Y)]. \tag{3}$$

Proof. Since $P_{(X,Y)} = P_X \otimes P_Y$, the Fubini–Tonelli theorem gives:

$$E[g(X)\, h(Y)] = \iint g(x)\, h(y)\, d(P_X \otimes P_Y)(x, y) = \int g(x)\, dP_X(x) \cdot \int h(y)\, dP_Y(y) = E[g(X)] \cdot E[h(Y)].$$

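Identity (3) extends from bounded $g, h$ to any measurable $g, h$ for which $g(X)$ and $h(Y)$ are integrable: Tonelli applied to $|g|\,|h|$ shows that $g(X)\, h(Y)$ is integrable, and Fubini then gives the same factorisation. Taking $g = h = \operatorname{id}$ gives the special case most often used in practice:

$$E[XY] = E[X] \cdot E[Y] \quad \text{when } X, Y \text{ are independent and integrable.} \tag{4}$$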

This identity will appear in Covariance and Correlation, where it immediately implies $\operatorname{Cov}(X, Y) = 0$ for independent variables.
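Identity (4) can be verified exactly on a small discrete example. The sketch below assumes two fair six-sided dice (an illustrative choice, not taken from the text) and contrasts the independent case with the fully dependent case $Y = X$, where the product rule fails.

```python
from fractions import Fraction

faces = range(1, 7)
p = Fraction(1, 6)  # probability of each face of a fair die

# Independent dice: the joint PMF is the product of the marginals.
e_x = sum(x * p for x in faces)
e_xy_indep = sum(x * y * p * p for x in faces for y in faces)

# Fully dependent case Y = X: all mass sits on the diagonal, so E[XY] = E[X^2].
e_xy_copy = sum(x * x * p for x in faces)

print(e_x * e_x)    # 49/4
print(e_xy_indep)   # 49/4
print(e_xy_copy)    # 91/6
```

Exact rational arithmetic is used so that the equality in the independent case is an identity and not an approximate floating-point match.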

Independence of functions

If $X$ and $Y$ are independent and $g, h : \mathbb{R} \to \mathbb{R}$ are measurable, then $g(X)$ and $h(Y)$ are also independent. The key observation is that $\{g(X) \in B_1\} = \{X \in g^{-1}(B_1)\}$, so independence of $X$ and $Y$ on preimage events carries over to $g(X)$ and $h(Y)$.
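The preimage argument can be illustrated on a finite example by pushing a product PMF through two functions and checking that the result still factorises. In this sketch the helpers `pushforward` and `factorises`, the two dice, and the particular choices of $g$ (parity) and $h$ (exceeding 4) are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from fractions import Fraction


def pushforward(joint, g, h):
    """Joint PMF of (g(X), h(Y)) computed from the joint PMF {(x, y): prob} of (X, Y)."""
    out = defaultdict(Fraction)
    for (x, y), p in joint.items():
        out[(g(x), h(y))] += p  # mass of each preimage cell is accumulated
    return dict(out)


def factorises(joint):
    """True if the joint PMF equals the product of its two marginals everywhere."""
    p_u, p_v = defaultdict(Fraction), defaultdict(Fraction)
    for (u, v), p in joint.items():
        p_u[u] += p
        p_v[v] += p
    return all(joint.get((u, v), 0) == p_u[u] * p_v[v] for u in p_u for v in p_v)


# Two independent fair dice.
dice = {(x, y): Fraction(1, 36) for x in range(1, 7) for y in range(1, 7)}

# g(X) = parity of X, h(Y) = indicator that Y exceeds 4.
transformed = pushforward(dice, lambda x: x % 2, lambda y: y > 4)

print(factorises(dice))         # True
print(factorises(transformed))  # True
```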

Pairwise vs mutual independence

For a collection of three or more random variables $X_1, X_2, \ldots, X_n$, there are two distinct notions:

  • Pairwise independence: every pair $X_i, X_j$ with $i \neq j$ is independent.
  • Mutual independence: for every non-empty subset $I \subseteq \{1, \ldots, n\}$ and every collection of Borel sets $(B_i)_{i \in I}$:
$$P\!\left(\bigcap_{i \in I} \{X_i \in B_i\}\right) = \prod_{i \in I} P(X_i \in B_i).$$

Pairwise independence does not imply mutual independence.

Counterexample. Let $X_1, X_2$ be independent $\operatorname{Bernoulli}(\tfrac{1}{2})$ variables and set $X_3 = X_1 \oplus X_2$ (XOR, i.e. addition mod 2). Each pair is independent: for instance, $P(X_1 = a, X_3 = b) = \tfrac{1}{4}$ for every $a, b \in \{0,1\}$, matching $\tfrac{1}{2} \cdot \tfrac{1}{2}$. But the triple fails mutual independence because knowing both $X_1$ and $X_2$ determines $X_3$ completely:

$$P(X_1 = 0,\, X_2 = 0,\, X_3 = 0) = \tfrac{1}{4} \neq \tfrac{1}{2} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{8}.$$

Unless stated otherwise, “independence” for $n \geq 3$ variables means mutual independence.
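The XOR counterexample is small enough to verify by exhaustive enumeration. The sketch below builds the joint PMF of $(X_1, X_2, X_3)$ and tests both notions; the helper `prob` and the 0-based indexing of the three variables are conventions of this illustration only.

```python
from fractions import Fraction
from itertools import combinations, product

# Joint PMF of (X1, X2, X3) with X3 = X1 XOR X2, where X1, X2 are independent fair bits.
joint = {(a, b, a ^ b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}


def prob(constraints):
    """P(X_i = v for every (i, v) in constraints), with 0-based indices i."""
    return sum(
        p for outcome, p in joint.items()
        if all(outcome[i] == v for i, v in constraints.items())
    )


# Pairwise: every pair of coordinates factorises for every pair of values.
pairwise = all(
    prob({i: a, j: b}) == prob({i: a}) * prob({j: b})
    for i, j in combinations(range(3), 2)
    for a, b in product((0, 1), repeat=2)
)

# Mutual: the full triple must also factorise for every value combination.
mutual = all(
    prob({0: a, 1: b, 2: c}) == prob({0: a}) * prob({1: b}) * prob({2: c})
    for a, b, c in product((0, 1), repeat=3)
)

print(pairwise)  # True
print(mutual)    # False
```

Since the pairwise check already covers every two-element subset, testing the full triple is the only remaining condition for mutual independence when $n = 3$.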

Summary

  • $X$ and $Y$ are independent when $P_{(X,Y)} = P_X \otimes P_Y$, equivalently $F_{X,Y}(x,y) = F_X(x)\, F_Y(y)$.
  • For discrete variables: joint PMF factorises, $p_{X,Y}(x,y) = p_X(x)\, p_Y(y)$.
  • For absolutely continuous variables: joint PDF factorises, $f_{X,Y}(x,y) = f_X(x)\, f_Y(y)$ a.e.
  • Independence implies $E[g(X)\,h(Y)] = E[g(X)]\,E[h(Y)]$ for bounded measurable $g, h$; in particular $E[XY] = E[X]\,E[Y]$.
  • $g(X)$ and $h(Y)$ are independent whenever $X$ and $Y$ are.
  • Pairwise independence does not imply mutual independence: the XOR counterexample demonstrates this gap.