Independence is the cleanest possible relationship between two random variables: knowing the value of one tells you nothing about the other. The formal definition translates this intuition into a factorisation condition on the joint distribution, and from it the entire theory of independent sums, product expectations, and variance additivity follows.
Definition
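Two random variables $X$ and $Y$ are independent if, for every pair of Borel sets $B_1, B_2 \in \mathcal{B}(\mathbb{R})$:
$$P(X \in B_1,\, Y \in B_2) = P(X \in B_1) \cdot P(Y \in B_2).$$
Equivalently, the joint distribution $P_{(X,Y)}$ is the product measure $P_X \otimes P_Y$.
The two formulations agree because the rectangles $B_1 \times B_2$ form a π-system that generates $\mathcal{B}(\mathbb{R}^2)$: two probability measures that coincide on such a class coincide on the whole σ-algebra, so the factorisation on rectangles determines the entire joint law.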
In terms of the CDF
An equivalent characterisation: X and Y are independent if and only if
$$F_{X,Y}(x,y) = F_X(x) \cdot F_Y(y) \quad \text{for all } (x,y) \in \mathbb{R}^2. \tag{1}$$
This is often the most convenient form to check.
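For a pair with finite support, the CDF criterion can be checked exactly on the grid of support points, because both sides are step functions that only change there. The following is a minimal sketch under that finite-support assumption; the helper names `joint_cdf` and `cdf_factorises` and the dice examples are illustrative choices, not part of the text above.

```python
from fractions import Fraction
from itertools import product

def joint_cdf(joint, x, y):
    """F_{X,Y}(x, y) for a joint PMF given as {(a, b): prob}."""
    return sum(p for (a, b), p in joint.items() if a <= x and b <= y)

def cdf_factorises(joint):
    """Check F_{X,Y}(x, y) == F_X(x) * F_Y(y) at every grid point of the support."""
    xs = sorted({a for a, _ in joint})
    ys = sorted({b for _, b in joint})
    x_max, y_max = xs[-1], ys[-1]
    for x, y in product(xs, ys):
        f_x = joint_cdf(joint, x, y_max)  # marginal CDF of X at x
        f_y = joint_cdf(joint, x_max, y)  # marginal CDF of Y at y
        if joint_cdf(joint, x, y) != f_x * f_y:
            return False
    return True

faces = range(1, 7)

# Two fair dice rolled separately.
two_dice = {(a, b): Fraction(1, 36) for a, b in product(faces, faces)}

# One die, with Y defined as 7 - X (the opposite face).
opposite = {(a, 7 - a): Fraction(1, 6) for a in faces}

print(cdf_factorises(two_dice))  # True
print(cdf_factorises(opposite))  # False
```

Exact `Fraction` arithmetic is used so that the equality test is not disturbed by floating-point rounding.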
Discrete case
For jointly discrete (X,Y), independence is equivalent to the joint PMF factorising:
$$p_{X,Y}(x,y) = p_X(x) \cdot p_Y(y) \quad \text{for all } (x,y).$$
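The PMF criterion translates directly into a finite check: compute both marginals, then compare the joint PMF with their product at every point of the product of the supports. Here is a small sketch with exact arithmetic; the function names `marginals` and `is_independent` and the two coin examples are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

def marginals(joint):
    """Return the marginal PMFs (p_X, p_Y) of a joint PMF given as {(x, y): prob}."""
    p_x, p_y = {}, {}
    for (x, y), p in joint.items():
        p_x[x] = p_x.get(x, 0) + p
        p_y[y] = p_y.get(y, 0) + p
    return p_x, p_y

def is_independent(joint):
    """Check p_{X,Y}(x, y) == p_X(x) * p_Y(y) on the full product of the supports."""
    p_x, p_y = marginals(joint)
    return all(joint.get((x, y), 0) == p_x[x] * p_y[y]
               for x, y in product(p_x, p_y))

# Two fair coins tossed separately: every outcome has probability 1/4.
fair = {(x, y): Fraction(1, 4) for x, y in product((0, 1), repeat=2)}

# Y is a copy of X: all the mass sits on the diagonal.
copied = {(0, 0): Fraction(1, 2), (1, 1): Fraction(1, 2)}

print(is_independent(fair))    # True
print(is_independent(copied))  # False
```

Note that the check runs over all pairs of marginal support points, including pairs where the joint PMF is zero. Those are exactly the points where the copied coin fails.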
Absolutely continuous case
For jointly absolutely continuous (X,Y), independence is equivalent to the joint PDF factorising:
$$f_{X,Y}(x,y) = f_X(x) \cdot f_Y(y) \quad \text{for almost every } (x,y). \tag{2}$$
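Example. The uniform distribution on the unit square has $f_{X,Y}(x,y) = 1$ on $[0,1]^2$, and both marginals are uniform: $f_X(x) = 1$ and $f_Y(y) = 1$ on $[0,1]$. Since $1 = 1 \cdot 1$, the PDF factorises and $X, Y$ are independent.
Contrast this with the uniform distribution on the unit disk $\{x^2 + y^2 \le 1\}$: the density is $1/\pi$ inside the disk and $0$ outside. The marginal of $X$ has density $f_X(x) = \frac{2}{\pi}\sqrt{1 - x^2}$ for $x \in [-1,1]$, and by symmetry $f_Y$ has the same form. The product $f_X(x) f_Y(y) = \frac{4}{\pi^2}\sqrt{(1 - x^2)(1 - y^2)}$ is not constant on the disk, and it is strictly positive at points such as $(0.9, 0.9)$, which lie outside the disk and where the joint density vanishes. So the joint density does not factorise, and $X$ and $Y$ are not independent.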
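The disk example can be checked numerically. The sketch below approximates the marginal of X by integrating the joint density over y with a midpoint rule, compares it with the closed form (2/π)·sqrt(1 − x²), and then evaluates the joint density and the product of marginals at two sample points. The function names, the step count, and the sample points are illustrative assumptions.

```python
import math

def joint_density(x, y):
    """Uniform density on the unit disk: 1/pi inside, 0 outside."""
    return 1.0 / math.pi if x * x + y * y <= 1.0 else 0.0

def marginal_closed_form(t):
    """(2/pi) * sqrt(1 - t^2) on [-1, 1], zero elsewhere."""
    return (2.0 / math.pi) * math.sqrt(1.0 - t * t) if abs(t) <= 1.0 else 0.0

def marginal_numeric(x, n=200_000):
    """Midpoint-rule approximation of the integral of joint_density(x, y) over y in [-1, 1]."""
    h = 2.0 / n
    return sum(joint_density(x, -1.0 + (k + 0.5) * h) for k in range(n)) * h

x0 = 0.5
print(abs(marginal_numeric(x0) - marginal_closed_form(x0)) < 1e-4)  # True

# Compare the joint density with the product of marginals at two points:
# the centre of the disk, and a point of the square [-1, 1]^2 outside the disk.
for x, y in [(0.0, 0.0), (0.9, 0.9)]:
    product_of_marginals = marginal_closed_form(x) * marginal_closed_form(y)
    print((x, y), joint_density(x, y), product_of_marginals)
```

At the centre the product of marginals is 4/π², which differs from 1/π. At (0.9, 0.9) the joint density is zero while the product is positive.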
Independence implies factorisation of expectations
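Theorem. If $X$ and $Y$ are independent and $g, h : \mathbb{R} \to \mathbb{R}$ are bounded measurable functions, then
$$E[g(X)h(Y)] = E[g(X)] \cdot E[h(Y)]. \tag{3}$$
Proof. Since $P_{(X,Y)} = P_X \otimes P_Y$ and $g(x)h(y)$ is bounded, the Fubini–Tonelli theorem gives:
$$E[g(X)h(Y)] = \int_{\mathbb{R}^2} g(x)h(y)\, d(P_X \otimes P_Y)(x,y) = \int_{\mathbb{R}} g(x)\, dP_X(x) \int_{\mathbb{R}} h(y)\, dP_Y(y) = E[g(X)] \cdot E[h(Y)].$$
The identity function is not bounded, but the same argument applies to $g = h = \mathrm{id}$ when $X$ and $Y$ are integrable: Tonelli gives $E|XY| = E|X| \cdot E|Y| < \infty$, so Fubini applies. This is the special case most often used in practice:
$$E[XY] = E[X] \cdot E[Y] \quad \text{when } X, Y \text{ are independent and integrable}. \tag{4}$$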
This identity will appear in Covariance and Correlation, where it immediately implies Cov(X,Y)=0 for independent variables.
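Identities (3) and (4) can be verified exactly for two independent fair dice, where every expectation is a finite sum. The sketch below is only an illustration: the choices of g (squaring), h (indicator of rolling at least 5), and the helper `expect` are assumptions, not taken from the text.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
die = {k: Fraction(1, 6) for k in faces}

def expect(pmf, f):
    """E[f(V)] for a random variable V with PMF {value: prob}."""
    return sum(p * f(v) for v, p in pmf.items())

# Joint law of two independent dice: the product of the marginals.
joint = {(x, y): die[x] * die[y] for x, y in product(faces, faces)}

def g(x):
    return x * x

def h(y):
    return 1 if y >= 5 else 0

# Identity (3): E[g(X) h(Y)] against E[g(X)] * E[h(Y)].
lhs = sum(p * g(x) * h(y) for (x, y), p in joint.items())
rhs = expect(die, g) * expect(die, h)
print(lhs, rhs)  # 91/18 91/18

# Identity (4): E[XY] against E[X] * E[Y].
e_xy = sum(p * x * y for (x, y), p in joint.items())
print(e_xy, expect(die, lambda v: v) ** 2)  # 49/4 49/4

# Without independence the identity can fail: take Y = X.
diagonal = {(x, x): Fraction(1, 6) for x in faces}
e_xx = sum(p * x * y for (x, y), p in diagonal.items())
print(e_xx)  # 91/6
```

The last value, 91/6, differs from 49/4, which shows that the product formula relies on the independence assumption.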
Independence of functions
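If $X$ and $Y$ are independent and $g, h : \mathbb{R} \to \mathbb{R}$ are measurable, then $g(X)$ and $h(Y)$ are also independent. The key observation is that $\{g(X) \in B_1\} = \{X \in g^{-1}(B_1)\}$, and $g^{-1}(B_1)$ is again a Borel set because $g$ is measurable. Applying the definition of independence to the Borel sets $g^{-1}(B_1)$ and $h^{-1}(B_2)$ gives $P(g(X) \in B_1,\, h(Y) \in B_2) = P(g(X) \in B_1) \cdot P(h(Y) \in B_2)$.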
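In the discrete setting the preimage argument amounts to pushing the joint PMF forward through (g, h) and checking that the result still factorises. A minimal sketch, assuming two independent fair dice, with g the parity of the first die and h the indicator that the second die shows at least 5; all names here are hypothetical.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def pushforward(joint, g, h):
    """Joint PMF of (g(X), h(Y)), obtained by summing mass over preimages."""
    out = defaultdict(Fraction)
    for (x, y), p in joint.items():
        out[(g(x), h(y))] += p
    return dict(out)

def factorises(joint):
    """Check that a joint PMF equals the product of its marginals."""
    p_u, p_v = defaultdict(Fraction), defaultdict(Fraction)
    for (u, v), p in joint.items():
        p_u[u] += p
        p_v[v] += p
    return all(joint.get((u, v), 0) == p_u[u] * p_v[v]
               for u, v in product(list(p_u), list(p_v)))

faces = range(1, 7)
dice = {(x, y): Fraction(1, 36) for x, y in product(faces, faces)}

def parity(x):
    return x % 2

def at_least_five(y):
    return y >= 5

transformed = pushforward(dice, parity, at_least_five)
print(factorises(dice))         # True
print(factorises(transformed))  # True
print(transformed[(1, True)])   # 1/6
```

The entry for (odd, at least 5) is 1/6, which equals P(odd) · P(at least 5) = 1/2 · 1/3.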
Pairwise vs mutual independence
For a collection of three or more random variables $X_1, X_2, \dots, X_n$, there are two distinct notions:
Pairwise independence: every pair $X_i, X_j$ with $i \ne j$ is independent.
Mutual independence: for every non-empty subset $I \subseteq \{1, \dots, n\}$ and every collection of Borel sets $(B_i)_{i \in I}$:
$$P\Big(\bigcap_{i \in I} \{X_i \in B_i\}\Big) = \prod_{i \in I} P(X_i \in B_i).$$
Pairwise independence does not imply mutual independence.
Counterexample. Let $X_1, X_2$ be independent Bernoulli($\tfrac12$) variables and set $X_3 = X_1 \oplus X_2$ (XOR, i.e. addition mod 2). Every pair is independent: for instance, $P(X_1 = a, X_3 = b) = \tfrac14$ for every $a, b \in \{0,1\}$, matching $\tfrac12 \cdot \tfrac12$. But the triple fails mutual independence because knowing both $X_1$ and $X_2$ determines $X_3$ completely:
$$P(X_1 = 0, X_2 = 0, X_3 = 0) = \tfrac14 \ne \tfrac12 \cdot \tfrac12 \cdot \tfrac12 = \tfrac18.$$
Unless stated otherwise, “independence” for n≥3 variables always means mutual independence.
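The XOR counterexample is small enough to verify by enumerating all four outcomes. The sketch below builds the joint law of (X1, X2, X3), checks the product rule for every pair, and then checks it for the full triple. The helper names `prob` and `marginal` are illustrative. For discrete variables it suffices to test the triple on single points, since the conditions for smaller subsets and larger sets follow by summation.

```python
from fractions import Fraction
from itertools import combinations, product

# Joint law of (X1, X2, X3) with X3 = X1 XOR X2: four outcomes, each with mass 1/4.
law = {(a, b, a ^ b): Fraction(1, 4) for a, b in product((0, 1), repeat=2)}

def prob(event):
    """Probability of the set of outcomes w for which event(w) is true."""
    return sum(p for w, p in law.items() if event(w))

def marginal(i, v):
    """P(X_{i+1} = v), using zero-based coordinate index i."""
    return prob(lambda w: w[i] == v)

# Pairwise check: P(Xi = a, Xj = b) == P(Xi = a) * P(Xj = b) for every pair.
pairwise = all(
    prob(lambda w: w[i] == a and w[j] == b) == marginal(i, a) * marginal(j, b)
    for i, j in combinations(range(3), 2)
    for a, b in product((0, 1), repeat=2)
)

# Mutual check on the full triple.
mutual = all(
    prob(lambda w: w == (a, b, c)) == marginal(0, a) * marginal(1, b) * marginal(2, c)
    for a, b, c in product((0, 1), repeat=3)
)

print(pairwise)  # True
print(mutual)    # False
print(prob(lambda w: w == (0, 0, 0)))  # 1/4
```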
Summary
$X$ and $Y$ are independent when $P_{(X,Y)} = P_X \otimes P_Y$, equivalently $F_{X,Y}(x,y) = F_X(x) F_Y(y)$ for all $(x,y)$.
For discrete variables: joint PMF factorises, $p_{X,Y}(x,y) = p_X(x) p_Y(y)$.
For absolutely continuous variables: joint PDF factorises, $f_{X,Y}(x,y) = f_X(x) f_Y(y)$ a.e.
Independence implies $E[g(X)h(Y)] = E[g(X)] E[h(Y)]$ for bounded measurable $g, h$; in particular $E[XY] = E[X] E[Y]$ for integrable $X, Y$.
$g(X)$ and $h(Y)$ are independent for measurable $g, h$ whenever $X$ and $Y$ are.
Pairwise independence does not imply mutual independence: the XOR counterexample demonstrates this gap.