Covariance and Correlation

Tags: Probability, Random Variables, Expectation

Two random variables can be independent, positively aligned, or negatively aligned. Variance tells you how spread out a single variable is around its mean; covariance captures how much two variables move together. Its normalised form, the Pearson correlation coefficient, removes units and confines the answer to $[-1, 1]$, making distributions with very different scales comparable.

Definition

Let $X$ and $Y$ be random variables with finite second moments: $E[X^2] < \infty$ and $E[Y^2] < \infty$. Set $\mu_X \coloneqq E[X]$ and $\mu_Y \coloneqq E[Y]$. The covariance of $X$ and $Y$ is

$$\operatorname{Cov}(X, Y) \coloneqq E\bigl[(X - \mu_X)(Y - \mu_Y)\bigr].$$

When $X$ and $Y$ tend to exceed their means simultaneously, the product $(X - \mu_X)(Y - \mu_Y)$ is typically positive, so $\operatorname{Cov}(X, Y) > 0$. When one tends to be above its mean as the other is below, the product is typically negative. Note that $\operatorname{Cov}(X, X) = E[(X - \mu_X)^2] = \operatorname{Var}(X)$, so covariance generalises variance.
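The definition can be evaluated exactly for a finite joint distribution. The sketch below is illustrative only: the joint probabilities and the helper names `expectation` and `covariance` are assumptions made for this example, not part of the text above.

```python
from fractions import Fraction


def expectation(pmf, g):
    """E[g(X, Y)] for a finite joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def covariance(pmf):
    """Cov(X, Y) computed directly from the definition E[(X - mu_X)(Y - mu_Y)]."""
    mu_x = expectation(pmf, lambda x, y: x)
    mu_y = expectation(pmf, lambda x, y: y)
    return expectation(pmf, lambda x, y: (x - mu_x) * (y - mu_y))


# Hypothetical joint distribution in which X and Y usually take the same value.
joint = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}

cov_xy = covariance(joint)
print(cov_xy)  # 3/20, positive because X and Y tend to agree
```

Using `Fraction` keeps the arithmetic exact, so the result can be compared with a hand calculation without rounding concerns.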

Computational formula

Expanding the product and applying linearity of expectation gives a formula that avoids centring the variables first:

$$\operatorname{Cov}(X, Y) = E[XY] - E[X]\, E[Y]. \tag{1}$$

Proof.

$$E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - \mu_X E[Y] - \mu_Y E[X] + \mu_X \mu_Y = E[XY] - \mu_X \mu_Y.$$

Consequence of independence. If $X$ and $Y$ are independent, then $E[XY] = E[X]\,E[Y]$, so formula $(1)$ gives $\operatorname{Cov}(X, Y) = 0$. The converse is false; see the counterexample below.
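As a sanity check, the computational formula can be compared with the definition on small finite distributions. This is a sketch under assumptions: both joint distributions and all function names are hypothetical, and the independent case is built as a product of two made-up marginals.

```python
from fractions import Fraction
from itertools import product


def mean(pmf, g):
    """E[g(X, Y)] for a finite joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())


def cov_definition(pmf):
    """Cov(X, Y) as E[(X - mu_X)(Y - mu_Y)]."""
    mu_x = mean(pmf, lambda x, y: x)
    mu_y = mean(pmf, lambda x, y: y)
    return mean(pmf, lambda x, y: (x - mu_x) * (y - mu_y))


def cov_shortcut(pmf):
    """Cov(X, Y) as E[XY] - E[X] E[Y], i.e. formula (1)."""
    e_xy = mean(pmf, lambda x, y: x * y)
    return e_xy - mean(pmf, lambda x, y: x) * mean(pmf, lambda x, y: y)


# Hypothetical dependent pair.
dependent = {
    (0, 0): Fraction(2, 5),
    (0, 1): Fraction(1, 10),
    (1, 0): Fraction(1, 10),
    (1, 1): Fraction(2, 5),
}

# Hypothetical independent pair: the joint pmf is the product of the marginals.
px = {0: Fraction(1, 3), 1: Fraction(2, 3)}
py = {-1: Fraction(1, 4), 2: Fraction(3, 4)}
independent = {(x, y): px[x] * py[y] for x, y in product(px, py)}

print(cov_definition(dependent), cov_shortcut(dependent))      # both 3/20
print(cov_definition(independent), cov_shortcut(independent))  # both 0
```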

Bilinearity and symmetry

Theorem. Covariance is symmetric and bilinear:

$$\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X), \tag{2}$$

$$\operatorname{Cov}(aX + bZ,\, Y) = a\, \operatorname{Cov}(X, Y) + b\, \operatorname{Cov}(Z, Y) \quad \text{for } a, b \in \mathbb{R}. \tag{3}$$

Adding a constant does not affect covariance: $\operatorname{Cov}(X + c, Y) = \operatorname{Cov}(X, Y)$ for any $c \in \mathbb{R}$.

Proof of (3). By the computational formula $(1)$:

$$\operatorname{Cov}(aX + bZ, Y) = E[(aX + bZ)Y] - E[aX + bZ]\,E[Y] = a\bigl(E[XY] - E[X]E[Y]\bigr) + b\bigl(E[ZY] - E[Z]E[Y]\bigr) = a\, \operatorname{Cov}(X, Y) + b\, \operatorname{Cov}(Z, Y).$$

By symmetry, (3) also gives linearity in the second argument. Together with $\operatorname{Cov}(X, X) = \operatorname{Var}(X) \geq 0$, this makes covariance a symmetric positive semi-definite bilinear form on the space of square-integrable random variables.
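These identities can be checked numerically. The sketch below computes covariance under the empirical distribution of a small data set (dividing by $n$), which is itself a probability distribution, so symmetry, bilinearity, and shift invariance should hold exactly. The data, the constants, and the helper `cov` are assumptions chosen for illustration.

```python
from fractions import Fraction


def cov(xs, ys):
    """Covariance under the empirical distribution (divide by n), via E[XY] - E[X]E[Y]."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    mean_xy = Fraction(sum(x * y for x, y in zip(xs, ys)), n)
    return mean_xy - mean_x * mean_y


# Hypothetical integer-valued samples of X, Z and Y.
xs = [1, 4, 2, 7, 5]
zs = [3, 0, 6, 2, 2]
ys = [2, 5, 1, 9, 4]
a, b, c = 3, -2, 10

combo = [a * x + b * z for x, z in zip(xs, zs)]

lhs = cov(combo, ys)                         # Cov(aX + bZ, Y)
rhs = a * cov(xs, ys) + b * cov(zs, ys)      # a Cov(X, Y) + b Cov(Z, Y)
shifted = cov([x + c for x in xs], ys)       # Cov(X + c, Y)

print(lhs == rhs)                     # linearity in the first argument
print(cov(xs, ys) == cov(ys, xs))     # symmetry
print(shifted == cov(xs, ys))         # adding a constant changes nothing
print(cov(xs, xs) >= 0)               # Cov(X, X) = Var(X) is non-negative
```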

Variance of a sum

The most immediate application of bilinearity is the variance-of-a-sum identity:

$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y). \tag{4}$$

Proof. Expand $\operatorname{Var}(X + Y) = \operatorname{Cov}(X+Y,\, X+Y)$ by bilinearity, then use symmetry to combine the two cross terms:

$$\operatorname{Cov}(X+Y,\, X+Y) = \operatorname{Cov}(X,X) + 2\operatorname{Cov}(X,Y) + \operatorname{Cov}(Y,Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X,Y) + \operatorname{Var}(Y).$$

More generally, for $S = X_1 + \cdots + X_n$:

$$\operatorname{Var}(S) = \sum_{i=1}^n \operatorname{Var}(X_i) + 2\sum_{1 \leq i < j \leq n} \operatorname{Cov}(X_i, X_j). \tag{5}$$

When all pairs are uncorrelated (in particular, when they are independent), the off-diagonal terms vanish and variance is additive. This is the result stated without proof in Variance.
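Identity (5) can be verified on data in the same spirit. The following is a minimal sketch, assuming three made-up integer samples and an empirical covariance that divides by $n$; the names `cov`, `var`, and `columns` are illustrative.

```python
from fractions import Fraction
from itertools import combinations


def cov(xs, ys):
    """Covariance under the empirical distribution (divide by n)."""
    n = len(xs)
    mean_x = Fraction(sum(xs), n)
    mean_y = Fraction(sum(ys), n)
    mean_xy = Fraction(sum(x * y for x, y in zip(xs, ys)), n)
    return mean_xy - mean_x * mean_y


def var(xs):
    """Variance as the covariance of a variable with itself."""
    return cov(xs, xs)


# Hypothetical samples of X_1, X_2, X_3 observed on the same five outcomes.
columns = [
    [1, 4, 2, 7, 5],
    [3, 0, 6, 2, 2],
    [2, 5, 1, 9, 4],
]

# S = X_1 + X_2 + X_3, computed outcome by outcome.
total = [sum(row) for row in zip(*columns)]

lhs = var(total)
rhs = sum(var(col) for col in columns) + 2 * sum(
    cov(u, v) for u, v in combinations(columns, 2)
)

print(lhs == rhs)  # the two sides of (5) agree exactly
```

The `combinations(columns, 2)` call enumerates each unordered pair once, which is why the cross terms carry the factor $2$.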

Pearson correlation coefficient

Covariance depends on the scales of $X$ and $Y$: multiplying $X$ by $2$ multiplies $\operatorname{Cov}(X, Y)$ by $2$. To get a scale-free measure, normalise by both standard deviations. The Pearson correlation coefficient is

$$\rho(X, Y) \coloneqq \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, \tag{6}$$

where $\sigma_X = \sqrt{\operatorname{Var}(X)}$ and $\sigma_Y = \sqrt{\operatorname{Var}(Y)}$ are the standard deviations, both assumed strictly positive so that the quotient is defined.

Theorem (Cauchy–Schwarz). $\rho(X, Y) \in [-1, 1]$.

Proof. Let $X' = X - \mu_X$ and $Y' = Y - \mu_Y$. Then for any $t \in \mathbb{R}$:

$$0 \leq \operatorname{Var}(tX' + Y') = t^2 \sigma_X^2 + 2t\operatorname{Cov}(X, Y) + \sigma_Y^2.$$

This quadratic in $t$ is non-negative for all $t$, so its discriminant must be non-positive:

$$4\operatorname{Cov}(X, Y)^2 - 4\sigma_X^2 \sigma_Y^2 \leq 0,$$

giving $|\operatorname{Cov}(X, Y)| \leq \sigma_X \sigma_Y$, i.e. $|\rho(X, Y)| \leq 1$.

Equality $|\rho| = 1$ holds precisely when $\operatorname{Var}(tX' + Y') = 0$ for some $t$, i.e. when $Y' = -tX'$ almost surely, meaning $Y = aX + b$ for constants $a = -t \neq 0$ and $b = \mu_Y - a\mu_X$. The sign of $\rho$ matches the sign of $a$.

The Pearson coefficient measures linear association: $|\rho|$ close to $1$ means $Y$ is nearly a linear function of $X$, while $\rho = 0$ means no linear association (nonlinear dependence remains possible).
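The properties of $\rho$ discussed in this section can be illustrated on a small data set. This is a sketch, not a reference implementation: the function `pearson`, the sample values, and the slopes and intercepts are assumptions, and the computation uses floating point, so comparisons are made with a tolerance.

```python
import math


def pearson(xs, ys):
    """Pearson correlation under the empirical distribution of the paired samples."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if sd_x == 0 or sd_y == 0:
        # The definition requires both standard deviations to be strictly positive.
        raise ValueError("correlation is undefined when a variable is constant")
    return cov_xy / (sd_x * sd_y)


# Hypothetical paired observations.
xs = [1.0, 4.0, 2.0, 7.0, 5.0]
ys = [2.0, 5.0, 1.0, 9.0, 4.0]

rho = pearson(xs, ys)
rho_rescaled = pearson([100 * x for x in xs], ys)    # change the units of X
rho_line_up = pearson(xs, [3 * x + 1 for x in xs])   # Y = aX + b with a > 0
rho_line_down = pearson(xs, [-2 * x + 5 for x in xs])  # Y = aX + b with a < 0

print(-1 <= rho <= 1)                      # bounded by Cauchy-Schwarz
print(math.isclose(rho, rho_rescaled))     # unaffected by positive rescaling
print(math.isclose(rho_line_up, 1.0))      # exact increasing linear relation
print(math.isclose(rho_line_down, -1.0))   # exact decreasing linear relation
```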

Zero covariance does not imply independence

Independence implies zero covariance, but the converse fails.

Counterexample. Let $U \sim \operatorname{Uniform}(-1, 1)$ and set $V = U^2$. Since $U$ is symmetric around $0$, $E[U] = 0$ and $E[U^3] = 0$. Therefore:

$$\operatorname{Cov}(U, V) = E[U \cdot U^2] - E[U]\,E[U^2] = E[U^3] - 0 = 0.$$

Yet $V$ is completely determined by $U$, so the two variables are as far from independent as possible. Zero covariance only rules out linear dependence; nonlinear dependence such as this can be invisible to $\operatorname{Cov}$.
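The counterexample can be checked with exact arithmetic. The sketch below uses the closed-form moments of $\operatorname{Uniform}(-1, 1)$ and one particular pair of events to exhibit the failure of independence; the helper `uniform_moment` and the choice of events are assumptions made for illustration.

```python
from fractions import Fraction


def uniform_moment(k):
    """E[U^k] for U ~ Uniform(-1, 1), i.e. (1/2) times the integral of u^k over [-1, 1]."""
    # Odd powers integrate to zero by symmetry; even powers give 1 / (k + 1).
    return Fraction(0) if k % 2 else Fraction(1, k + 1)


# Cov(U, V) with V = U^2, via E[U * U^2] - E[U] * E[U^2].
cov_uv = uniform_moment(3) - uniform_moment(1) * uniform_moment(2)

# Independence would require P(A and B) = P(A) * P(B) for every pair of events.
# Take A = {|U| <= 1/2} and B = {V <= 1/4}; since V = U^2 these are the same event.
p_a = Fraction(1, 2)   # length of [-1/2, 1/2] divided by length of [-1, 1]
p_b = p_a              # V <= 1/4 holds exactly when |U| <= 1/2
p_a_and_b = p_a        # A and B coincide

print(cov_uv)                    # 0
print(p_a_and_b, p_a * p_b)      # 1/2 versus 1/4, so U and V are not independent
```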

Summary

  • $\operatorname{Cov}(X, Y) \coloneqq E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]\,E[Y]$.
  • Symmetry and bilinearity: covariance is a symmetric positive semi-definite bilinear form; $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$.
  • Variance of a sum: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y)$; additivity holds when variables are uncorrelated.
  • Independence implies $\operatorname{Cov}(X, Y) = 0$; the converse fails: zero covariance only rules out linear dependence.
  • Pearson correlation: $\rho(X, Y) = \operatorname{Cov}(X, Y) / (\sigma_X \sigma_Y) \in [-1, 1]$; equality $|\rho| = 1$ iff $Y = aX + b$ almost surely.