Two random variables may tend to move in the same direction, in opposite directions, or show no linear relationship at all. Variance tells you how spread out a single variable is around its mean; covariance captures how much two variables move together. Its normalised form, the Pearson correlation coefficient, removes units and confines the answer to $[-1, 1]$, so that variables measured on very different scales can be compared.
Definition
Let $X$ and $Y$ be random variables with finite second moments: $E[X^2] < \infty$ and $E[Y^2] < \infty$. Set $\mu_X := E[X]$ and $\mu_Y := E[Y]$. The covariance of $X$ and $Y$ is
$$\operatorname{Cov}(X, Y) := E[(X - \mu_X)(Y - \mu_Y)].$$
When $X$ and $Y$ tend to exceed their means simultaneously, the product $(X - \mu_X)(Y - \mu_Y)$ is typically positive, so $\operatorname{Cov}(X, Y) > 0$. When one tends to be above its mean as the other is below, the product is typically negative. Note that $\operatorname{Cov}(X, X) = E[(X - \mu_X)^2] = \operatorname{Var}(X)$, so covariance generalises variance.
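The definition can be evaluated directly for a small discrete example. The sketch below is illustrative only: the joint distribution `joint` and the helper names `expect` and `cov_def` are assumptions introduced here, not part of the text above. Exact fractions are used so that no rounding obscures the result.

```python
from fractions import Fraction as F

# Hypothetical joint pmf: maps (x, y) to P(X = x, Y = y).
joint = {
    (0, 0): F(3, 10),
    (0, 1): F(1, 10),
    (1, 0): F(1, 10),
    (1, 1): F(5, 10),
}

def expect(g, pmf):
    """E[g(X, Y)] for a finite joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov_def(pmf):
    """Cov(X, Y) = E[(X - mu_X)(Y - mu_Y)], straight from the definition."""
    mu_x = expect(lambda x, y: x, pmf)
    mu_y = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - mu_x) * (y - mu_y), pmf)

cov_xy = cov_def(joint)  # positive: X and Y tend to be 1 together

# Cov(X, X) should equal Var(X): build the joint pmf of the pair (X, X).
marginal_x = {}
for (x, _), p in joint.items():
    marginal_x[x] = marginal_x.get(x, 0) + p
var_x = cov_def({(x, x): p for x, p in marginal_x.items()})

print(cov_xy, var_x)  # 7/50 6/25
```

Most of the probability mass here sits on the outcomes where both variables take the same value, which is why the covariance comes out positive.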
Computational formula
Expanding the product and applying linearity of expectation gives a formula that avoids centring the variables first:
$$\operatorname{Cov}(X, Y) = E[XY] - \mu_Y E[X] - \mu_X E[Y] + \mu_X \mu_Y = E[XY] - E[X]\,E[Y]. \tag{1}$$
Consequence of independence. If $X$ and $Y$ are independent, then $E[XY] = E[X]\,E[Y]$, so formula (1) gives $\operatorname{Cov}(X, Y) = 0$. The converse is false; see the counterexample below.
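As a quick sanity check, the sketch below compares the centred definition with the shortcut $E[XY] - E[X]\,E[Y]$ on one dependent pair, and then confirms that a joint distribution built as a product of marginals has covariance exactly zero. All distributions and function names are hypothetical, chosen only for illustration.

```python
from fractions import Fraction as F

def expect(g, pmf):
    """E[g(X, Y)] for a finite joint pmf given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def cov_centred(pmf):
    """Covariance from the definition E[(X - mu_X)(Y - mu_Y)]."""
    mu_x = expect(lambda x, y: x, pmf)
    mu_y = expect(lambda x, y: y, pmf)
    return expect(lambda x, y: (x - mu_x) * (y - mu_y), pmf)

def cov_shortcut(pmf):
    """Covariance from the computational formula E[XY] - E[X]E[Y]."""
    e_xy = expect(lambda x, y: x * y, pmf)
    return e_xy - expect(lambda x, y: x, pmf) * expect(lambda x, y: y, pmf)

# Hypothetical dependent pair.
dependent = {(0, 0): F(3, 10), (0, 1): F(1, 10),
             (1, 0): F(1, 10), (1, 1): F(5, 10)}

# Independent pair: the joint pmf is the product of the two marginals.
px = {1: F(1, 4), 2: F(3, 4)}
py = {-1: F(1, 3), 0: F(1, 3), 5: F(1, 3)}
independent = {(x, y): p * q for x, p in px.items() for y, q in py.items()}

print(cov_centred(dependent) == cov_shortcut(dependent))  # True
print(cov_shortcut(independent))                          # 0
```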
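Covariance is symmetric, $\operatorname{Cov}(X, Y) = \operatorname{Cov}(Y, X)$, and linear in each argument: for constants $a, b$ and a third square-integrable random variable $Z$, $\operatorname{Cov}(aX + bY, Z) = a\operatorname{Cov}(X, Z) + b\operatorname{Cov}(Y, Z)$, and likewise in the second argument. Taken together, these properties make covariance a positive semi-definite bilinear form on the space of square-integrable random variables: it is symmetric, linear in each argument, and satisfies $\operatorname{Cov}(X, X) = \operatorname{Var}(X) \ge 0$.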
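The bilinear-form properties can be checked numerically on a small finite probability space. In the sketch below the outcome probabilities, the three random variables and the constants `a`, `b` are all made up for illustration; random variables are represented as lists holding one value per outcome.

```python
from fractions import Fraction as F

# Hypothetical finite probability space: four outcomes with these probabilities.
probs = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]

# Random variables as lists of values, one entry per outcome.
X = [1, -2, 0, 3]
Y = [4, 1, -1, 2]
Z = [0, 5, 2, -3]

def mean(v):
    return sum(p * a for p, a in zip(probs, v))

def cov(v, w):
    mv, mw = mean(v), mean(w)
    return sum(p * (a - mv) * (b - mw) for p, a, b in zip(probs, v, w))

a, b = F(3), F(-1, 2)
combo = [a * x + b * y for x, y in zip(X, Y)]  # the variable aX + bY

lhs = cov(combo, Z)                   # Cov(aX + bY, Z)
rhs = a * cov(X, Z) + b * cov(Y, Z)   # a Cov(X, Z) + b Cov(Y, Z)

print(lhs == rhs)              # linearity in the first argument
print(cov(X, Y) == cov(Y, X))  # symmetry
print(cov(combo, combo) >= 0)  # Cov(W, W) = Var(W) is never negative
```

Because the arithmetic is exact, the equalities hold identically rather than up to a tolerance.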
Variance of a sum
The most immediate application of bilinearity is the variance-of-a-sum identity:
$$\operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y). \tag{4}$$
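Proof. Expand $\operatorname{Var}(X + Y) = \operatorname{Cov}(X + Y, X + Y)$ by bilinearity:
$$\operatorname{Cov}(X + Y, X + Y) = \operatorname{Cov}(X, X) + \operatorname{Cov}(X, Y) + \operatorname{Cov}(Y, X) + \operatorname{Cov}(Y, Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y),$$
where the last step uses symmetry.
The same expansion applied to $X_1, \dots, X_n$ gives
$$\operatorname{Var}\Bigl(\sum_{i=1}^{n} X_i\Bigr) = \sum_{i=1}^{n} \operatorname{Var}(X_i) + 2\sum_{i<j} \operatorname{Cov}(X_i, X_j).$$
When all pairs are uncorrelated (in particular, when they are independent), the off-diagonal terms vanish and variance is additive. This is the result stated without proof in Variance.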
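The variance-of-a-sum identity, in the form "sum of the variances plus twice the sum of the pairwise covariances", can be verified exactly on a toy example. The probability space and the three variables below are assumptions made purely for illustration.

```python
from fractions import Fraction as F
from itertools import combinations

# Hypothetical finite probability space and three random variables on it.
probs = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
variables = [
    [1, -2, 0, 3],
    [4, 1, -1, 2],
    [0, 5, 2, -3],
]

def mean(v):
    return sum(p * a for p, a in zip(probs, v))

def cov(v, w):
    mv, mw = mean(v), mean(w)
    return sum(p * (a - mv) * (b - mw) for p, a, b in zip(probs, v, w))

def var(v):
    return cov(v, v)

# The sum X_1 + X_2 + X_3, evaluated outcome by outcome.
total = [sum(vals) for vals in zip(*variables)]

lhs = var(total)
rhs = (sum(var(v) for v in variables)
       + 2 * sum(cov(v, w) for v, w in combinations(variables, 2)))

print(lhs == rhs)  # True: diagonal terms plus twice the off-diagonal terms
```

Using `combinations(variables, 2)` visits each unordered pair once, which is why the covariance sum is doubled.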
Pearson correlation coefficient
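Covariance depends on the scales of $X$ and $Y$: multiplying $X$ by 2 multiplies $\operatorname{Cov}(X, Y)$ by 2. To get a scale-free measure, normalise by both standard deviations. The Pearson correlation coefficient is
$$\rho(X, Y) := \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}, \tag{6}$$
where $\sigma_X = \sqrt{\operatorname{Var}(X)} > 0$ and $\sigma_Y = \sqrt{\operatorname{Var}(Y)} > 0$.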
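To illustrate the effect of normalising, the sketch below doubles one variable and compares covariance and correlation before and after. The data and helper names are hypothetical, and floating-point arithmetic is used because of the square root, so comparisons are made with a tolerance.

```python
import math

# Hypothetical finite probability space with two random variables.
probs = [0.1, 0.2, 0.3, 0.4]
X = [1.0, -2.0, 0.0, 3.0]
Y = [4.0, 1.0, -1.0, 2.0]

def mean(v):
    return sum(p * a for p, a in zip(probs, v))

def cov(v, w):
    mv, mw = mean(v), mean(w)
    return sum(p * (a - mv) * (b - mw) for p, a, b in zip(probs, v, w))

def corr(v, w):
    """Pearson correlation: covariance divided by both standard deviations."""
    return cov(v, w) / math.sqrt(cov(v, v) * cov(w, w))

X_doubled = [2.0 * x for x in X]  # same variable in different units

print(math.isclose(cov(X_doubled, Y), 2.0 * cov(X, Y)))  # covariance doubles
print(math.isclose(corr(X_doubled, Y), corr(X, Y)))      # correlation unchanged
```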
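Theorem (Cauchy–Schwarz). $\rho(X, Y) \in [-1, 1]$.
Proof. Let $X' = X - \mu_X$ and $Y' = Y - \mu_Y$. For any $t \in \mathbb{R}$,
$$0 \le \operatorname{Var}(tX' + Y') = t^2\sigma_X^2 + 2t\operatorname{Cov}(X, Y) + \sigma_Y^2.$$
This quadratic in $t$ is non-negative for all $t$, so its discriminant must be non-positive:
$$4\operatorname{Cov}(X, Y)^2 - 4\sigma_X^2\sigma_Y^2 \le 0,$$
giving $|\operatorname{Cov}(X, Y)| \le \sigma_X\sigma_Y$, i.e. $|\rho(X, Y)| \le 1$.
Equality $|\rho| = 1$ holds precisely when $\operatorname{Var}(tX' + Y') = 0$ for some $t$, i.e. when $Y' = -tX'$ almost surely, meaning $Y = aX + b$ for constants $a = -t$ and $b = \mu_Y - a\mu_X$. Here $a \ne 0$ because $\sigma_Y > 0$. The sign of $\rho$ matches the sign of $a$.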
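The discriminant argument can be traced on concrete numbers. The sketch below, which uses made-up data and an arbitrary choice of slope and intercept, computes the discriminant of the quadratic in $t$ for a generic pair and for an exactly linear pair. Exact fractions avoid any rounding, so the equality case shows up as an exact zero.

```python
from fractions import Fraction as F

# Hypothetical finite probability space with two random variables.
probs = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
X = [1, -2, 0, 3]
Y = [4, 1, -1, 2]

def mean(v):
    return sum(p * a for p, a in zip(probs, v))

def cov(v, w):
    mv, mw = mean(v), mean(w)
    return sum(p * (a - mv) * (b - mw) for p, a, b in zip(probs, v, w))

def discriminant(v, w):
    """Discriminant of t -> Var(t v + w) = t^2 Var(v) + 2 t Cov(v, w) + Var(w)."""
    return 4 * cov(v, w) ** 2 - 4 * cov(v, v) * cov(w, w)

generic = discriminant(X, Y)     # negative here, so |rho| < 1

a, b = F(-3), F(5)               # arbitrary slope and intercept
linear = [a * x + b for x in X]  # the variable aX + b
exact = discriminant(X, linear)  # zero, so |rho| = 1

print(generic < 0, exact == 0)
print(cov(X, linear) < 0)        # sign of the covariance follows the sign of a
```

Working with the discriminant rather than with $\rho$ itself sidesteps the square root, which is what allows the equality case to be tested exactly.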
The Pearson coefficient measures linear association: $|\rho|$ close to 1 means $Y$ is nearly a linear function of $X$, while $\rho = 0$ means no linear association (nonlinear dependence remains possible).
Zero covariance does not imply independence
Independence implies zero covariance, but the converse fails.
Counterexample. Let $U \sim \mathrm{Uniform}(-1, 1)$ and set $V = U^2$. Since the distribution of $U$ is symmetric around 0, $E[U] = 0$ and $E[U^3] = 0$. Therefore:
$$\operatorname{Cov}(U, V) = E[U \cdot U^2] - E[U]\,E[U^2] = E[U^3] - 0 = 0.$$
Yet $V$ is completely determined by $U$, so the two variables are as far from independent as possible. Zero covariance only rules out linear association; nonlinear dependence such as this one can be invisible to $\operatorname{Cov}$.
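The counterexample can be reproduced with exact moments. For $U \sim \mathrm{Uniform}(-1, 1)$, integrating $u^k/2$ over $[-1, 1]$ gives $E[U^k] = 0$ for odd $k$ and $1/(k+1)$ for even $k$. The sketch below (the helper name `moment` is introduced here for illustration) uses this to confirm that the covariance vanishes, and then shows that a product rule which independence would force fails.

```python
from fractions import Fraction as F

def moment(k):
    """E[U^k] for U ~ Uniform(-1, 1): the integral of u^k / 2 over [-1, 1]."""
    return F(0) if k % 2 else F(1, k + 1)

# Cov(U, V) with V = U^2 is E[U^3] - E[U] E[U^2].
cov_uv = moment(3) - moment(1) * moment(2)

# If U and V were independent, E[U^2 * V] would equal E[U^2] * E[V].
# With V = U^2 the left side is E[U^4] and the right side is E[U^2]^2.
lhs = moment(4)
rhs = moment(2) * moment(2)

print(cov_uv)    # 0
print(lhs, rhs)  # 1/5 1/9, so the product rule fails
```

The mismatch between 1/5 and 1/9 is exactly the variance of $V$, which is positive because $V$ is not constant.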
Summary
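$\operatorname{Cov}(X, Y) := E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]\,E[Y]$.
Symmetry and bilinearity: covariance is a symmetric positive semi-definite bilinear form; $\operatorname{Cov}(X, X) = \operatorname{Var}(X)$.
Variance of a sum: $\operatorname{Var}(X + Y) = \operatorname{Var}(X) + 2\operatorname{Cov}(X, Y) + \operatorname{Var}(Y)$; additivity holds when variables are uncorrelated.
Independence implies $\operatorname{Cov}(X, Y) = 0$; the converse fails, since zero covariance only rules out linear association.
Pearson correlation: $\rho(X, Y) = \operatorname{Cov}(X, Y)/(\sigma_X \sigma_Y) \in [-1, 1]$; equality $|\rho| = 1$ holds iff $Y = aX + b$ almost surely for some $a \ne 0$.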