Measure many naturally occurring quantities (adult heights, measurement errors, test scores, the average of many independent observations) and the same bell-shaped curve keeps appearing. The Normal distribution (also called the Gaussian distribution) is the limiting distribution of standardised averages, and understanding it is essential to most of probability, statistics, and applied mathematics.
The Gaussian integral
Before defining the Normal distribution, we need one classical result: the integral
$$I := \int_{-\infty}^{\infty} e^{-x^2}\,dx = \sqrt{\pi}.$$
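Proof via polar coordinates. Consider $I^2$:
$$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right) = \iint_{\mathbb{R}^2} e^{-(x^2+y^2)}\,dx\,dy.$$
Convert to polar coordinates $x = r\cos\theta$, $y = r\sin\theta$, with $r \ge 0$ and $\theta \in [0, 2\pi)$. The Jacobian determinant is $r$, and $x^2 + y^2 = r^2$:
$$I^2 = \int_0^{2\pi}\int_0^{\infty} e^{-r^2}\, r\,dr\,d\theta = 2\pi \int_0^{\infty} r e^{-r^2}\,dr.$$
Substitute $u = r^2$, $du = 2r\,dr$:
$$I^2 = 2\pi \int_0^{\infty} \tfrac{1}{2} e^{-u}\,du = \pi \cdot \left[-e^{-u}\right]_0^{\infty} = \pi.$$
Since $I > 0$, we conclude $I = \sqrt{\pi}$. □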
A useful rescaled form follows immediately: substituting $x = t/\sqrt{2}$, so that $dx = dt/\sqrt{2}$, gives
$$\int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}.$$
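Both values can be checked numerically. The sketch below is only a sanity check under stated assumptions: it truncates the real line to $[-10, 10]$ and uses a composite midpoint rule, and the helper name `midpoint_integral` is made up for this illustration.

```python
import math

def midpoint_integral(f, a, b, n=20_000):
    """Approximate the integral of f over [a, b] with the composite midpoint rule."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

# Truncating to [-10, 10] is an assumption of this sketch; both integrands
# are smaller than e^{-50} outside that interval, so the tails are negligible.
gauss = midpoint_integral(lambda x: math.exp(-x * x), -10.0, 10.0)
rescaled = midpoint_integral(lambda t: math.exp(-t * t / 2.0), -10.0, 10.0)

print(gauss, math.sqrt(math.pi))          # numerical value vs sqrt(pi)
print(rescaled, math.sqrt(2 * math.pi))   # numerical value vs sqrt(2*pi)
```

Each printed pair should agree to many decimal places, because the midpoint rule converges very quickly for smooth integrands that decay this fast.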
Standard Normal distribution
The standard Normal distribution, $Z \sim N(0,1)$, has probability density function (PDF)
$$\varphi(x) := \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}, \qquad x \in \mathbb{R}.$$
Verification that φ integrates to 1
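$$\int_{-\infty}^{\infty} \varphi(x)\,dx = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} = 1,$$
using the rescaled Gaussian integral above. The factor $1/\sqrt{2\pi}$ is precisely the normalising constant.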
General Normal distribution
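Let $\mu \in \mathbb{R}$ be a location parameter (the mean) and $\sigma > 0$ a scale parameter, so that $\sigma^2 > 0$ is the variance. A random variable $X$ follows a Normal distribution with mean $\mu$ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if
$$X := \mu + \sigma Z, \qquad Z \sim N(0,1).$$
Equivalently, $X$ has PDF
$$f(x) := \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right), \qquad x \in \mathbb{R}.$$
Verification. Substituting $z = (x-\mu)/\sigma$, so that $dx = \sigma\,dz$, transforms the integral $\int_{-\infty}^{\infty} f(x)\,dx$ into $\int_{-\infty}^{\infty} \varphi(z)\,dz = 1$.
The parameter $\sigma = \sqrt{\sigma^2}$ is the standard deviation.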
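As an illustration, the density formula can be coded directly and compared with Python's standard-library `statistics.NormalDist`. This is a minimal sketch: the function name `normal_pdf`, the parameter values, and the integration window of ten standard deviations either side of the mean are assumptions made for the example.

```python
import math
from statistics import NormalDist

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of N(mu, sigma^2) at x, written term by term from the formula for f."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / math.sqrt(2 * math.pi * sigma ** 2)

mu, sigma = 1.5, 2.0  # illustrative values, not taken from the text
reference = NormalDist(mu, sigma)

# Pointwise comparison with the standard-library implementation.
points = (-3.0, 0.0, 1.5, 4.0)
for x in points:
    print(x, normal_pdf(x, mu, sigma), reference.pdf(x))

# Midpoint-rule check that f integrates to 1 over [mu - 10*sigma, mu + 10*sigma].
n = 40_000
h = 20 * sigma / n
total = h * sum(normal_pdf(mu - 10 * sigma + (i + 0.5) * h, mu, sigma) for i in range(n))
print(total)
```

Note that `NormalDist` takes the standard deviation $\sigma$ as its second argument, not the variance $\sigma^2$.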
Mean
$$E[X] = E[\mu + \sigma Z] = \mu + \sigma E[Z].$$
By the symmetry of $\varphi$ about zero, $E[Z] = 0$ (the integrand $x\varphi(x)$ is an odd, absolutely integrable function). Therefore
$$E[X] = \mu.$$
Variance
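We need $\operatorname{Var}(Z) = E[Z^2]$ for the standard Normal (since $E[Z] = 0$). Apply integration by parts with $u = x$ and $dv = x e^{-x^2/2}\,dx$, so $v = -e^{-x^2/2}$:
$$\int_{-\infty}^{\infty} x^2 e^{-x^2/2}\,dx = \left[-x e^{-x^2/2}\right]_{-\infty}^{\infty} + \int_{-\infty}^{\infty} e^{-x^2/2}\,dx.$$
The boundary term vanishes since $x e^{-x^2/2} \to 0$ as $|x| \to \infty$. The remaining integral equals $\sqrt{2\pi}$, so
$$E[Z^2] = \frac{1}{\sqrt{2\pi}} \cdot \sqrt{2\pi} = 1.$$
Hence $\operatorname{Var}(Z) = 1$. For the general case, use $X = \mu + \sigma Z$ together with the fact that adding a constant leaves the variance unchanged while scaling by $\sigma$ multiplies it by $\sigma^2$:
$$\operatorname{Var}(X) = \operatorname{Var}(\mu + \sigma Z) = \sigma^2 \operatorname{Var}(Z) = \sigma^2.$$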
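The mean and variance results can be illustrated by simulation. The following sketch builds $X = \mu + \sigma Z$ from standard Normal draws and compares the sample moments with $\mu$ and $\sigma^2$; the seed, sample size, and parameter values are arbitrary choices for the example, and the agreement is only approximate because of sampling error.

```python
import random

rng = random.Random(0)   # fixed seed so the run is reproducible
mu, sigma = 3.0, 2.0     # illustrative values, not taken from the text
n = 200_000

# Construct X = mu + sigma * Z from standard Normal draws Z.
xs = [mu + sigma * rng.gauss(0.0, 1.0) for _ in range(n)]

sample_mean = sum(xs) / n
sample_var = sum((x - sample_mean) ** 2 for x in xs) / n

print(sample_mean, mu)          # sample mean vs mu
print(sample_var, sigma ** 2)   # sample variance vs sigma^2
```

With this many draws the sample mean should typically land within a few hundredths of $\mu$, and the sample variance within a few hundredths of $\sigma^2$.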
Affine stability
Theorem. If $X \sim N(\mu, \sigma^2)$ and $a, b \in \mathbb{R}$ with $a \neq 0$, then
$$aX + b \sim N(a\mu + b,\; a^2\sigma^2).$$
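Proof. Write $X = \mu + \sigma Z$ with $Z \sim N(0,1)$. Then
$$aX + b = a(\mu + \sigma Z) + b = (a\mu + b) + (a\sigma) Z.$$
If $a > 0$, this is of the form $\mu' + \sigma' Z$ with $\mu' = a\mu + b$ and $\sigma' = a\sigma > 0$. If $a < 0$, write $(a\sigma) Z = (|a|\sigma)(-Z)$ and note that $-Z \sim N(0,1)$ by the symmetry of $\varphi$, so the same form holds with $\sigma' = |a|\sigma$. In both cases $\sigma'^2 = a^2\sigma^2$, so $aX + b \sim N(a\mu + b,\; a^2\sigma^2)$. □
Corollary. Any $X \sim N(\mu, \sigma^2)$ can be standardised: taking $a = 1/\sigma$ and $b = -\mu/\sigma$ gives $(X - \mu)/\sigma \sim N(0,1)$.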
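The theorem can be checked numerically by computing the distribution function of $aX + b$ directly from that of $X$ and comparing it with the claimed Normal law. This is a sketch with illustrative parameter values; the helper name `cdf_of_affine` is made up for the example, and a negative $a$ is chosen on purpose so that the flipped inequality matters.

```python
from statistics import NormalDist

mu, sigma = 1.0, 2.0   # illustrative parameters of X
a, b = -3.0, 4.0       # a < 0, so the standard deviation of aX + b is |a| * sigma

X = NormalDist(mu, sigma)
claimed = NormalDist(a * mu + b, abs(a) * sigma)  # N(a*mu + b, a^2 * sigma^2)

def cdf_of_affine(t):
    """P(aX + b <= t), computed from the CDF of X alone."""
    x = (t - b) / a
    # Dividing by a negative a flips the inequality: aX + b <= t iff X >= x.
    return X.cdf(x) if a > 0 else 1.0 - X.cdf(x)

ts = (-10.0, -2.0, 1.0, 5.0)
for t in ts:
    print(t, cdf_of_affine(t), claimed.cdf(t))
```

The two columns should match up to floating-point rounding. The variance $a^2\sigma^2$ is insensitive to the sign of $a$, which is why `abs(a)` appears in the standard deviation.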
Central Limit Theorem
The Normal distribution is not just one of many distributions — it is the universal limit of standardised sums. The Central Limit Theorem (CLT) makes this precise.
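Theorem (CLT). Let $X_1, X_2, \ldots$ be independent and identically distributed (i.i.d.) random variables with mean $\mu$ and finite variance $\sigma^2 > 0$. Define the standardised sum
$$Z_n := \frac{(X_1 + X_2 + \cdots + X_n) - n\mu}{\sigma\sqrt{n}}.$$
Then $Z_n \xrightarrow{d} N(0,1)$ as $n \to \infty$: for every $z \in \mathbb{R}$,
$$\lim_{n \to \infty} P(Z_n \le z) = \Phi(z) := \int_{-\infty}^{z} \varphi(t)\,dt.$$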
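The limit $\Phi$ has no closed form in elementary functions, but it can be written in terms of the error function as $\Phi(z) = \tfrac{1}{2}\left[1 + \operatorname{erf}(z/\sqrt{2})\right]$. The sketch below evaluates it that way and cross-checks against a direct midpoint-rule integration of $\varphi$; the function names and the lower cut-off of $-10$ are assumptions made for the example.

```python
import math

def phi(x):
    """Standard Normal density."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def Phi(z):
    """Standard Normal CDF expressed through the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def Phi_by_quadrature(z, lower=-10.0, n=20_000):
    """Midpoint-rule approximation of the integral of phi over [lower, z]."""
    h = (z - lower) / n
    return h * sum(phi(lower + (i + 0.5) * h) for i in range(n))

zs_check = (-1.0, 0.0, 1.0, 1.96)
for z in zs_check:
    print(z, Phi(z), Phi_by_quadrature(z))
```

The two columns should agree to roughly six or more decimal places; the small residual comes from the quadrature step size and from cutting the integral off at $-10$.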
Why this makes the Normal ubiquitous. An observed quantity that is the aggregate effect of many small, independent contributions with finite variance (measurement noise, biological traits, financial returns) is often well approximated by a Normal distribution, largely regardless of the shape of the individual contributing distributions. The CLT is the standard mathematical explanation for the bell curve's prevalence throughout science.
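A small simulation makes the theorem concrete. The sketch below standardises sums of Exponential(1) variables, which are strongly skewed with mean 1 and variance 1, and compares the empirical distribution of $Z_n$ with $\Phi$. The choice of summand distribution, $n = 100$, the number of trials, and the seed are all assumptions made for the illustration.

```python
import math
import random
from statistics import NormalDist

rng = random.Random(1)     # fixed seed so the run is reproducible
n, trials = 100, 10_000    # summands per sum, and number of simulated sums

# Exponential(1) summands: mean 1 and variance 1, far from bell-shaped.
mu, sigma = 1.0, 1.0

def standardised_sum():
    """One draw of Z_n = (X_1 + ... + X_n - n*mu) / (sigma * sqrt(n))."""
    s = sum(rng.expovariate(1.0) for _ in range(n))
    return (s - n * mu) / (sigma * math.sqrt(n))

zs = [standardised_sum() for _ in range(trials)]

Phi = NormalDist().cdf
gaps = {}
for z in (-1.0, 0.0, 1.0, 2.0):
    empirical = sum(v <= z for v in zs) / trials
    gaps[z] = abs(empirical - Phi(z))
    print(z, empirical, Phi(z))
```

The empirical probabilities should sit close to $\Phi(z)$ but not exactly on it: at finite $n$ the skewness of the summands leaves a small residual discrepancy, on top of ordinary sampling noise.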
Summary
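$Z \sim N(0,1)$ has PDF $\varphi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$; the normalising constant $1/\sqrt{2\pi}$ follows from the Gaussian integral $\int_{-\infty}^{\infty} e^{-t^2/2}\,dt = \sqrt{2\pi}$.
$X \sim N(\mu, \sigma^2)$ has PDF $f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left(-(x-\mu)^2/(2\sigma^2)\right)$ and satisfies $X = \mu + \sigma Z$ with $Z \sim N(0,1)$.
Mean: $E[X] = \mu$ (by symmetry of the standard Normal).
Variance: $\operatorname{Var}(X) = \sigma^2$ (derived via integration by parts).
Affine stability: for $a \neq 0$, $aX + b \sim N(a\mu + b,\; a^2\sigma^2)$; in particular $(X - \mu)/\sigma \sim N(0,1)$.
Central Limit Theorem: the standardised sum of $n$ i.i.d. variables with finite, positive variance converges in distribution to $N(0,1)$ as $n \to \infty$, which helps explain the Normal's ubiquity.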