The definition of convexity says that a weighted average of two inputs maps to a value at most the weighted average of the outputs. Jensen’s inequality extends this from two points to any finite number of points — and it does so by the simplest possible argument: induction.
The theorem
Theorem (Jensen’s Inequality). Let I⊆R be an interval, f:I→R convex, n≥1. Suppose x1,…,xn∈I and λ1,…,λn≥0 with ∑i=1nλi=1. Then
f(i=1∑nλixi)≤i=1∑nλif(xi).(1)
The left side applies f to the weighted average of the xi; the right side is the weighted average of the function values.
Proof by induction
Base case n=1. Both sides equal f(x1). ✓
Base case n=2. This is exactly the definition of convexity:
f(λ1x1+λ2x2)≤λ1f(x1)+λ2f(x2) with λ1+λ2=1. ✓
Inductive step. Assume (1) holds for some n≥2; we prove it for n+1.
Let x1,…,xn+1∈I and λ1,…,λn+1≥0 with ∑i=1n+1λi=1.
If λn+1=1 then all other weights are zero and both sides equal f(xn+1). So assume λn+1<1, meaning μ:=1−λn+1>0.
Define μi:=λi/μ for i=1,…,n. Then μi≥0 and ∑i=1nμi=1, so z:=∑i=1nμixi∈I (an interval is closed under convex combinations). Rewrite the left side:
i=1∑n+1λixi=μz+λn+1xn+1,μ+λn+1=1.
Apply the two-point convexity inequality (base case n=2) to z and xn+1 with weights μ and λn+1:
f(i=1∑n+1λixi)=f(μz+λn+1xn+1)≤μf(z)+λn+1f(xn+1).
Apply the inductive hypothesis to z=∑i=1nμixi with weights μi:
f(z)≤i=1∑nμif(xi).
Combining, and substituting μi=λi/μ:
f(i=1∑n+1λixi)≤μi=1∑nμλif(xi)+λn+1f(xn+1)=i=1∑n+1λif(xi).□
Equality case
Proposition. If f is strictly convex and λi>0 for all i, then equality holds in (1) if and only if x1=x2=⋯=xn.
Proof sketch. In the inductive step the two-point inequality is strict unless z=xn+1, and the inductive hypothesis is strict unless all x1,…,xn are equal. Chasing through the induction shows that any strict step forces strict inequality in the conclusion. □
Applications
Arithmetic–geometric mean inequality. Take f(t)=−lnt (strictly convex on (0,∞)) and uniform weights λi=1/n:
−ln(nx1+⋯+xn)≤n−lnx1−⋯−lnxn=−ln(x1⋯xn)1/n.
Negating and exponentiating recovers the AM–GM inequality:
nx1+⋯+xn≥(x1⋯xn)1/n.
Log-sum inequality. Take f(t)=tlnt (convex on (0,∞)); Jensen’s inequality underlies the nonnegativity of KL divergence in information theory.
Summary
- Jensen’s Inequality: for convex f and weights λi≥0 summing to 1,
f(∑λixi)≤∑λif(xi).
- Proof: induction on n; the base case n=2 is the definition of convexity; the inductive step peels off the last point and applies the two-point inequality.
- Equality: when f is strictly convex and all weights are positive, equality holds iff all xi are equal.
- AM–GM is a direct corollary: apply Jensen to f=−ln with uniform weights.