Jensen's Inequality

Basis
Last updated: Tags: Calculus, Convexity, Inequalities

Prerequisites

The definition of convexity says that a weighted average of two inputs maps to a value at most the weighted average of the outputs. Jensen’s inequality extends this from two points to any finite number of points — and it does so by the simplest possible argument: induction.

The theorem

Theorem (Jensen’s Inequality). Let IRI \subseteq \mathbb{R} be an interval, f:IRf : I \to \mathbb{R} convex, n1n \geq 1. Suppose x1,,xnIx_1, \ldots, x_n \in I and λ1,,λn0\lambda_1, \ldots, \lambda_n \geq 0 with i=1nλi=1\sum_{i=1}^n \lambda_i = 1. Then

f ⁣(i=1nλixi)    i=1nλif(xi).(1)f\!\left(\sum_{i=1}^{n} \lambda_i x_i\right) \;\leq\; \sum_{i=1}^{n} \lambda_i f(x_i). \tag{1}

The left side applies ff to the weighted average of the xix_i; the right side is the weighted average of the function values.

Proof by induction

Base case n=1n = 1. Both sides equal f(x1)f(x_1). \checkmark

Base case n=2n = 2. This is exactly the definition of convexity: f(λ1x1+λ2x2)λ1f(x1)+λ2f(x2)f(\lambda_1 x_1 + \lambda_2 x_2) \leq \lambda_1 f(x_1) + \lambda_2 f(x_2) with λ1+λ2=1\lambda_1 + \lambda_2 = 1. \checkmark

Inductive step. Assume (1)(1) holds for some n2n \geq 2; we prove it for n+1n + 1.

Let x1,,xn+1Ix_1, \ldots, x_{n+1} \in I and λ1,,λn+10\lambda_1, \ldots, \lambda_{n+1} \geq 0 with i=1n+1λi=1\sum_{i=1}^{n+1} \lambda_i = 1.

If λn+1=1\lambda_{n+1} = 1 then all other weights are zero and both sides equal f(xn+1)f(x_{n+1}). So assume λn+1<1\lambda_{n+1} < 1, meaning μ1λn+1>0\mu \coloneqq 1 - \lambda_{n+1} > 0.

Define μiλi/μ\mu_i \coloneqq \lambda_i / \mu for i=1,,ni = 1, \ldots, n. Then μi0\mu_i \geq 0 and i=1nμi=1\sum_{i=1}^n \mu_i = 1, so zi=1nμixiIz \coloneqq \sum_{i=1}^n \mu_i x_i \in I (an interval is closed under convex combinations). Rewrite the left side:

i=1n+1λixi  =  μz+λn+1xn+1,μ+λn+1=1.\sum_{i=1}^{n+1} \lambda_i x_i \;=\; \mu z + \lambda_{n+1} x_{n+1}, \qquad \mu + \lambda_{n+1} = 1.

Apply the two-point convexity inequality (base case n=2n = 2) to zz and xn+1x_{n+1} with weights μ\mu and λn+1\lambda_{n+1}:

f ⁣(i=1n+1λixi)=f(μz+λn+1xn+1)μf(z)+λn+1f(xn+1).f\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) = f(\mu z + \lambda_{n+1} x_{n+1}) \leq \mu f(z) + \lambda_{n+1} f(x_{n+1}).

Apply the inductive hypothesis to z=i=1nμixiz = \sum_{i=1}^n \mu_i x_i with weights μi\mu_i:

f(z)    i=1nμif(xi).f(z) \;\leq\; \sum_{i=1}^n \mu_i f(x_i).

Combining, and substituting μi=λi/μ\mu_i = \lambda_i / \mu:

f ⁣(i=1n+1λixi)    μi=1nλiμf(xi)+λn+1f(xn+1)  =  i=1n+1λif(xi).f\!\left(\sum_{i=1}^{n+1} \lambda_i x_i\right) \;\leq\; \mu \sum_{i=1}^n \frac{\lambda_i}{\mu} f(x_i) + \lambda_{n+1} f(x_{n+1}) \;=\; \sum_{i=1}^{n+1} \lambda_i f(x_i). \quad \square

Equality case

Proposition. If ff is strictly convex and λi>0\lambda_i > 0 for all ii, then equality holds in (1)(1) if and only if x1=x2==xnx_1 = x_2 = \cdots = x_n.

Proof sketch. In the inductive step the two-point inequality is strict unless z=xn+1z = x_{n+1}, and the inductive hypothesis is strict unless all x1,,xnx_1, \ldots, x_n are equal. Chasing through the induction shows that any strict step forces strict inequality in the conclusion. \square

Applications

Arithmetic–geometric mean inequality. Take f(t)=lntf(t) = -\ln t (strictly convex on (0,)(0, \infty)) and uniform weights λi=1/n\lambda_i = 1/n:

ln ⁣(x1++xnn)    lnx1lnxnn  =  ln ⁣(x1xn)1/n.-\ln\!\left(\frac{x_1 + \cdots + x_n}{n}\right) \;\leq\; \frac{-\ln x_1 - \cdots - \ln x_n}{n} \;=\; -\ln\!\left(x_1 \cdots x_n\right)^{1/n}.

Negating and exponentiating recovers the AM–GM inequality:

x1++xnn    (x1xn)1/n.\frac{x_1 + \cdots + x_n}{n} \;\geq\; (x_1 \cdots x_n)^{1/n}.

Log-sum inequality. Take f(t)=tlntf(t) = t \ln t (convex on (0,)(0,\infty)); Jensen’s inequality underlies the nonnegativity of KL divergence in information theory.

Summary

  • Jensen’s Inequality: for convex ff and weights λi0\lambda_i \geq 0 summing to 11, f ⁣(λixi)λif(xi).f\!\left(\sum \lambda_i x_i\right) \leq \sum \lambda_i f(x_i).
  • Proof: induction on nn; the base case n=2n = 2 is the definition of convexity; the inductive step peels off the last point and applies the two-point inequality.
  • Equality: when ff is strictly convex and all weights are positive, equality holds iff all xix_i are equal.
  • AM–GM is a direct corollary: apply Jensen to f=lnf = -\ln with uniform weights.