Bayes' Formula
The multiplication rule gives $P(A \cap B) = P(A \mid B)\,P(B) = P(B \mid A)\,P(A)$. Bayes’ formula exploits this symmetry to invert a conditional probability: given how likely observed data $B$ is under each hypothesis $A_i$, it computes how likely each hypothesis is given the data.
Setup
Let $A_1, A_2, \dots$ be a countable partition of $\Omega$ with each $P(A_i) > 0$, and let $B$ be an observed event with $P(B) > 0$. Think of the $A_i$ as competing hypotheses. You know:
- The prior probabilities $P(A_i)$: your uncertainty before observing $B$.
- The likelihoods $P(B \mid A_i)$: how probable the observation is under each hypothesis.
Bayes’ formula computes the posterior probabilities $P(A_i \mid B)$: your updated uncertainty after observing $B$.
Derivation
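Applying the multiplication rule to $P(A_i \cap B)$ in two ways:
$$P(A_i \mid B)\,P(B) = P(A_i \cap B) = P(B \mid A_i)\,P(A_i).$$
Solving for $P(A_i \mid B)$ and substituting the law of total probability, $P(B) = \sum_j P(B \mid A_j)\,P(A_j)$, for $P(B)$:
$$P(A_i \mid B) = \frac{P(B \mid A_i)\,P(A_i)}{\sum_j P(B \mid A_j)\,P(A_j)}.$$
This is Bayes’ formula. The denominator is the normalising constant that makes the posteriors sum to $1$.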
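For a finite partition the formula is a few lines of arithmetic. The following is a minimal Python sketch, not a reference implementation: the function name `posterior`, the list-based inputs, and the three example numbers are all illustrative assumptions.

```python
def posterior(priors, likelihoods):
    """Return posterior probabilities for a finite partition.

    priors[i]:      prior probability of hypothesis i
    likelihoods[i]: probability of the observed data under hypothesis i
    """
    if len(priors) != len(likelihoods):
        raise ValueError("priors and likelihoods must have the same length")
    # Numerators of Bayes' formula: likelihood times prior, one per hypothesis.
    joint = [lik * pri for lik, pri in zip(likelihoods, priors)]
    # Law of total probability: the probability of the observed data.
    evidence = sum(joint)
    if evidence == 0:
        raise ValueError("the data has probability zero under every hypothesis")
    # Dividing by the evidence makes the posteriors sum to 1.
    return [j / evidence for j in joint]


# Three hypothetical hypotheses with made-up priors and likelihoods.
post = posterior([0.5, 0.3, 0.2], [0.1, 0.4, 0.7])
print(post)
```

In this made-up example the hypothesis with the smallest prior ends up with the largest posterior, because it predicts the data best.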
Bayesian interpretation
Bayes’ formula is often read as the proportionality
$$P(A_i \mid B) \propto P(B \mid A_i)\,P(A_i),$$
or: to update from prior to posterior, multiply each hypothesis’s prior probability $P(A_i)$ by how well it predicts the observed data $B$, then renormalise. The denominator $P(B)$, called the marginal likelihood or evidence, is the same for all hypotheses, so it affects only the scale.
This proportionality is the foundation of Bayesian inference: a prior belief over a parameter, multiplied by the likelihood of observed data under that parameter, yields a posterior belief. As data accumulate, the posterior typically concentrates around the true parameter, provided the prior assigns positive probability to every neighbourhood of it and further regularity conditions hold.
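The "multiply, then renormalise" recipe and the concentration claim can be illustrated on a small grid of hypotheses. The sketch below is hypothetical throughout: it assumes a coin whose bias is one of five candidate values, two made-up priors, and fixed data in which 70% of the flips are heads. It works in log space only to avoid floating-point underflow for large samples.

```python
from math import exp, log


def grid_posterior(prior, thetas, heads, tails):
    """Posterior over candidate coin biases, computed in log space."""
    # log(prior * likelihood) for each candidate bias theta.
    log_w = [log(p) + heads * log(t) + tails * log(1 - t)
             for p, t in zip(prior, thetas)]
    # Subtract the maximum before exponentiating to avoid underflow.
    m = max(log_w)
    w = [exp(lw - m) for lw in log_w]
    # Renormalise so the posterior sums to 1.
    total = sum(w)
    return [x / total for x in w]


thetas = [0.1, 0.3, 0.5, 0.7, 0.9]       # candidate biases (assumed grid)
flat = [0.2, 0.2, 0.2, 0.2, 0.2]         # uniform prior
skewed = [0.6, 0.2, 0.1, 0.05, 0.05]     # prior favouring a small bias

results = {}
for n in (10, 100, 1000):
    heads = 7 * n // 10                  # fixed data: 70% heads
    tails = n - heads
    results[n] = (grid_posterior(flat, thetas, heads, tails),
                  grid_posterior(skewed, thetas, heads, tails))
    print(n,
          [round(x, 3) for x in results[n][0]],
          [round(x, 3) for x in results[n][1]])
```

With 10 flips the two priors give visibly different posteriors; with 1000 flips both put nearly all their mass on the bias 0.7. This works here because both priors give 0.7 positive probability; a prior that assigned it zero could never recover it.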
The diagnostic test example
A disease affects 1% of a population. A diagnostic test has:
- Sensitivity $P(+ \mid D) = 0.99$ (true positive rate).
- Specificity $P(- \mid D^c) = 0.99$, i.e. false positive rate $P(+ \mid D^c) = 0.01$.
A randomly chosen person tests positive. What is the probability they actually have the disease?
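Applying Bayes’ formula with $\{D, D^c\}$ as the partition, where $D$ is the event that the person has the disease and $+$ the event of a positive test:
$$P(D \mid +) = \frac{P(+ \mid D)\,P(D)}{P(+ \mid D)\,P(D) + P(+ \mid D^c)\,P(D^c)} = \frac{0.99 \times 0.01}{0.99 \times 0.01 + 0.01 \times 0.99} = \frac{0.0099}{0.0198} = 0.5.$$
A 99%-accurate test gives only a 50% posterior for a disease with 1% prevalence. The reason: true positives ($0.99\%$ of the population) and false positives (also $0.99\%$) are equally common, so the prior odds of $1{:}99$ exactly cancel the test’s likelihood ratio of $99{:}1$. The base rate carries enormous weight when it is far from 50%.
Raising the prevalence to 10% (a high-risk sub-population):
$$P(D \mid +) = \frac{0.99 \times 0.10}{0.99 \times 0.10 + 0.01 \times 0.90} = \frac{0.099}{0.108} \approx 0.917.$$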
The same test is far more informative in the high-risk group because the prior is less extreme.
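The two calculations can be checked numerically. This is a small Python sketch assuming 99% sensitivity and 99% specificity, consistent with the "99%-accurate test" in the text; the function name `posterior_given_positive` is an illustrative choice.

```python
def posterior_given_positive(prevalence, sensitivity, specificity):
    """P(disease | positive test) via Bayes' formula on {disease, no disease}."""
    # Probability of a positive test together with the disease.
    true_pos = sensitivity * prevalence
    # Probability of a positive test without the disease.
    false_pos = (1 - specificity) * (1 - prevalence)
    # Denominator is the total probability of testing positive.
    return true_pos / (true_pos + false_pos)


for prevalence in (0.01, 0.10):
    p = posterior_given_positive(prevalence, sensitivity=0.99, specificity=0.99)
    print(f"prevalence {prevalence:.0%}: P(D | +) = {p:.3f}")
```

Run as written, this prints posteriors of 0.500 at 1% prevalence and 0.917 at 10% prevalence. Varying `prevalence` while holding the test fixed shows how strongly the answer depends on the base rate.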
Summary
- Bayes’ formula: $P(A_i \mid B) = \dfrac{P(B \mid A_i)\,P(A_i)}{\sum_j P(B \mid A_j)\,P(A_j)}$, derived from the multiplication rule applied symmetrically and the law of total probability.
- Bayesian read: posterior $\propto$ likelihood $\times$ prior; the denominator is a normalising constant shared by all hypotheses.
- Base-rate sensitivity: a highly accurate test can still give a modest posterior when the prior (prevalence) is very small — the key lesson of the diagnostic example.