Monte Carlo Methods
Here is a strange idea that turns out to be one of the most powerful tools in all of computational
science: when a problem is too hard to solve exactly, throw random numbers at it.
Not because you have given up — but because a careful flood of randomness, counted correctly, can
measure things that no neat formula can reach. Physicists use this trick to compute the energy of a
magnet, the price of a financial option, the path of a neutron through a reactor, and integrals in
hundreds of dimensions that would defeat any pen, any grid, any supercomputer running the "sensible"
way. The family of tricks is called the Monte Carlo method, after the famous casino —
because at its heart it is gambling, done on purpose, for answers.
The whole idea fits in one sentence, worth reading twice: if you can't calculate a quantity,
sometimes you can measure it by random sampling. Below we build the method from the simplest
possible game — measuring an area by throwing darts — sharpen it into a way of doing integrals,
discover the one magic property that makes it beat every rival in high dimensions, and finish with the
Metropolis algorithm, the engine that lets a computer wander through the states of a
physical system exactly as often as nature itself would visit them.
Throwing darts to measure π
Picture a square dartboard, one metre on each side, with a quarter-circle of radius 1
drawn inside it, centred on one corner. Now throw darts at the square completely at random — blindfolded,
so every spot is equally likely. Some land inside the quarter-circle; some land in the corner outside it.
Here is the key: the fraction that land inside the curve equals the fraction of the square's
area the quarter-circle covers.
The square has area 1. The quarter-circle has area
\tfrac14 \pi r^2 = \tfrac{\pi}{4}. So the fraction of darts landing inside
should settle down towards \pi/4 \approx 0.785. Turn that around and you have a
way to measure \pi by throwing darts:
\frac{\text{darts inside}}{\text{darts thrown}} \;\approx\; \frac{\pi}{4}
\quad\Longrightarrow\quad \pi \;\approx\; 4 \times \frac{\text{darts inside}}{\text{darts thrown}}.
A point (x, y) with 0 \le x, y \le 1 lands inside
the quarter-circle exactly when x^2 + y^2 \le 1 — it is closer than one unit
to the corner. That single test, x^2 + y^2 \le 1, is the whole physics of the
game. Watch it converge in code:
// Estimate π by sampling random points in the unit square and
// counting how many fall inside the quarter-circle (x² + y² ≤ 1).
// A tiny seeded PRNG (mulberry32) makes the run reproducible.
function mulberry32(seed: number) {
let a = seed >>> 0;
return function (): number {
a = (a + 0x6d2b79f5) | 0;
let t = Math.imul(a ^ (a >>> 15), 1 | a);
t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};
}
const rand = mulberry32(12345);
let inside = 0;
const Ns = [100, 1000, 10000, 100000];
let thrown = 0;
let checkpoint = 0;
for (let i = 1; i <= 100000; i++) {
const x = rand();
const y = rand();
if (x * x + y * y <= 1) inside++;
thrown++;
if (thrown === Ns[checkpoint]) {
const estimate = (4 * inside) / thrown;
const error = Math.abs(estimate - Math.PI);
console.log(
`N=${thrown.toString().padStart(6)} π≈${estimate.toFixed(5)} error=${error.toFixed(5)}`,
);
checkpoint++;
}
}
console.log(`true π = ${Math.PI.toFixed(5)}`);
Run it a few times (change the seed) and you will see the estimate wobble towards
3.14159\ldots, tightening as N grows — but
slowly. Look at the error column: going from N = 100 to
N = 10{,}000 (a hundredfold more darts) only shrinks it by roughly ten. That
is not a coincidence; it is the signature of Monte Carlo, and we will pin it down shortly.
From darts to integrals
Measuring an area is really measuring an integral in disguise — the area under a curve
is \int f\,dx. So the dart trick generalises into a way of computing
any integral, and this is where Monte Carlo earns its living. The idea rests on a plain fact
about averages. The average value of a function f over a
region of volume V is
\langle f \rangle = \frac{1}{V}\int_V f\,dV
\quad\Longrightarrow\quad \int_V f\,dV = V\,\langle f \rangle.
You usually can't do the integral — but you can estimate the average: scatter
N random points across the region, evaluate f at
each, and take the ordinary mean. That estimated average, times the volume, estimates the integral:
-
The estimator. For N points
x_1,\ldots,x_N drawn uniformly from a region of volume
V,
\int_V f\,dV \;\approx\; \frac{V}{N}\sum_{i=1}^{N} f(x_i).
-
The error falls like 1/\sqrt{N}. The typical error of
the estimate is proportional to \sigma/\sqrt{N}, where
\sigma is the spread of f's values. To halve
the error you need four times as many samples.
-
The dimension does not appear. That 1/\sqrt{N} is the
same whether the integral is over a line, a plane, or a thousand-dimensional space. This is the
method's superpower.
Let's test it on a concrete integral that has no elementary antiderivative — the bell-shaped
\int_0^1 e^{-x^2}\,dx. Its true value is about
0.746824. Since the interval [0,1] has
"volume" V = 1, the estimate is simply the average of
e^{-x^2} at random points:
// Monte Carlo integration of ∫₀¹ e^(−x²) dx (no elementary antiderivative).
// Estimate = V · ⟨f⟩ with V = 1, so it's just the mean of f at random x.
function mulberry32(seed: number) {
let a = seed >>> 0;
return function (): number {
a = (a + 0x6d2b79f5) | 0;
let t = Math.imul(a ^ (a >>> 15), 1 | a);
t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};
}
const rand = mulberry32(2024);
const f = (x: number): number => Math.exp(-x * x);
const TRUE = 0.7468241328; // reference value
for (const N of [100, 1000, 10000, 50000]) {
let sum = 0;
let sumSq = 0;
for (let i = 0; i < N; i++) {
const y = f(rand());
sum += y;
sumSq += y * y;
}
const mean = sum / N; // ⟨f⟩ (V = 1)
const variance = sumSq / N - mean * mean; // spread of f values
const stdErr = Math.sqrt(variance / N); // predicted ~1/√N error
console.log(
`N=${N.toString().padStart(5)} estimate=${mean.toFixed(5)}` +
` actual error=${Math.abs(mean - TRUE).toFixed(5)}` +
` predicted ±${stdErr.toFixed(5)}`,
);
}
console.log(`true value ≈ ${TRUE.toFixed(5)}`);
Notice the third column (the actual error) and the fourth (the error the
1/\sqrt{N} rule predicts from the spread of the samples) stay in the
same ballpark — Monte Carlo not only gives you an answer, it hands you an honest estimate of its own
uncertainty, for free, from the scatter of the very samples you already drew.
Why 1/\sqrt{N} is a superpower
A slow-shrinking error sounds like bad news, and in one dimension it is: to integrate a
smooth function on a line, the old-fashioned way — chop the interval into N
strips and add up rectangles (a Riemann sum, "grid quadrature") — has an error that falls like
1/N or even 1/N^2, far faster than Monte Carlo's
limp 1/\sqrt{N}. In 1-D, don't gamble; use a grid.
But watch what happens as the dimension climbs. To keep the same grid spacing in
d dimensions you need N \sim m^d points — a grid of
m steps per axis. Ten points per axis is 10 in 1-D,
100 in 2-D, and 10^{30} in
30 dimensions — more grid points than there are atoms in a mountain. The grid
error, expressed in the number of samples N, degrades to
N^{-k/d}, which crawls to a standstill as d grows.
This is the notorious curse of dimensionality.
Monte Carlo simply does not care. Its error is \sigma/\sqrt{N} in
every dimension — the d never appears. So there is a
crossover: below a handful of dimensions grids win, but somewhere around
d \approx 4 to 8 Monte Carlo pulls ahead, and by
the time you reach the integrals physics actually needs it is the only game in town.
Statistical mechanics is the classic customer: the properties of a material come from integrals over the
phase space of every particle's position and momentum — easily
10^{23} coupled dimensions. No grid can touch that. Random sampling can.
The method was born in secret. In the 1940s at Los Alamos, working on the atomic bomb, the mathematician
Stanislaw Ulam was recovering from illness and playing endless games of solitaire. He
wondered what the odds of winning were — and realised that instead of untangling the combinatorics, he
could just deal many random hands and count. He mentioned it to John von Neumann,
who saw at once that the same idea could trace neutrons diffusing through fissile material, a problem no
formula could crack. They needed a secret code-name for the classified technique, and their colleague
Nicholas Metropolis suggested "Monte Carlo", after the Monaco casino where Ulam's uncle
would famously borrow money to gamble. The name — and the method — stuck. The very first large-scale runs
were done on ENIAC, one of the first electronic computers, quietly launching an entire
branch of computational science.
The Metropolis algorithm and the Ising model
Plain "throw uniform darts" sampling works when every point is equally worth visiting. But in physics
that is rarely true. At temperature T, a system in state with energy
E occurs with the Boltzmann probability
P \propto e^{-E/k_BT} — low-energy states are common, high-energy states rare.
If you want the average energy or magnetisation of a magnet, you must sample states weighted by that
distribution, and almost all of the weight sits in a vanishingly small, oddly-shaped sliver of a
gigantic space. Scattering uniform darts would waste essentially all of them on states nature never
visits.
The Metropolis algorithm (1953) solves this with breathtaking simplicity. Instead of
placing independent points, take a walk: from the current state, propose a small random change,
and decide whether to accept it by a single rule:
-
Propose. From the current configuration, propose a small change (e.g. flip one
spin), and compute the energy change \Delta E it would cause.
-
Accept downhill moves always. If \Delta E \le 0 (energy
drops or stays), accept the change — the acceptance probability is
1.
-
Accept uphill moves sometimes. If \Delta E > 0, accept
it with probability e^{-\Delta E/k_BT} — draw a random number in
[0,1) and accept if it is below that. Otherwise stay put.
Do this over and over and the walk visits each configuration exactly as often as its Boltzmann weight
demands — it samples the Boltzmann distribution without ever computing the impossible
normalising sum. The reason it works is detailed balance: the accept-rule is arranged
so that, in equilibrium, the flow of the walk from state A to state
B exactly matches the flow back from B to
A. When every pair of states balances like this, the population sitting in
each state stops changing — and that steady population is precisely
P \propto e^{-E/k_BT}. Accepting occasional uphill moves is what lets the walk
climb out of local dips and explore, rather than freezing at the first minimum it finds.
The classic playground is the Ising model of a magnet: a grid of little arrows
("spins") each pointing up (s_i = +1) or down
(s_i = -1). Neighbouring spins prefer to line up, so the energy is
E = -J \sum_{\langle i,j\rangle} s_i\,s_j,
summed over neighbouring pairs, with J > 0 the coupling strength: aligned
neighbours (s_i s_j = +1) lower the energy, clashing ones raise it. Now the
drama is a tug-of-war between two tendencies. Energy wants order — all spins aligned,
a magnet. Temperature wants chaos — random thermal kicks scramble the alignment. At low
T order wins and the material is ferromagnetic (a strong net
magnetisation); at high T chaos wins and the magnetisation averages to zero.
Between them, at a sharp critical temperature T_c, the
magnet abruptly loses its magnetism — a genuine phase transition, of exactly the kind you can
watch a fridge magnet undergo if you heat it past its Curie point. Let Metropolis play it out:
// A tiny 2-D Ising magnet on an 8×8 grid (64 spins), driven by Metropolis.
// Low temperature → spins align → magnetisation near 1.
// High temperature → thermal chaos → magnetisation near 0.
function mulberry32(seed: number) {
let a = seed >>> 0;
return function (): number {
a = (a + 0x6d2b79f5) | 0;
let t = Math.imul(a ^ (a >>> 15), 1 | a);
t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
};
}
const L = 8; // 8×8 lattice = 64 spins
const N = L * L;
const J = 1;
function runIsing(T: number, seed: number): number {
const rand = mulberry32(seed);
// Start fully aligned (all spins up).
const s: number[] = new Array(N).fill(1);
const idx = (r: number, c: number): number => ((r + L) % L) * L + ((c + L) % L);
const sweeps = 400; // 400 sweeps × 64 sites ≈ 25k proposals — fast
for (let sweep = 0; sweep < sweeps; sweep++) {
for (let k = 0; k < N; k++) {
const r = Math.floor(rand() * L);
const c = Math.floor(rand() * L);
const spin = s[idx(r, c)];
// Sum of the four neighbours (periodic wrap-around).
const nb = s[idx(r + 1, c)] + s[idx(r - 1, c)] + s[idx(r, c + 1)] + s[idx(r, c - 1)];
const dE = 2 * J * spin * nb; // energy cost of flipping this spin
// Metropolis: accept if downhill, else with prob e^(−ΔE/T).
if (dE <= 0 || rand() < Math.exp(-dE / T)) {
s[idx(r, c)] = -spin;
}
}
}
let sum = 0;
for (const v of s) sum += v;
return Math.abs(sum) / N; // |magnetisation| per spin, 0…1
}
// The 2-D Ising critical temperature is about T_c ≈ 2.27 (in units of J/k_B).
for (const T of [1.0, 1.5, 2.0, 2.5, 3.5, 5.0]) {
const m = runIsing(T, 777);
const bar = "#".repeat(Math.round(m * 20));
console.log(`T=${T.toFixed(1)} |m|=${m.toFixed(3)} ${bar}`);
}
Read the bars from top to bottom: at low temperature the magnetisation
|m| sits near 1 (ordered, magnetic), and as the
temperature climbs past T_c \approx 2.27 it collapses towards
0 (disordered, non-magnetic). You have just simulated a phase transition with
a few dozen lines of code and a stream of random numbers — the same technique, scaled up, computes the
properties of real materials.
The single most common Monte Carlo mistake is expecting the error to fall in step with the effort. It
does not: the error shrinks like 1/\sqrt{N}, so
100 times the samples buys only 10 times the accuracy, and
10{,}000 times the samples just 100 times. Chasing one more decimal
place of \pi this way costs a hundredfold more darts each time — a
brutal law of diminishing returns. This is why real work leans on variance reduction
(importance sampling, control variates) rather than brute force: shrink \sigma,
not just grow N.
Two subtler traps bite in the Metropolis world specifically. First, the samples are
correlated: each step of the walk is a small nudge from the last, so successive
configurations are not independent, and the naive 1/\sqrt{N} is too
optimistic unless you account for the autocorrelation time between genuinely independent
samples. Second, burn-in: the walk starts in whatever state you chose, which may be
wildly untypical, so you must discard the first stretch of the walk before it has "forgotten" its start
and settled into equilibrium. Measure too early and you bias every average. And underneath it all: a
bad random-number generator — one with hidden patterns or a short period — can quietly
poison the results, which is why physicists fuss so much over their PRNGs.