Monte Carlo Methods

Here is a strange idea that turns out to be one of the most powerful tools in all of computational science: when a problem is too hard to solve exactly, throw random numbers at it. Not because you have given up — but because a careful flood of randomness, counted correctly, can measure things that no neat formula can reach. Physicists use this trick to compute the energy of a magnet, the price of a financial option, the path of a neutron through a reactor, and integrals in hundreds of dimensions that would defeat any pen, any grid, any supercomputer running the "sensible" way. The family of tricks is called the Monte Carlo method, after the famous casino — because at its heart it is gambling, done on purpose, for answers.

The whole idea fits in one sentence, worth reading twice: if you can't calculate a quantity, sometimes you can measure it by random sampling. Below we build the method from the simplest possible game — measuring an area by throwing darts — sharpen it into a way of doing integrals, discover the one magic property that makes it beat every rival in high dimensions, and finish with the Metropolis algorithm, the engine that lets a computer wander through the states of a physical system exactly as often as nature itself would visit them.

Throwing darts to measure π

Picture a square dartboard, one metre on each side, with a quarter-circle of radius 1 drawn inside it, centred on one corner. Now throw darts at the square completely at random — blindfolded, so every spot is equally likely. Some land inside the quarter-circle; some land in the corner outside it. Here is the key: the fraction that land inside the curve equals the fraction of the square's area the quarter-circle covers.

The square has area 1. The quarter-circle has area \tfrac14 \pi r^2 = \tfrac{\pi}{4}. So the fraction of darts landing inside should settle down towards \pi/4 \approx 0.785. Turn that around and you have a way to measure \pi by throwing darts:

\frac{\text{darts inside}}{\text{darts thrown}} \;\approx\; \frac{\pi}{4} \quad\Longrightarrow\quad \pi \;\approx\; 4 \times \frac{\text{darts inside}}{\text{darts thrown}}.

A point (x, y) with 0 \le x, y \le 1 lands inside the quarter-circle exactly when x^2 + y^2 \le 1 — it is closer than one unit to the corner. That single test, x^2 + y^2 \le 1, is the whole physics of the game. Watch it converge in code:

// Estimate π by sampling random points in the unit square and // counting how many fall inside the quarter-circle (x² + y² ≤ 1). // A tiny seeded PRNG (mulberry32) makes the run reproducible. function mulberry32(seed: number) { let a = seed >>> 0; return function (): number { a = (a + 0x6d2b79f5) | 0; let t = Math.imul(a ^ (a >>> 15), 1 | a); t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t; return ((t ^ (t >>> 14)) >>> 0) / 4294967296; }; } const rand = mulberry32(12345); let inside = 0; const Ns = [100, 1000, 10000, 100000]; let thrown = 0; let checkpoint = 0; for (let i = 1; i <= 100000; i++) { const x = rand(); const y = rand(); if (x * x + y * y <= 1) inside++; thrown++; if (thrown === Ns[checkpoint]) { const estimate = (4 * inside) / thrown; const error = Math.abs(estimate - Math.PI); console.log( `N=${thrown.toString().padStart(6)} π≈${estimate.toFixed(5)} error=${error.toFixed(5)}`, ); checkpoint++; } } console.log(`true π = ${Math.PI.toFixed(5)}`);

Run it a few times (change the seed) and you will see the estimate wobble towards 3.14159\ldots, tightening as N grows — but slowly. Look at the error column: going from N = 100 to N = 10{,}000 (a hundredfold more darts) only shrinks it by roughly ten. That is not a coincidence; it is the signature of Monte Carlo, and we will pin it down shortly.

From darts to integrals

Measuring an area is really measuring an integral in disguise — the area under a curve is \int f\,dx. So the dart trick generalises into a way of computing any integral, and this is where Monte Carlo earns its living. The idea rests on a plain fact about averages. The average value of a function f over a region of volume V is

\langle f \rangle = \frac{1}{V}\int_V f\,dV \quad\Longrightarrow\quad \int_V f\,dV = V\,\langle f \rangle.

You usually can't do the integral — but you can estimate the average: scatter N random points across the region, evaluate f at each, and take the ordinary mean. That estimated average, times the volume, estimates the integral:

The estimator. For N points x_1,\ldots,x_N drawn uniformly from a region of volume V, \int_V f\,dV \;\approx\; \frac{V}{N}\sum_{i=1}^{N} f(x_i).
The error falls like 1/\sqrt{N}. The typical error of the estimate is proportional to \sigma/\sqrt{N}, where \sigma is the spread of f's values. To halve the error you need four times as many samples.
The dimension does not appear. That 1/\sqrt{N} is the same whether the integral is over a line, a plane, or a thousand-dimensional space. This is the method's superpower.

Let's test it on a concrete integral that has no elementary antiderivative — the bell-shaped \int_0^1 e^{-x^2}\,dx. Its true value is about 0.746824. Since the interval [0,1] has "volume" V = 1, the estimate is simply the average of e^{-x^2} at random points:

// Monte Carlo integration of ∫₀¹ e^(−x²) dx (no elementary antiderivative). // Estimate = V · ⟨f⟩ with V = 1, so it's just the mean of f at random x. function mulberry32(seed: number) { let a = seed >>> 0; return function (): number { a = (a + 0x6d2b79f5) | 0; let t = Math.imul(a ^ (a >>> 15), 1 | a); t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t; return ((t ^ (t >>> 14)) >>> 0) / 4294967296; }; } const rand = mulberry32(2024); const f = (x: number): number => Math.exp(-x * x); const TRUE = 0.7468241328; // reference value for (const N of [100, 1000, 10000, 50000]) { let sum = 0; let sumSq = 0; for (let i = 0; i < N; i++) { const y = f(rand()); sum += y; sumSq += y * y; } const mean = sum / N; // ⟨f⟩ (V = 1) const variance = sumSq / N - mean * mean; // spread of f values const stdErr = Math.sqrt(variance / N); // predicted ~1/√N error console.log( `N=${N.toString().padStart(5)} estimate=${mean.toFixed(5)}` + ` actual error=${Math.abs(mean - TRUE).toFixed(5)}` + ` predicted ±${stdErr.toFixed(5)}`, ); } console.log(`true value ≈ ${TRUE.toFixed(5)}`);

Notice the third column (the actual error) and the fourth (the error the 1/\sqrt{N} rule predicts from the spread of the samples) stay in the same ballpark — Monte Carlo not only gives you an answer, it hands you an honest estimate of its own uncertainty, for free, from the scatter of the very samples you already drew.

Why 1/\sqrt{N} is a superpower

A slow-shrinking error sounds like bad news, and in one dimension it is: to integrate a smooth function on a line, the old-fashioned way — chop the interval into N strips and add up rectangles (a Riemann sum, "grid quadrature") — has an error that falls like 1/N or even 1/N^2, far faster than Monte Carlo's limp 1/\sqrt{N}. In 1-D, don't gamble; use a grid.

But watch what happens as the dimension climbs. To keep the same grid spacing in d dimensions you need N \sim m^d points — a grid of m steps per axis. Ten points per axis is 10 in 1-D, 100 in 2-D, and 10^{30} in 30 dimensions — more grid points than there are atoms in a mountain. The grid error, expressed in the number of samples N, degrades to N^{-k/d}, which crawls to a standstill as d grows. This is the notorious curse of dimensionality.

Monte Carlo simply does not care. Its error is \sigma/\sqrt{N} in every dimension — the d never appears. So there is a crossover: below a handful of dimensions grids win, but somewhere around d \approx 4 to 8 Monte Carlo pulls ahead, and by the time you reach the integrals physics actually needs it is the only game in town. Statistical mechanics is the classic customer: the properties of a material come from integrals over the phase space of every particle's position and momentum — easily 10^{23} coupled dimensions. No grid can touch that. Random sampling can.

The method was born in secret. In the 1940s at Los Alamos, working on the atomic bomb, the mathematician Stanislaw Ulam was recovering from illness and playing endless games of solitaire. He wondered what the odds of winning were — and realised that instead of untangling the combinatorics, he could just deal many random hands and count. He mentioned it to John von Neumann, who saw at once that the same idea could trace neutrons diffusing through fissile material, a problem no formula could crack. They needed a secret code-name for the classified technique, and their colleague Nicholas Metropolis suggested "Monte Carlo", after the Monaco casino where Ulam's uncle would famously borrow money to gamble. The name — and the method — stuck. The very first large-scale runs were done on ENIAC, one of the first electronic computers, quietly launching an entire branch of computational science.

The Metropolis algorithm and the Ising model

Plain "throw uniform darts" sampling works when every point is equally worth visiting. But in physics that is rarely true. At temperature T, a system in state with energy E occurs with the Boltzmann probability P \propto e^{-E/k_BT} — low-energy states are common, high-energy states rare. If you want the average energy or magnetisation of a magnet, you must sample states weighted by that distribution, and almost all of the weight sits in a vanishingly small, oddly-shaped sliver of a gigantic space. Scattering uniform darts would waste essentially all of them on states nature never visits.

The Metropolis algorithm (1953) solves this with breathtaking simplicity. Instead of placing independent points, take a walk: from the current state, propose a small random change, and decide whether to accept it by a single rule:

Propose. From the current configuration, propose a small change (e.g. flip one spin), and compute the energy change \Delta E it would cause.
Accept downhill moves always. If \Delta E \le 0 (energy drops or stays), accept the change — the acceptance probability is 1.
Accept uphill moves sometimes. If \Delta E > 0, accept it with probability e^{-\Delta E/k_BT} — draw a random number in [0,1) and accept if it is below that. Otherwise stay put.

Do this over and over and the walk visits each configuration exactly as often as its Boltzmann weight demands — it samples the Boltzmann distribution without ever computing the impossible normalising sum. The reason it works is detailed balance: the accept-rule is arranged so that, in equilibrium, the flow of the walk from state A to state B exactly matches the flow back from B to A. When every pair of states balances like this, the population sitting in each state stops changing — and that steady population is precisely P \propto e^{-E/k_BT}. Accepting occasional uphill moves is what lets the walk climb out of local dips and explore, rather than freezing at the first minimum it finds.

The classic playground is the Ising model of a magnet: a grid of little arrows ("spins") each pointing up (s_i = +1) or down (s_i = -1). Neighbouring spins prefer to line up, so the energy is

E = -J \sum_{\langle i,j\rangle} s_i\,s_j,

summed over neighbouring pairs, with J > 0 the coupling strength: aligned neighbours (s_i s_j = +1) lower the energy, clashing ones raise it. Now the drama is a tug-of-war between two tendencies. Energy wants order — all spins aligned, a magnet. Temperature wants chaos — random thermal kicks scramble the alignment. At low T order wins and the material is ferromagnetic (a strong net magnetisation); at high T chaos wins and the magnetisation averages to zero. Between them, at a sharp critical temperature T_c, the magnet abruptly loses its magnetism — a genuine phase transition, of exactly the kind you can watch a fridge magnet undergo if you heat it past its Curie point. Let Metropolis play it out:

// A tiny 2-D Ising magnet on an 8×8 grid (64 spins), driven by Metropolis. // Low temperature → spins align → magnetisation near 1. // High temperature → thermal chaos → magnetisation near 0. function mulberry32(seed: number) { let a = seed >>> 0; return function (): number { a = (a + 0x6d2b79f5) | 0; let t = Math.imul(a ^ (a >>> 15), 1 | a); t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t; return ((t ^ (t >>> 14)) >>> 0) / 4294967296; }; } const L = 8; // 8×8 lattice = 64 spins const N = L * L; const J = 1; function runIsing(T: number, seed: number): number { const rand = mulberry32(seed); // Start fully aligned (all spins up). const s: number[] = new Array(N).fill(1); const idx = (r: number, c: number): number => ((r + L) % L) * L + ((c + L) % L); const sweeps = 400; // 400 sweeps × 64 sites ≈ 25k proposals — fast for (let sweep = 0; sweep < sweeps; sweep++) { for (let k = 0; k < N; k++) { const r = Math.floor(rand() * L); const c = Math.floor(rand() * L); const spin = s[idx(r, c)]; // Sum of the four neighbours (periodic wrap-around). const nb = s[idx(r + 1, c)] + s[idx(r - 1, c)] + s[idx(r, c + 1)] + s[idx(r, c - 1)]; const dE = 2 * J * spin * nb; // energy cost of flipping this spin // Metropolis: accept if downhill, else with prob e^(−ΔE/T). if (dE <= 0 || rand() < Math.exp(-dE / T)) { s[idx(r, c)] = -spin; } } } let sum = 0; for (const v of s) sum += v; return Math.abs(sum) / N; // |magnetisation| per spin, 0…1 } // The 2-D Ising critical temperature is about T_c ≈ 2.27 (in units of J/k_B). for (const T of [1.0, 1.5, 2.0, 2.5, 3.5, 5.0]) { const m = runIsing(T, 777); const bar = "#".repeat(Math.round(m * 20)); console.log(`T=${T.toFixed(1)} |m|=${m.toFixed(3)} ${bar}`); }

Read the bars from top to bottom: at low temperature the magnetisation |m| sits near 1 (ordered, magnetic), and as the temperature climbs past T_c \approx 2.27 it collapses towards 0 (disordered, non-magnetic). You have just simulated a phase transition with a few dozen lines of code and a stream of random numbers — the same technique, scaled up, computes the properties of real materials.

The single most common Monte Carlo mistake is expecting the error to fall in step with the effort. It does not: the error shrinks like 1/\sqrt{N}, so 100 times the samples buys only 10 times the accuracy, and 10{,}000 times the samples just 100 times. Chasing one more decimal place of \pi this way costs a hundredfold more darts each time — a brutal law of diminishing returns. This is why real work leans on variance reduction (importance sampling, control variates) rather than brute force: shrink \sigma, not just grow N.

Two subtler traps bite in the Metropolis world specifically. First, the samples are correlated: each step of the walk is a small nudge from the last, so successive configurations are not independent, and the naive 1/\sqrt{N} is too optimistic unless you account for the autocorrelation time between genuinely independent samples. Second, burn-in: the walk starts in whatever state you chose, which may be wildly untypical, so you must discard the first stretch of the walk before it has "forgotten" its start and settled into equilibrium. Measure too early and you bias every average. And underneath it all: a bad random-number generator — one with hidden patterns or a short period — can quietly poison the results, which is why physicists fuss so much over their PRNGs.