Gaussian Elimination and LU Decomposition

You already know how to solve a linear system by hand. Given A\mathbf{x}=\mathbf{b} you run Gaussian elimination: sweep downward, using each pivot row to knock the entries below it to zero, until the matrix is upper-triangular; then back-substitute from the bottom row up. It works, it always has, and for a single 3\times3 system it is a pleasant afternoon's arithmetic.

This page is about a quieter fact hiding inside that familiar procedure — a fact that turns a one-shot hand method into the industrial workhorse of scientific computing. The multipliers you throw away during elimination are not garbage. Collected up, they are a matrix in their own right — a lower-triangular matrix L — and together with the upper-triangular result U they reconstruct the original:

A = LU.

That single observation is the LU decomposition. It says elimination is really a factorisation: you spend one expensive pass to split A into two triangular pieces, and from then on every system with that same A — however many right-hand sides \mathbf{b} you are handed — falls out almost for free. We will see why that is a colossal win, then confront the one thing that can wreck it — a tiny pivot — and the cheap fix, partial pivoting, that every serious solver on Earth uses.

Elimination, re-read as bookkeeping

Take the system we will use throughout:

A = \begin{pmatrix} 2 & 1 & 1\\ 4 & 3 & 3\\ 8 & 7 & 9 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 4\\ 10\\ 30 \end{pmatrix}.

Forward elimination on the first column subtracts m_{21}=\tfrac{4}{2}=2 times row 1 from row 2, and m_{31}=\tfrac{8}{2}=4 times row 1 from row 3. Each multiplier is the entry we want to kill divided by the pivot above it. The second column then needs one more multiplier, m_{32}, to clear the last sub-diagonal entry. When the dust settles the matrix is upper-triangular — call it U.

Here is the key of it. Write those three multipliers into the corresponding sub-diagonal slots of an identity matrix, and you get a unit lower-triangular matrix (ones on the diagonal):

L = \begin{pmatrix} 1 & 0 & 0\\ m_{21} & 1 & 0\\ m_{31} & m_{32} & 1 \end{pmatrix}.

Multiply L by U and — every time — you recover A exactly. The elimination did not destroy information; it reorganised A into a product of two triangles. This is the Doolittle form of LU (unit diagonal on L). Let the machine do it and print all three matrices so you can check LU=A by eye:

// Doolittle LU factorisation (no pivoting yet): A = L·U. // L is unit lower-triangular (1s on the diagonal), U is upper-triangular. const A: number[][] = [ [2, 1, 1], [4, 3, 3], [8, 7, 9], ]; const n = 3; // L starts as the identity; U starts as a copy of A and gets eliminated in place. const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]; const U: number[][] = A.map((row) => row.slice()); for (let k = 0; k < n; k++) { for (let i = k + 1; i < n; i++) { const m = U[i][k] / U[k][k]; // the multiplier — store it in L L[i][k] = m; for (let j = k; j < n; j++) { U[i][j] -= m * U[k][j]; // subtract m·(row k) from row i } } } console.log("L ="); for (const row of L) console.log(" " + row.map((v) => v.toFixed(2)).join(" ")); console.log("U ="); for (const row of U) console.log(" " + row.map((v) => v.toFixed(2)).join(" ")); // Verify L·U reproduces A, entry by entry. console.log("L·U (should equal A) ="); for (let i = 0; i < n; i++) { const row: number[] = []; for (let j = 0; j < n; j++) { let s = 0; for (let k = 0; k < n; k++) s += L[i][k] * U[k][j]; row.push(s); } console.log(" " + row.map((v) => v.toFixed(2)).join(" ")); }

The multipliers 2, 4 and m_{32} land in the lower triangle of L; the eliminated rows land in U; and their product is A to the last decimal. Notice too that L and U together hold exactly n^2 non-trivial numbers — the same count as A — so in a real implementation both are stored on top of A, the multipliers occupying the zeros they create. No extra memory at all.

Why bother? Factor once, solve forever

Suppose you have A=LU in hand. To solve A\mathbf{x}=\mathbf{b}, substitute the factorisation:

A\mathbf{x} = L(U\mathbf{x}) = \mathbf{b}.

Name the inner product \mathbf{y}=U\mathbf{x}. Now the single hard system splits into two easy triangular ones:

L\mathbf{y}=\mathbf{b} \quad(\text{forward substitution}), \qquad U\mathbf{x}=\mathbf{y} \quad(\text{back substitution}).

A triangular system is trivial to solve. L\mathbf{y}=\mathbf{b} gives y_1 immediately from the first row, then y_2 from the second, sweeping down; U\mathbf{x}=\mathbf{y} gives x_n from the last row, then sweeps up. Each triangular solve costs only about n^2 operations.

Here is the punchline that makes LU worth its name. The factorisation is the expensive step — it costs about \tfrac{2}{3}n^3 floating-point operations, the same as plain Gaussian elimination. But you only pay it once. After that, every new right-hand side \mathbf{b} is just two n^2 triangular solves. Solve for a hundred different \mathbf{b}'s and you pay the n^3 price a single time, not a hundred times.

Watch it in action. We factor A once, then solve two different right-hand sides from that same factorisation — no re-elimination:

// Factor A = L·U once, then solve A x = b for several b's cheaply. const A: number[][] = [ [2, 1, 1], [4, 3, 3], [8, 7, 9], ]; const n = 3; // --- Step 1 (expensive, done ONCE): build L and U --- const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]; const U: number[][] = A.map((r) => r.slice()); for (let k = 0; k < n; k++) { for (let i = k + 1; i < n; i++) { const m = U[i][k] / U[k][k]; L[i][k] = m; for (let j = k; j < n; j++) U[i][j] -= m * U[k][j]; } } // --- Step 2 (cheap, reused): forward + back substitution --- function solve(b: number[]): number[] { // L y = b (forward substitution, sweep down) const y: number[] = new Array(n).fill(0); for (let i = 0; i < n; i++) { let s = b[i]; for (let j = 0; j < i; j++) s -= L[i][j] * y[j]; y[i] = s / L[i][i]; } // U x = y (back substitution, sweep up) const x: number[] = new Array(n).fill(0); for (let i = n - 1; i >= 0; i--) { let s = y[i]; for (let j = i + 1; j < n; j++) s -= U[i][j] * x[j]; x[i] = s / U[i][i]; } return x; } // Two DIFFERENT right-hand sides, ONE factorisation. for (const b of [[4, 10, 30], [1, 1, 1]]) { const x = solve(b); console.log(`b = [${b.join(", ")}] -> x = [${x.map((v) => v.toFixed(3)).join(", ")}]`); }

Both solutions come out of a single \tfrac{2}{3}n^3 factorisation. This is exactly why LU, not raw elimination, is what libraries like LAPACK actually hand you: the factorisation is the reusable asset. Change the loads on a bridge but keep its geometry, and you re-solve with the same L and U in a heartbeat.

Because naming the factorisation changes what you can do with it. Once you see A=LU as an object, a cascade of results falls out for free. The determinant is the product of U's diagonal (times \pm1 for the row swaps), because \det A = \det L \cdot \det U = 1\cdot\prod_i u_{ii}. The inverse is just n solves — one per column of the identity — sharing the one factorisation. And factoring ahead of time lets a real-time system respond to a stream of right-hand sides at n^2 speed. The elimination was always doing this; giving it a name let us reuse it.

The determinant and the inverse, almost for free

Two classic quantities that are painful to compute directly become one-liners once you hold A=LU.

Determinant. A triangular matrix's determinant is simply the product of its diagonal. Since L has all-ones diagonal, \det L = 1, and so

\det A = \det L \cdot \det U = \prod_{i=1}^{n} u_{ii}.

(With partial pivoting you multiply by (-1)^s, where s is the number of row swaps — each swap flips the sign of the determinant.) Computing a determinant this way costs \sim\tfrac{2}{3}n^3; the schoolbook cofactor expansion costs \sim n!, which for n=20 is the difference between a microsecond and longer than the age of the universe.

Inverse. The j-th column of A^{-1} solves A\mathbf{x}=\mathbf{e}_j, where \mathbf{e}_j is the j-th column of the identity. So the inverse is n right-hand-side solves — all sharing the one factorisation. (In practice you almost never want A^{-1} explicitly: to solve A\mathbf{x}=\mathbf{b} you solve with LU directly rather than forming the inverse and multiplying — it is both faster and more accurate.) Here both by-products come straight off the factorisation:

// Determinant and inverse as by-products of one LU factorisation. const A: number[][] = [ [2, 1, 1], [4, 3, 3], [8, 7, 9], ]; const n = 3; const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]; const U: number[][] = A.map((r) => r.slice()); for (let k = 0; k < n; k++) { for (let i = k + 1; i < n; i++) { const m = U[i][k] / U[k][k]; L[i][k] = m; for (let j = k; j < n; j++) U[i][j] -= m * U[k][j]; } } // det A = product of U's diagonal (no swaps here, so no sign flip). let det = 1; for (let i = 0; i < n; i++) det *= U[i][i]; console.log(`det A = ${det.toFixed(3)}`); // = -2 function solve(b: number[]): number[] { const y: number[] = new Array(n).fill(0); for (let i = 0; i < n; i++) { let s = b[i]; for (let j = 0; j < i; j++) s -= L[i][j] * y[j]; y[i] = s; // L has unit diagonal } const x: number[] = new Array(n).fill(0); for (let i = n - 1; i >= 0; i--) { let s = y[i]; for (let j = i + 1; j < n; j++) s -= U[i][j] * x[j]; x[i] = s / U[i][i]; } return x; } // Inverse: solve A x = e_j for each column of the identity. const inv: number[][] = [[0, 0, 0], [0, 0, 0], [0, 0, 0]]; for (let j = 0; j < n; j++) { const e = [0, 0, 0]; e[j] = 1; const col = solve(e); for (let i = 0; i < n; i++) inv[i][j] = col[i]; } console.log("A inverse ="); for (const row of inv) console.log(" " + row.map((v) => v.toFixed(3)).join(" "));

The determinant comes out as -2 — the product 2\cdot(\text{middle pivot})\cdot(\text{last pivot}) — and each column of the inverse is one cheap reuse of the factors.

The catch: a tiny pivot is a numerical disaster

Everything above assumed the pivots u_{kk} we divide by are healthy. But elimination divides by the pivot at every step — the multiplier is m_{ik}=a_{ik}/a_{kk} — and dividing by a number close to zero manufactures enormous multipliers. Those giant multipliers then scale up the rounding errors already living in the other entries (recall from floating-point and error that every stored number carries a tiny error, and subtracting near-equal numbers causes catastrophic cancellation). A microscopic pivot turns those specks of dust into boulders.

The alarming part: this happens even when A is a perfectly nice, invertible matrix. Consider

\begin{pmatrix} \varepsilon & 1\\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} 1\\ 2 \end{pmatrix}, \qquad \varepsilon = 10^{-20}.

Its true solution is almost exactly x_1=x_2=1. But if you naively use the top-left \varepsilon as a pivot, the multiplier is 1/\varepsilon = 10^{20}. Row 2 becomes 1 - 10^{20}, which in floating point rounds to just -10^{20} — the crucial +1 is annihilated. The answer comes back x_1=0, a 100% error. Nothing was wrong with the matrix; the algorithm was unstable. Run it:

// A perfectly invertible 2x2 that naive elimination botches. const eps = 1e-20; // [ eps 1 ] [x1] [1] true answer: x1 ≈ 1, x2 ≈ 1 // [ 1 1 ] [x2] = [2] // --- NAIVE: use the tiny top-left entry as the pivot --- { const a11 = eps, a12 = 1, b1 = 1; const a21 = 1, a22 = 1, b2 = 2; const m = a21 / a11; // 1 / 1e-20 = 1e20 (giant!) const a22p = a22 - m * a12; // 1 - 1e20 -> rounds to -1e20 const b2p = b2 - m * b1; // 2 - 1e20 -> rounds to -1e20 const x2 = b2p / a22p; // ≈ 1 (survives) const x1 = (b1 - a12 * x2) / a11; // (1 - x2)/eps -> catastrophe console.log(`NAIVE : x1 = ${x1}, x2 = ${x2}`); } // --- PIVOTED: swap so the LARGER entry (1) is the pivot --- { const a11 = 1, a12 = 1, b1 = 2; // rows swapped const a21 = eps, a22 = 1, b2 = 1; const m = a21 / a11; // 1e-20 (tiny, harmless) const a22p = a22 - m * a12; // ≈ 1 const b2p = b2 - m * b1; // ≈ 1 const x2 = b2p / a22p; // ≈ 1 const x1 = (b1 - a12 * x2) / a11; // ≈ 1 console.log(`PIVOTED : x1 = ${x1}, x2 = ${x2}`); }

The naive version returns x_1=0 — dead wrong. The pivoted version returns x_1=x_2=1 — correct to full precision. Same matrix, same arithmetic; the only difference is which row we put on top.

It is tempting to think numerical trouble means the matrix is "bad" — nearly singular, ill-conditioned, degenerate. Not here. The matrix above has determinant \varepsilon - 1 \approx -1: it is about as well-behaved and invertible as a matrix can be. The instability is entirely the fault of the algorithm choosing a minuscule pivot, not the problem.

The mechanism is the one from floating point. A tiny pivot a_{kk} creates a huge multiplier m = a_{ik}/a_{kk}. When you then form a_{ij} - m\,a_{kj}, the giant term m\,a_{kj} swamps the modest a_{ij}, so a_{ij}'s digits fall off the end of the mantissa and are lost forever — and later, back-substitution divides by that tiny pivot, amplifying the now-corrupted numbers. The rule is blunt and universal: never divide by a small pivot if a bigger one is available in the column. That is precisely what partial pivoting guarantees.

Partial pivoting: put the biggest fish on top

The fix is embarrassingly simple. At each elimination step k, before dividing, look down the pivot column for the entry of largest magnitude, and swap that row up into the pivot position. Now every multiplier satisfies |m_{ik}| = |a_{ik}|/|a_{kk}| \le 1 — the multipliers can never blow up, because the pivot is by construction the biggest number available. Rounding errors are held on a leash instead of being amplified.

Bookkeeping the swaps gives the factorisation its final, honest form. The row interchanges are recorded in a permutation matrix P (an identity with its rows shuffled), and what you actually factor is the row-reordered matrix:

PA = LU.

To solve A\mathbf{x}=\mathbf{b} you simply permute the right-hand side the same way and proceed: forward-solve L\mathbf{y}=P\mathbf{b}, then back-solve U\mathbf{x}=\mathbf{y}. This — Gaussian elimination with partial pivoting, "GEPP" — is the default algorithm behind virtually every solve / \ / lu command in every numerical library on the planet. It is stable in practice for essentially every matrix you will ever meet. Here is the full factor-with-pivoting routine, printing the permutation and verifying PA=LU:

// Gaussian elimination with PARTIAL PIVOTING: P A = L U. const A: number[][] = [ [2, 1, 1], [4, 3, 3], [8, 7, 9], ]; const n = 3; const U: number[][] = A.map((r) => r.slice()); const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]; const piv: number[] = [0, 1, 2]; // records the row order after swaps for (let k = 0; k < n; k++) { // 1) find the row (at or below k) with the LARGEST |entry| in column k let best = k; for (let i = k + 1; i < n; i++) { if (Math.abs(U[i][k]) > Math.abs(U[best][k])) best = i; } // 2) swap that row up — in U, in the already-built part of L, and in piv if (best !== k) { [U[k], U[best]] = [U[best], U[k]]; [piv[k], piv[best]] = [piv[best], piv[k]]; for (let j = 0; j < k; j++) [L[k][j], L[best][j]] = [L[best][j], L[k][j]]; } // 3) eliminate below the (now large) pivot for (let i = k + 1; i < n; i++) { const m = U[i][k] / U[k][k]; // |m| <= 1, guaranteed L[i][k] = m; for (let j = k; j < n; j++) U[i][j] -= m * U[k][j]; } } console.log(`permutation (new row order): [${piv.join(", ")}]`); console.log("L ="); for (const row of L) console.log(" " + row.map((v) => v.toFixed(3)).join(" ")); console.log("U ="); for (const row of U) console.log(" " + row.map((v) => v.toFixed(3)).join(" ")); // Verify: (L·U) row i should equal A's row piv[i]. console.log("check L·U vs P·A:"); let ok = true; for (let i = 0; i < n; i++) { for (let j = 0; j < n; j++) { let s = 0; for (let k = 0; k < n; k++) s += L[i][k] * U[k][j]; if (Math.abs(s - A[piv[i]][j]) > 1e-9) ok = false; } } console.log(ok ? " P A = L U verified" : " MISMATCH");

The largest first-column entry is 8, so row 3 is pulled up to be the first pivot; the permutation records the reshuffle, all multipliers come out with magnitude \le 1, and PA=LU checks out. That bounded-multiplier property is the entire reason GEPP is trustworthy.

The price tag, drawn to scale

Why do we make so much fuss about reusing the factorisation? Because the two costs live on wildly different curves. Factorising is \sim\tfrac{2}{3}n^3; each extra solve is \sim 2n^2. As n grows the cubic term utterly dominates — which is exactly why you never want to pay it twice. The chart plots both operation counts against the matrix size n:

For n=200 the factorisation is roughly 5.3 million operations while one solve is about 80{,}000 — a 66\times ratio, and it widens linearly with n. Solving ten thousand right-hand sides with one factorisation costs barely more than solving the first. That is the economics that made LU the bedrock of numerical linear algebra.

Yes — if your matrix has structure, exploit it. A symmetric positive-definite matrix admits the Cholesky factorisation A=LL^{\mathsf T}, which needs no pivoting and costs half as much, about \tfrac{1}{3}n^3. A banded or sparse matrix — the kind that comes from discretising a physical field, mostly zeros — can be factored in nearly O(n) work by never touching the zeros. The general dense LU with partial pivoting is the reliable default; the specialised factorisations are the sports cars you reach for when you know your matrix has the right shape.