Gaussian Elimination and LU Decomposition
You already know how to solve a linear system by hand. Given
A\mathbf{x}=\mathbf{b} you run
Gaussian
elimination: sweep downward, using each pivot row to knock the entries below it to zero,
until the matrix is upper-triangular; then back-substitute from the bottom row up. It
works, it always has, and for a single 3\times3 system it is a pleasant
afternoon's arithmetic.
This page is about a quieter fact hiding inside that familiar procedure — a fact that turns a
one-shot hand method into the industrial workhorse of scientific computing. The multipliers you
throw away during elimination are not garbage. Collected up, they are a matrix in their own right — a
lower-triangular matrix L — and together with the upper-triangular result
U they reconstruct the original:
A = LU.
That single observation is the LU decomposition. It says elimination is really a
factorisation: you spend one expensive pass to split A into two
triangular pieces, and from then on every system with that same A — however
many right-hand sides \mathbf{b} you are handed — falls out almost for free.
We will see why that is a colossal win, then confront the one thing that can wreck it —
a tiny pivot — and the cheap fix, partial pivoting, that every
serious solver on Earth uses.
Elimination, re-read as bookkeeping
Take the system we will use throughout:
A = \begin{pmatrix} 2 & 1 & 1\\ 4 & 3 & 3\\ 8 & 7 & 9 \end{pmatrix}, \qquad \mathbf{b} = \begin{pmatrix} 4\\ 10\\ 30 \end{pmatrix}.
Forward elimination on the first column subtracts
m_{21}=\tfrac{4}{2}=2 times row 1 from row 2, and
m_{31}=\tfrac{8}{2}=4 times row 1 from row 3. Each multiplier is
the entry we want to kill divided by the pivot above it. The second column then needs one more
multiplier, m_{32}, to clear the last sub-diagonal entry. When the dust
settles the matrix is upper-triangular — call it U.
Here is the key of it. Write those three multipliers into the corresponding sub-diagonal slots of an
identity matrix, and you get a unit lower-triangular matrix (ones on the diagonal):
L = \begin{pmatrix} 1 & 0 & 0\\ m_{21} & 1 & 0\\ m_{31} & m_{32} & 1 \end{pmatrix}.
Multiply L by U and — every time — you recover
A exactly. The elimination did not destroy information; it
reorganised A into a product of two triangles. This is the
Doolittle form of LU (unit diagonal on L). Let the machine
do it and print all three matrices so you can check LU=A by eye:
// Doolittle LU factorisation (no pivoting yet): A = L·U.
// L is unit lower-triangular (1s on the diagonal), U is upper-triangular.
const A: number[][] = [
[2, 1, 1],
[4, 3, 3],
[8, 7, 9],
];
const n = 3;
// L starts as the identity; U starts as a copy of A and gets eliminated in place.
const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
const U: number[][] = A.map((row) => row.slice());
for (let k = 0; k < n; k++) {
for (let i = k + 1; i < n; i++) {
const m = U[i][k] / U[k][k]; // the multiplier — store it in L
L[i][k] = m;
for (let j = k; j < n; j++) {
U[i][j] -= m * U[k][j]; // subtract m·(row k) from row i
}
}
}
console.log("L =");
for (const row of L) console.log(" " + row.map((v) => v.toFixed(2)).join(" "));
console.log("U =");
for (const row of U) console.log(" " + row.map((v) => v.toFixed(2)).join(" "));
// Verify L·U reproduces A, entry by entry.
console.log("L·U (should equal A) =");
for (let i = 0; i < n; i++) {
const row: number[] = [];
for (let j = 0; j < n; j++) {
let s = 0;
for (let k = 0; k < n; k++) s += L[i][k] * U[k][j];
row.push(s);
}
console.log(" " + row.map((v) => v.toFixed(2)).join(" "));
}
The multipliers 2, 4 and m_{32} land in the
lower triangle of L; the eliminated rows land in U;
and their product is A to the last decimal. Notice too that
L and U together hold exactly
n^2 non-trivial numbers — the same count as A — so
in a real implementation both are stored on top of A, the
multipliers occupying the zeros they create. No extra memory at all.
Why bother? Factor once, solve forever
Suppose you have A=LU in hand. To solve
A\mathbf{x}=\mathbf{b}, substitute the factorisation:
A\mathbf{x} = L(U\mathbf{x}) = \mathbf{b}.
Name the inner product \mathbf{y}=U\mathbf{x}. Now the single hard system
splits into two easy triangular ones:
L\mathbf{y}=\mathbf{b} \quad(\text{forward substitution}), \qquad U\mathbf{x}=\mathbf{y} \quad(\text{back substitution}).
A triangular system is trivial to solve. L\mathbf{y}=\mathbf{b} gives
y_1 immediately from the first row, then y_2 from
the second, sweeping down; U\mathbf{x}=\mathbf{y} gives
x_n from the last row, then sweeps up. Each triangular solve costs
only about n^2 operations.
Here is the punchline that makes LU worth its name. The factorisation is the
expensive step — it costs about \tfrac{2}{3}n^3 floating-point operations,
the same as plain Gaussian elimination. But you only pay it once. After that, every
new right-hand side \mathbf{b} is just two n^2
triangular solves. Solve for a hundred different \mathbf{b}'s and you pay
the n^3 price a single time, not a hundred times.
-
Factorisation. Gaussian elimination factors a square matrix as
A=LU, where L is unit lower-triangular (its
sub-diagonal entries are the elimination multipliers m_{ik}=a_{ik}/a_{kk})
and U is the upper-triangular result of the elimination. With row
swaps for stability it becomes PA=LU, where
P is a permutation matrix recording the swaps.
-
Solve procedure. To solve A\mathbf{x}=\mathbf{b}:
forward-substitute L\mathbf{y}=P\mathbf{b}, then back-substitute
U\mathbf{x}=\mathbf{y}.
-
Cost. The factorisation is \sim\tfrac{2}{3}n^3
operations, paid once. Each subsequent solve — for a new
\mathbf{b} — is only \sim 2n^2. Reusing one
factorisation across many right-hand sides is the whole point.
Watch it in action. We factor A once, then solve two different
right-hand sides from that same factorisation — no re-elimination:
// Factor A = L·U once, then solve A x = b for several b's cheaply.
const A: number[][] = [
[2, 1, 1],
[4, 3, 3],
[8, 7, 9],
];
const n = 3;
// --- Step 1 (expensive, done ONCE): build L and U ---
const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
const U: number[][] = A.map((r) => r.slice());
for (let k = 0; k < n; k++) {
for (let i = k + 1; i < n; i++) {
const m = U[i][k] / U[k][k];
L[i][k] = m;
for (let j = k; j < n; j++) U[i][j] -= m * U[k][j];
}
}
// --- Step 2 (cheap, reused): forward + back substitution ---
function solve(b: number[]): number[] {
// L y = b (forward substitution, sweep down)
const y: number[] = new Array(n).fill(0);
for (let i = 0; i < n; i++) {
let s = b[i];
for (let j = 0; j < i; j++) s -= L[i][j] * y[j];
y[i] = s / L[i][i];
}
// U x = y (back substitution, sweep up)
const x: number[] = new Array(n).fill(0);
for (let i = n - 1; i >= 0; i--) {
let s = y[i];
for (let j = i + 1; j < n; j++) s -= U[i][j] * x[j];
x[i] = s / U[i][i];
}
return x;
}
// Two DIFFERENT right-hand sides, ONE factorisation.
for (const b of [[4, 10, 30], [1, 1, 1]]) {
const x = solve(b);
console.log(`b = [${b.join(", ")}] -> x = [${x.map((v) => v.toFixed(3)).join(", ")}]`);
}
Both solutions come out of a single \tfrac{2}{3}n^3 factorisation. This is
exactly why LU, not raw elimination, is what libraries like LAPACK actually hand you: the factorisation
is the reusable asset. Change the loads on a bridge but keep its geometry, and you re-solve with the
same L and U in a heartbeat.
Because naming the factorisation changes what you can do with it. Once you see
A=LU as an object, a cascade of results falls out for free.
The determinant is the product of U's diagonal (times
\pm1 for the row swaps), because
\det A = \det L \cdot \det U = 1\cdot\prod_i u_{ii}. The
inverse is just n solves — one per column of the
identity — sharing the one factorisation. And factoring ahead of time lets a real-time
system respond to a stream of right-hand sides at n^2 speed. The
elimination was always doing this; giving it a name let us reuse it.
The determinant and the inverse, almost for free
Two classic quantities that are painful to compute directly become one-liners once you hold
A=LU.
Determinant. A triangular matrix's determinant is simply the product of its diagonal.
Since L has all-ones diagonal, \det L = 1, and so
\det A = \det L \cdot \det U = \prod_{i=1}^{n} u_{ii}.
(With partial pivoting you multiply by (-1)^s, where
s is the number of row swaps — each swap flips the sign of the
determinant.) Computing a determinant this way costs \sim\tfrac{2}{3}n^3;
the schoolbook cofactor expansion costs \sim n!, which for
n=20 is the difference between a microsecond and longer than the age of the
universe.
Inverse. The j-th column of
A^{-1} solves A\mathbf{x}=\mathbf{e}_j, where
\mathbf{e}_j is the j-th column of the identity.
So the inverse is n right-hand-side solves — all sharing the one
factorisation. (In practice you almost never want
A^{-1} explicitly: to solve A\mathbf{x}=\mathbf{b}
you solve with LU directly rather than forming the inverse and multiplying — it is both faster and more
accurate.) Here both by-products come straight off the factorisation:
// Determinant and inverse as by-products of one LU factorisation.
const A: number[][] = [
[2, 1, 1],
[4, 3, 3],
[8, 7, 9],
];
const n = 3;
const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
const U: number[][] = A.map((r) => r.slice());
for (let k = 0; k < n; k++) {
for (let i = k + 1; i < n; i++) {
const m = U[i][k] / U[k][k];
L[i][k] = m;
for (let j = k; j < n; j++) U[i][j] -= m * U[k][j];
}
}
// det A = product of U's diagonal (no swaps here, so no sign flip).
let det = 1;
for (let i = 0; i < n; i++) det *= U[i][i];
console.log(`det A = ${det.toFixed(3)}`); // = -2
function solve(b: number[]): number[] {
const y: number[] = new Array(n).fill(0);
for (let i = 0; i < n; i++) {
let s = b[i];
for (let j = 0; j < i; j++) s -= L[i][j] * y[j];
y[i] = s; // L has unit diagonal
}
const x: number[] = new Array(n).fill(0);
for (let i = n - 1; i >= 0; i--) {
let s = y[i];
for (let j = i + 1; j < n; j++) s -= U[i][j] * x[j];
x[i] = s / U[i][i];
}
return x;
}
// Inverse: solve A x = e_j for each column of the identity.
const inv: number[][] = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
for (let j = 0; j < n; j++) {
const e = [0, 0, 0];
e[j] = 1;
const col = solve(e);
for (let i = 0; i < n; i++) inv[i][j] = col[i];
}
console.log("A inverse =");
for (const row of inv) console.log(" " + row.map((v) => v.toFixed(3)).join(" "));
The determinant comes out as -2 — the product
2\cdot(\text{middle pivot})\cdot(\text{last pivot}) — and each column of the
inverse is one cheap reuse of the factors.
The catch: a tiny pivot is a numerical disaster
Everything above assumed the pivots u_{kk} we divide by are healthy. But
elimination divides by the pivot at every step — the multiplier is
m_{ik}=a_{ik}/a_{kk} — and dividing by a number close to zero
manufactures enormous multipliers. Those giant multipliers then scale up the rounding errors
already living in the other entries (recall from
floating-point and
error that every stored number carries a tiny error, and subtracting near-equal numbers
causes catastrophic cancellation). A microscopic pivot turns those specks of dust into boulders.
The alarming part: this happens even when A is a perfectly nice, invertible
matrix. Consider
\begin{pmatrix} \varepsilon & 1\\ 1 & 1 \end{pmatrix}\begin{pmatrix} x_1\\ x_2 \end{pmatrix} = \begin{pmatrix} 1\\ 2 \end{pmatrix}, \qquad \varepsilon = 10^{-20}.
Its true solution is almost exactly x_1=x_2=1. But if you naively use the
top-left \varepsilon as a pivot, the multiplier is
1/\varepsilon = 10^{20}. Row 2 becomes
1 - 10^{20}, which in floating point rounds to just
-10^{20} — the crucial +1 is annihilated.
The answer comes back x_1=0, a 100% error. Nothing was wrong with the
matrix; the algorithm was unstable. Run it:
// A perfectly invertible 2x2 that naive elimination botches.
const eps = 1e-20;
// [ eps 1 ] [x1] [1] true answer: x1 ≈ 1, x2 ≈ 1
// [ 1 1 ] [x2] = [2]
// --- NAIVE: use the tiny top-left entry as the pivot ---
{
const a11 = eps, a12 = 1, b1 = 1;
const a21 = 1, a22 = 1, b2 = 2;
const m = a21 / a11; // 1 / 1e-20 = 1e20 (giant!)
const a22p = a22 - m * a12; // 1 - 1e20 -> rounds to -1e20
const b2p = b2 - m * b1; // 2 - 1e20 -> rounds to -1e20
const x2 = b2p / a22p; // ≈ 1 (survives)
const x1 = (b1 - a12 * x2) / a11; // (1 - x2)/eps -> catastrophe
console.log(`NAIVE : x1 = ${x1}, x2 = ${x2}`);
}
// --- PIVOTED: swap so the LARGER entry (1) is the pivot ---
{
const a11 = 1, a12 = 1, b1 = 2; // rows swapped
const a21 = eps, a22 = 1, b2 = 1;
const m = a21 / a11; // 1e-20 (tiny, harmless)
const a22p = a22 - m * a12; // ≈ 1
const b2p = b2 - m * b1; // ≈ 1
const x2 = b2p / a22p; // ≈ 1
const x1 = (b1 - a12 * x2) / a11; // ≈ 1
console.log(`PIVOTED : x1 = ${x1}, x2 = ${x2}`);
}
The naive version returns x_1=0 — dead wrong. The pivoted version returns
x_1=x_2=1 — correct to full precision. Same matrix, same arithmetic; the
only difference is which row we put on top.
It is tempting to think numerical trouble means the matrix is "bad" — nearly singular,
ill-conditioned, degenerate. Not here. The matrix above has determinant
\varepsilon - 1 \approx -1: it is about as well-behaved and invertible as
a matrix can be. The instability is entirely the fault of the algorithm choosing a
minuscule pivot, not the problem.
The mechanism is the one from floating point. A tiny pivot a_{kk} creates
a huge multiplier m = a_{ik}/a_{kk}. When you then form
a_{ij} - m\,a_{kj}, the giant term m\,a_{kj}
swamps the modest a_{ij}, so a_{ij}'s digits
fall off the end of the mantissa and are lost forever — and later, back-substitution divides by that
tiny pivot, amplifying the now-corrupted numbers. The rule is blunt and universal: never
divide by a small pivot if a bigger one is available in the column. That is precisely what
partial pivoting guarantees.
Partial pivoting: put the biggest fish on top
The fix is embarrassingly simple. At each elimination step k, before
dividing, look down the pivot column for the entry of largest magnitude, and swap that
row up into the pivot position. Now every multiplier satisfies
|m_{ik}| = |a_{ik}|/|a_{kk}| \le 1 — the multipliers can never blow up,
because the pivot is by construction the biggest number available. Rounding errors are held on a
leash instead of being amplified.
Bookkeeping the swaps gives the factorisation its final, honest form. The row interchanges are
recorded in a permutation matrix P (an identity with its
rows shuffled), and what you actually factor is the row-reordered matrix:
PA = LU.
To solve A\mathbf{x}=\mathbf{b} you simply permute the right-hand side the
same way and proceed: forward-solve L\mathbf{y}=P\mathbf{b}, then
back-solve U\mathbf{x}=\mathbf{y}. This — Gaussian elimination with
partial pivoting, "GEPP" — is the default algorithm behind virtually every
solve / \ / lu command in every numerical library on the
planet. It is stable in practice for essentially every matrix you will ever meet. Here is the full
factor-with-pivoting routine, printing the permutation and verifying
PA=LU:
// Gaussian elimination with PARTIAL PIVOTING: P A = L U.
const A: number[][] = [
[2, 1, 1],
[4, 3, 3],
[8, 7, 9],
];
const n = 3;
const U: number[][] = A.map((r) => r.slice());
const L: number[][] = [[1, 0, 0], [0, 1, 0], [0, 0, 1]];
const piv: number[] = [0, 1, 2]; // records the row order after swaps
for (let k = 0; k < n; k++) {
// 1) find the row (at or below k) with the LARGEST |entry| in column k
let best = k;
for (let i = k + 1; i < n; i++) {
if (Math.abs(U[i][k]) > Math.abs(U[best][k])) best = i;
}
// 2) swap that row up — in U, in the already-built part of L, and in piv
if (best !== k) {
[U[k], U[best]] = [U[best], U[k]];
[piv[k], piv[best]] = [piv[best], piv[k]];
for (let j = 0; j < k; j++) [L[k][j], L[best][j]] = [L[best][j], L[k][j]];
}
// 3) eliminate below the (now large) pivot
for (let i = k + 1; i < n; i++) {
const m = U[i][k] / U[k][k]; // |m| <= 1, guaranteed
L[i][k] = m;
for (let j = k; j < n; j++) U[i][j] -= m * U[k][j];
}
}
console.log(`permutation (new row order): [${piv.join(", ")}]`);
console.log("L =");
for (const row of L) console.log(" " + row.map((v) => v.toFixed(3)).join(" "));
console.log("U =");
for (const row of U) console.log(" " + row.map((v) => v.toFixed(3)).join(" "));
// Verify: (L·U) row i should equal A's row piv[i].
console.log("check L·U vs P·A:");
let ok = true;
for (let i = 0; i < n; i++) {
for (let j = 0; j < n; j++) {
let s = 0;
for (let k = 0; k < n; k++) s += L[i][k] * U[k][j];
if (Math.abs(s - A[piv[i]][j]) > 1e-9) ok = false;
}
}
console.log(ok ? " P A = L U verified" : " MISMATCH");
The largest first-column entry is 8, so row 3 is pulled up to be the first
pivot; the permutation records the reshuffle, all multipliers come out with magnitude
\le 1, and PA=LU checks out. That bounded-multiplier
property is the entire reason GEPP is trustworthy.
The price tag, drawn to scale
Why do we make so much fuss about reusing the factorisation? Because the two costs live on wildly
different curves. Factorising is \sim\tfrac{2}{3}n^3; each extra solve is
\sim 2n^2. As n grows the cubic term utterly
dominates — which is exactly why you never want to pay it twice. The chart plots both operation counts
against the matrix size n:
For n=200 the factorisation is roughly
5.3 million operations while one solve is about
80{,}000 — a 66\times ratio, and it widens
linearly with n. Solving ten thousand right-hand sides with one
factorisation costs barely more than solving the first. That is the economics that made LU the bedrock
of numerical linear algebra.
Yes — if your matrix has structure, exploit it. A symmetric positive-definite
matrix admits the Cholesky factorisation
A=LL^{\mathsf T}, which needs no pivoting and costs
half as much, about \tfrac{1}{3}n^3. A banded
or sparse matrix — the kind that comes from discretising a physical field, mostly
zeros — can be factored in nearly O(n) work by never touching the zeros.
The general dense LU with partial pivoting is the reliable default; the specialised factorisations
are the sports cars you reach for when you know your matrix has the right shape.