The HHL Algorithm

Imagine a linear system so vast — a system of equations A\vec{x} = \vec{b} with a trillion unknowns — that no classical computer could ever write down the answer vector \vec{x}, let alone read out its trillion entries. Now suppose you don't actually want all trillion numbers. You want one number: the total load on a single power line, the average of a stress field, one weighted sum over the solution. The Harrow–Hassidim–Lloyd (HHL) algorithm is built for exactly this. It produces the solution as a quantum state |x\rangle in time that grows like \log N instead of N — an exponential saving in the dimension. But that "as a quantum state" is doing an enormous amount of work, and the whole lesson is really about the fine print.

The plan: invert A one eigenvalue at a time

Assume A is Hermitian (if it isn't, a standard trick embeds it in a bigger Hermitian matrix). Then A has real eigenvalues \lambda_j with orthonormal eigenvectors |u_j\rangle. Encode the right-hand side \vec{b} as a quantum state |b\rangle (its amplitudes are the entries of \vec{b}, normalised) and expand it in that eigenbasis:

|b\rangle = \sum_j \beta_j\,|u_j\rangle.

The exact solution is \vec{x} = A^{-1}\vec{b}. Because A^{-1}|u_j\rangle = \tfrac{1}{\lambda_j}|u_j\rangle, inverting A just means dividing each eigen-component by its eigenvalue:

|x\rangle \;\propto\; A^{-1}|b\rangle \;=\; \sum_j \frac{\beta_j}{\lambda_j}\,|u_j\rangle.

So the entire job of HHL is to turn each coefficient \beta_j into \beta_j/\lambda_j. The difficulty is that a quantum computer doesn't see the eigenvalues — it has to measure them, in superposition, without collapsing the state. That is precisely what phase estimation is for.

Three moves: estimate, rotate, uncompute

HHL threads the eigen-decomposition through three quantum sub-steps:

Phase estimation. Run quantum phase estimation on the unitary e^{iAt}. Since e^{iAt}|u_j\rangle = e^{i\lambda_j t}|u_j\rangle, the eigenvector |u_j\rangle carries the eigenvalue \lambda_j in its phase, and QPE writes an estimate of \lambda_j into an auxiliary register: \sum_j \beta_j\,|u_j\rangle|\lambda_j\rangle.
Controlled rotation. Conditioned on the value \lambda_j now sitting in that register, rotate an extra ancilla qubit so its amplitude becomes proportional to 1/\lambda_j. Measuring the ancilla and post-selecting on the "success" outcome multiplies each component by exactly C/\lambda_j — the 1/\lambda we wanted.
Inverse phase estimation. Run QPE backwards to erase the |\lambda_j\rangle register, disentangling it so it doesn't spoil the answer. What's left on the main register is |x\rangle \propto \sum_j (\beta_j/\lambda_j)|u_j\rangle = A^{-1}|b\rangle.

The output is the solution — but living inside the amplitudes of a quantum state, not printed on a tape.

The pipeline, end to end

Step through the whole assembly line. Each block is a reversible quantum operation; only the very last stage — pulling a number back out — is a measurement.

Worked example 1 — the eigen-decomposition step

Let's make the abstract sum concrete on paper first. Suppose A has just two eigenpairs, (\lambda_1, |u_1\rangle) and (\lambda_2, |u_2\rangle), and the right-hand side decomposes as

|b\rangle = \beta_1\,|u_1\rangle + \beta_2\,|u_2\rangle.

HHL leaves the eigenvectors untouched — they are the natural axes of A — and only reweights the coefficients. Phase estimation exposes \lambda_1 and \lambda_2; the controlled rotation divides by them; the inverse QPE tidies up. The state that emerges is

|x\rangle \;\propto\; \frac{\beta_1}{\lambda_1}\,|u_1\rangle + \frac{\beta_2}{\lambda_2}\,|u_2\rangle.

Read that off: a component along an eigenvector with a small eigenvalue gets amplified (dividing by a small number), while a component with a large eigenvalue gets suppressed. That is exactly what A^{-1} does — and it is also the seed of the trouble with tiny eigenvalues we'll meet in the caveats.

Worked example 2 — a 2×2 diagonal A, with numbers

Take the cleanest possible case, a diagonal matrix whose eigenvectors are just the computational basis states |0\rangle and |1\rangle:

A = \begin{bmatrix} 2 & 0 \\ 0 & 4 \end{bmatrix}, \qquad \vec{b} = \begin{bmatrix} 1 \\ 1 \end{bmatrix} \;\Rightarrow\; |b\rangle = \tfrac{1}{\sqrt2}\big(|0\rangle + |1\rangle\big).

The eigenvalues are \lambda_0 = 2 (eigenvector |0\rangle) and \lambda_1 = 4 (eigenvector |1\rangle), read straight off the diagonal. The 1/\lambda rotation multiplies the |0\rangle amplitude by \tfrac12 and the |1\rangle amplitude by \tfrac14:

A^{-1}|b\rangle = \tfrac{1}{\sqrt2}\Big(\tfrac{1}{2}|0\rangle + \tfrac{1}{4}|1\rangle\Big).

Dropping the overall constant and renormalising, the output state is

|x\rangle = \frac{\tfrac12|0\rangle + \tfrac14|1\rangle}{\sqrt{(\tfrac12)^2 + (\tfrac14)^2}} = \frac{2|0\rangle + |1\rangle}{\sqrt5}.

Check it classically: \vec{x} = A^{-1}\vec{b} = (\tfrac12, \tfrac14), whose direction is indeed (2, 1) after clearing fractions — the same state HHL prepares. Notice the honest limitation already: HHL hands you the normalised direction (2,1)/\sqrt5, not the raw entries (\tfrac12, \tfrac14). The overall scale lives in that discarded normalisation constant.

Problem. Given a sparse Hermitian A and a state |b\rangle, produce the solution of A\vec{x} = \vec{b} as a quantum state |x\rangle \propto A^{-1}|b\rangle.
Method. Write |b\rangle = \sum_j \beta_j|u_j\rangle; use phase estimation on e^{iAt} to expose each \lambda_j; apply a controlled rotation that scales each component by 1/\lambda_j; then uncompute the eigenvalue register with inverse phase estimation.
Runtime. Roughly O\!\big(\log(N)\,s^2\,\kappa^2/\varepsilon\big) — exponentially better in the dimension N than classical O(N), where s is the sparsity, \kappa the condition number, and \varepsilon the accuracy.
Catch. The output is a state, extractable only as summary statistics \langle x|M|x\rangle; the speedup is conditional on cheap |b\rangle preparation, sparse and well-conditioned A, and never needing the individual entries of \vec{x}.

Where the exponential speedup lives — and where it hides

Solving a dense N-variable system classically costs on the order of N^3; even the best sparse iterative solvers still touch all N entries, costing O(N s \kappa). HHL's \log N dependence looks like an exponential leap. But the runtime O(\log(N)\,s^2\,\kappa^2/\varepsilon) also carries a \kappa^2 and a 1/\varepsilon that are easy to gloss over. If the condition number \kappa grows with the problem size, or you demand high precision, those factors can quietly swallow the \log N win. The exponential advantage is real, but it is an advantage in one variable (N) bought under a stack of assumptions about all the others.

This is the single most over-hyped point in all of quantum algorithms, so state it plainly. HHL does not give you \vec{x}. It gives you |x\rangle, a quantum state whose amplitudes encode \vec{x}. To learn one amplitude you'd have to measure, and a single measurement returns one random basis outcome, not a number — reconstructing all N entries would take \sim N repetitions and throw the entire exponential speedup away. What you can cheaply extract is a summary statistic \langle x|M|x\rangle: an expectation, a norm, an overlap with some fixed vector. HHL is only a win when the question is one of those numbers.

And it only works at all when three further conditions hold: (i) you can prepare |b\rangle efficiently — if loading \vec{b} already costs O(N), you've lost before you start; (ii) A is sparse and you can simulate e^{iAt}; and (iii) A is well-conditioned — a small \kappa = \lambda_{\max}/\lambda_{\min}. A tiny eigenvalue means dividing by a tiny number, which both blows up the runtime (the \kappa^2) and makes the controlled rotation succeed only rarely. Miss any of these and the algorithm degrades from magic to useless.

In 2018, then-undergraduate Ewin Tang did something the field half-expected to be impossible: she took a celebrated quantum recommendation-system algorithm — a close cousin of HHL — and produced a classical algorithm only polynomially slower, demolishing its claimed exponential advantage. The trick, now called "dequantization," is that quantum algorithms like HHL assume you can cheaply sample from and query the input; if a classical algorithm is granted the same sampling access (so-called ℓ²-norm sampling), it can often mimic the quantum output using randomised linear algebra.

A wave of dequantization results followed, matching several HHL-type speedups classically under the same assumptions. The lesson isn't that HHL is worthless — for genuinely sparse, well-conditioned, high-rank systems the quantum advantage may survive — but that a fair comparison must give the classical algorithm the same input access the quantum one enjoys. Many advertised "exponential speedups" shrink to polynomial once you do.

Given all these caveats, why is HHL one of the most cited quantum algorithms ever? Because it is rarely used alone. It is a subroutine: a fast way to apply A^{-1} to a state in the middle of a larger quantum computation whose final answer is naturally a single number. That description fits a great deal of quantum machine learning — least-squares fitting, support vector machines, Gaussian processes, recommendation systems — where you fit a model to data (a linear solve) and then only ever query it for a prediction (an inner product), never dumping out every parameter. Whenever the pipeline is "solve a huge linear system, then ask it one question," HHL is the piece that makes the solve cheap. That is the shape of problem it was built for — and the shape where its exponential promise has the best chance of surviving contact with reality.