Quantum Machine Learning
Every few years a headline promises that quantum computers will supercharge
machine learning —
training neural nets in an instant, crunching "big data" exponentially faster. The reality is more
interesting and far more subtle. Quantum machine learning (QML) is a real, active
field with genuinely clever ideas, but almost every headline speedup comes wrapped in
conditions — fine print about how data gets into the machine and how answers come
out. This lesson lays out the three main directions QML actually takes, and then, just as
importantly, the three ways those promised speedups can quietly evaporate. The goal is a clear-eyed
view: where a quantum edge might be real, and where it is mostly hope.
Direction 1 — machine learning is mostly linear algebra
Strip away the branding and a surprising amount of machine learning is linear algebra.
Fitting a least-squares regression, running principal component analysis (PCA) to find the directions
of greatest variance, solving the normal equations of a model — each reduces to
solving a linear system, diagonalising a matrix, or multiplying by a matrix
inverse. Quantum computing has fast primitives for exactly these operations. The
HHL algorithm
solves A\vec{x}=\vec{b} in time \sim\log N instead
of N, and cousins of it perform quantum PCA and quantum least squares.
In principle, this is an exponential speedup in the dimension of the data. Fit a model
over a trillion features, ask it one prediction, and HHL-style algorithms promise to make the linear
solve at the heart of it exponentially cheap. That "in principle" is the whole story, though — as we'll
see, the assumptions HHL needs (cheap data loading, a well-conditioned matrix, a summary-only readout)
are exactly the things that make classical ML hard in practice too.
Direction 2 — variational quantum neural networks
The second direction fits today's noisy hardware far better. A
variational
quantum algorithm uses a parameterised circuit
U(\vec{\theta}) — a stack of gates with tunable rotation angles
\vec{\theta} — as the model itself. You feed in data, measure an output,
compare it to the desired label, and a classical optimiser nudges the angles
\vec{\theta} to reduce the loss. This is the quantum analogue of training a
neural network by gradient descent, and the circuit is often called a
quantum neural network (QNN).
The appeal is that these circuits are shallow and trainable on near-term devices — no
error correction, no phase estimation, no hypothetical hardware required. The open question is whether a
QNN can represent useful functions a classical net can't, and whether training stays feasible as the
circuit grows (it can hit "barren plateaus" where gradients vanish exponentially). It's promising, but
it is a research programme, not a settled advantage.
Direction 3 — quantum kernels and feature maps
The third direction is the most elegant. A classic trick in machine learning is the
kernel method: to separate data that isn't linearly separable, map each point
x into a much higher-dimensional feature space where a straight
boundary does separate the classes. A support vector machine never needs the feature vectors
themselves — only their inner products, the kernel
k(x,x') = \langle \phi(x), \phi(x') \rangle.
The quantum idea: use a circuit to encode x into a quantum state
|\phi(x)\rangle living in an exponentially large
Hilbert space, and estimate the kernel as the overlap
k(x,x') = \big|\langle \phi(x)\,|\,\phi(x')\rangle\big|^2.
You never write down the feature vector — you just prepare
|\phi(x)\rangle and |\phi(x')\rangle and measure
their overlap. If the feature map is hard to compute classically, the resulting kernel might
separate data no classical kernel can reach. A quantum computer becomes a similarity
measurer, feeding an otherwise ordinary classifier. This is one of the more defensible QML
proposals — but note the target has shifted from "faster" to "a kernel you couldn't compute at all."
Worked example 1 — the data-loading bottleneck
Here is the caveat that undoes more QML speedups than any other, made concrete. HHL and its ML cousins
assume the input \vec{b} already sits in the machine as a quantum state
|b\rangle — its N amplitudes are the
N data values. This is amplitude encoding: it packs
N numbers into just \log_2 N qubits, which sounds
wonderful.
But how do the numbers get in? If you have a fresh classical dataset of
N values and no special hardware, writing them into the amplitudes
one interaction at a time costs on the order of N operations. Watch the
arithmetic collapse:
\underbrace{O(N)}_{\text{load the data}} \;+\; \underbrace{O(\log N)}_{\text{HHL solve}} \;=\; O(N).
The O(\log N) algorithm is real, but you spent O(N)
just getting the data to it — so the end-to-end cost is back to
O(N), and the exponential advantage is gone before the quantum algorithm even
runs. People imagine a device called QRAM that could load data in
O(\log N), but no scalable QRAM has been built; it remains hypothetical. The
speedup lives or dies on an assumption about hardware that may never exist.
Worked example 2 — a quantum-kernel classifier, end to end
Let's trace the kernel method through, high level but concrete. Say each data point is a real number
x, and we choose the feature map that rotates a single qubit by an angle
proportional to x:
|\phi(x)\rangle = \cos\!\big(\tfrac{x}{2}\big)\,|0\rangle + \sin\!\big(\tfrac{x}{2}\big)\,|1\rangle.
To classify a new point we need its similarity to the training points, i.e. the kernel. For two inputs
x and x' the overlap of their feature states is
\langle \phi(x)\,|\,\phi(x')\rangle = \cos\!\big(\tfrac{x}{2}\big)\cos\!\big(\tfrac{x'}{2}\big) + \sin\!\big(\tfrac{x}{2}\big)\sin\!\big(\tfrac{x'}{2}\big) = \cos\!\Big(\tfrac{x-x'}{2}\Big),
so the kernel is
k(x,x') = \big|\langle\phi(x)|\phi(x')\rangle\big|^2 = \cos^2\!\big(\tfrac{x-x'}{2}\big).
Two identical points (x=x') give kernel
1 — perfectly similar; points a half-turn apart give
0. On real hardware you don't compute this cosine — you prepare both
states and estimate the overlap by measurement (e.g. a swap test), running the circuit many times to
beat down the statistical noise. Feed the resulting kernel matrix to an ordinary support vector machine
and you have a classifier. The quantum computer did one job: measuring similarity in a space you never
wrote down. With a richer, many-qubit feature map, that space becomes one no classical kernel can
reproduce — which is where a genuine edge might hide.
The whole pipeline, and where it pinches
Step through a QML computation end to end. Notice that the celebrated quantum speedup lives only in the
middle box — the two ends, loading data in and reading answers out, are where the promise so
often leaks away.
The readout limit
The second pinch point mirrors the first. A quantum algorithm like HHL leaves its answer as a
state |x\rangle whose N amplitudes encode
the solution. But you can't read amplitudes. A measurement returns one random
basis outcome, so recovering all N entries would take
\sim N repetitions — again erasing the \log N
advantage. What you can cheaply extract is a summary statistic: an
expectation \langle x|M|x\rangle, a norm, an overlap. So the algorithm is
only a win when the question you're asking is itself a single number — a prediction, an
average, a classification — not the whole solution vector. Big-data output is exactly what quantum ML
cannot hand back cheaply.
In 2018, then-undergraduate Ewin Tang delivered the field a jolt. She took a
celebrated quantum recommendation-system algorithm — a QML flagship claiming an exponential
speedup — and produced a classical algorithm only polynomially slower, demolishing the claim.
The key insight, now called "dequantization," is about fairness of comparison. Quantum
QML algorithms assume you can cheaply sample from and query the input state. If you grant a
classical algorithm the same sampling access (so-called ℓ²-norm sampling), it can
often mimic the quantum output using randomised linear algebra — no quantum computer required.
A wave of dequantization results followed, matching several QML speedups — recommendation systems,
low-rank PCA, and more — classically under the same assumptions. The moral isn't that QML is
worthless; for genuinely high-rank, sparse, well-conditioned problems a quantum edge may survive. It's
that many advertised "exponential speedups" shrink to merely polynomial once you compare the quantum
algorithm against a classical one given the same generous access to the data. A fair race
starts both runners at the same line.
-
Linear-algebra QML. Many ML primitives (linear systems, PCA, least squares) are
linear algebra; \text{HHL}-style algorithms offer exponential speedups
in principle — conditional on the caveats below.
-
Variational / QNN. A parameterised circuit
U(\vec{\theta}) trained as a model on near-term hardware — trainable
today, but with no proven advantage and a "barren plateau" risk.
-
Quantum kernels. Map data to a quantum state
|\phi(x)\rangle and compute the kernel
k(x,x')=|\langle\phi(x)|\phi(x')\rangle|^2 in a huge Hilbert space,
feeding a support-vector-style classifier.
-
Data loading. Getting N classical numbers into a state
can cost O(N), cancelling an O(\log N)
algorithm; scalable QRAM is hypothetical.
-
Readout. You extract summary statistics
\langle x|M|x\rangle, not the full output; reading all
N entries costs \sim N.
-
Dequantization. Classical algorithms with the same sampling access match several
QML speedups — so the advantage is conditional, not guaranteed.
The single most misleading way to picture QML is as "a GPU, but exponentially faster, that you plug into
your existing training loop." It is not. QML speedups are almost all
conditional: they hold only when you can load data cheaply (usually you can't — the
O(N) data-loading cost bites), when you only need a summary statistic out (not
the full model), and when no classical algorithm with the same data access can keep up (often one can —
dequantization). Miss any of these and the promised exponential edge collapses back to no
advantage at all.
So treat any "quantum will train your neural net a million times faster" claim with deep suspicion. Ask:
How does the data get in? What number are you reading out? Has this been dequantized? A quantum
computer is a fundamentally different device with its own strange strengths — not a faster version of the
hardware you already have.
Step back and the pattern is clear. The bottlenecks — loading and readout — both come from the
interface between a classical world of big data and a quantum machine. So the most promising
place for a genuine quantum advantage is where the data is already quantum: the state
of a molecule from a chemistry experiment, the output of another quantum device, the wavefunction of a
material. There's no O(N) loading cost because nature hands you the state
directly, and recent results prove that for learning about quantum systems, a quantum learner
can need exponentially fewer experiments than any classical one.
The honest summary: QML is a real and exciting field, but its clearest advantage is unlikely to be
"classical big data, but faster." It is far more likely to be learning about quantum states
themselves — a task classical computers are fundamentally, not just quantitatively, worse at.
That is where the hype and the physics finally point the same way.