Replication and Consistency
A single database on a single machine is a comfortable place to live. It has one copy of the truth,
and if you wrap your changes in a
transaction, that copy
moves from one consistent state to the next, atomically, in isolation. Lovely. Now unplug that
machine. Your entire service is gone — every read, every write, until someone drives to the data
centre. And even while it was running, a user in Sydney was waiting 200
milliseconds for every request to cross the Pacific to your one server in Virginia.
The cure for both problems is the same: keep more than one copy. This is
replication — storing the same data on several machines (called
replicas) — and it buys you four things at once:
- Fault tolerance — one replica dies, the others carry on; no data lost.
- Availability — the system keeps answering even while a machine is down or being
upgraded.
- Read scalability — ten replicas can serve ten times the read traffic, because
any of them can answer a read.
- Low latency — put a replica near your users, and their reads travel metres
instead of oceans.
There is no free lunch, of course. The instant you have two copies of a value and someone writes a
new one, the copies disagree until the change propagates. The whole art of this page is a
single question: how much disagreement will you tolerate, and what does the reader see while
the copies catch up? That question is answered by choosing a consistency
model.
The consistency spectrum
Consistency models form a spectrum, from strict-and-slow to loose-and-fast. The two anchors at the
ends are the ones to know cold.
At the strict end is strong consistency, whose gold standard is
linearizability: the system behaves as if there were a single copy of the
data. Every read returns the value of the most recent write, and once a write is acknowledged, no
subsequent reader anywhere can ever see the old value again. Reads and writes appear to take effect
instantaneously, in a single global order that everybody agrees on. This is exactly the single-node
illusion we are trying to preserve across many machines — and it is expensive, because the replicas
must coordinate on every operation before answering.
At the loose end is eventual consistency: if writes stop, all replicas will
eventually converge to the same value — but in the meantime they may hand out different,
stale answers. A write to one replica trickles out to the others in the background; a read that lands
on a replica the update has not yet reached sees the old value. The promise is only about the
destination, not the journey: given enough quiet time, everyone agrees. This is cheap and fast,
because a replica can answer immediately without asking anyone.
- Strong (linearizable) — the system acts like one copy; every read sees the
latest acknowledged write, in one global order. Correct, but requires coordination on every
operation.
- Eventual — replicas may temporarily disagree, but converge once writes stop.
Fast and available, but a reader can see stale data.
Between the extremes live useful half-measures — session guarantees that fix the
most jarring surprises without paying full price:
- Read-your-writes — after you write something, your own later
reads are guaranteed to see it (even if other users briefly do not). No more posting a comment and
watching it vanish on refresh.
- Monotonic reads — once you have seen a value, you never see an older
one on a later read. Time never appears to run backwards for a single reader.
Which one do you actually need?
The honest answer is: it depends on the data, and the same application usually wants both in
different places. Picture two numbers on a social app.
The like counter on a post is a perfect fit for eventual consistency.
If you see 1{,}000 likes and your friend momentarily sees
1{,}001, nobody is harmed; a second later both screens read the same
number. In exchange, the counter is enormously fast and stays up even when machines fail. Trading a
flicker of disagreement for speed and availability is a bargain here.
A bank balance is the opposite. If the ledger says
\$100 and you withdraw \$100, no other replica
may still answer "\$100 available" and let a second withdrawal through.
Money demands strong consistency: every read must reflect the latest committed
write, because a stale read here creates money out of nothing. This is the single-node
ACID guarantee —
atomic, isolated, durable state transitions — and replication must not quietly break it.
So the choice is not "which is better" but "what is the cost of a stale read for this piece of
data?" A wrong like-count costs nothing; a wrong balance costs a lawsuit.
The name scares people into thinking eventual consistency is a euphemism for "sometimes we lose
your data" or "the numbers are just wrong." Neither is true. Every acknowledged write is
durably stored and will reach every replica; nothing is
discarded. The only thing "eventual" describes is timing — there is a brief window where
different replicas are at different points in the same, agreed-upon history, so a read might land on
one that is a beat behind. Once the writes stop, that window closes and everyone converges to the
same value. It is stale-for-a-moment, not lost and not wrong. (Well-designed systems even
bound that moment and reconcile conflicting concurrent writes with rules like last-writer-wins or
mergeable data types, so the convergence is deterministic, not a coin flip.)
Quorums: buying overlap with arithmetic
Here is the elegant trick that lets you tune where you sit on the spectrum, without
coordinating on every operation. Suppose you have N replicas of a value.
Instead of writing to all of them or reading from just one, you pick two numbers:
- W — the write quorum: a write is not acknowledged
until at least W replicas have stored it.
- R — the read quorum: a read asks
R replicas and takes the newest value among the answers.
Now the magic. If we insist that
W + R > N,
then the set of replicas a write touched and the set a read touches must overlap in at least
one replica — there is simply no room for them to be disjoint. And that one shared replica
has the latest write, so the read is guaranteed to see it. This is the pigeonhole
principle wearing a database hat: two sets whose sizes add up to more than the whole cannot avoid
each other.
The figure below makes the overlap literal. Five replicas sit in a ring. A write lands on
W = 3 of them; a later read contacts R = 3 of
them. Because 3 + 3 = 6 > 5, they cannot dodge one another — they share a
replica, and the read sees the write. Step through it:
A worked example, and the tuning dial
Take the common default N = 3 and try a few settings.
- W = 2, R = 2. Then
W + R = 4 > 3 = N. A write reaches 2 of the 3 replicas; a read asks 2 of
the 3. Two pairs drawn from three replicas must share at least one — overlap guaranteed,
so every read sees the latest acknowledged write. This is the classic
strongly-consistent quorum, and it still tolerates one replica being down.
- W = 1, R = 1. Then
W + R = 2 \not> 3. The write might land only on replica A while the
read asks only replica C — no overlap, so the read can return a
stale value. Blazing fast and highly available (you only ever wait for one
replica), but you have opted into eventual consistency.
So W and R are a dial. Crank
R up (read from more replicas) for fresher reads at the cost of slower
reads; crank W up for stronger write guarantees at the cost of slower
writes. The condition W + R > N is the line between "reads may be stale"
and "reads are guaranteed fresh."
There is a second, separate condition worth its own line. To stop two different writes from
both being acknowledged at the same time (which would leave two conflicting "latest" values), we also
want write quorums to overlap each other:
W > \tfrac{N}{2}.
If every write needs a strict majority, two writes cannot both succeed on disjoint sets — one of them
must fail or wait, so there is always a single agreed latest value. With
N = 3 that means W \ge 2; with
N = 5, W \ge 3. Majority quorums are why so many
systems default to odd N.
It is tempting to think "if I add more copies, my data gets more consistent." The opposite is
closer to the truth. Consistency comes from the overlap W + R > N,
not from the raw number of replicas. If you double N from
3 to 6 but keep
W = R = 2, then W + R = 4 \not> 6 — you have
lost your consistency guarantee, because a read and a write can now easily miss each other
among the extra copies. More replicas give you more fault tolerance and more read throughput, but
they force you to raise W and/or R to keep the
same freshness. Replication is a set of dials that trade against each other, not a single "more is
better" slider.
Compute the overlap yourself
The whole guarantee reduces to one question: does the set of replicas a write reached share any
member with the set a read touches? If yes, the read sees the write; if no, it might be stale.
The program below models a few (N, W, R) configurations as explicit sets
of replica indices, checks both the arithmetic test W + R > N and the
actual overlap, and reports whether a stale read is possible. Press Run ▶:
// A quorum system has N replicas (indices 0 .. N-1). A write is stored by
// the replicas in `writeSet`; a later read contacts those in `readSet`.
// The read is guaranteed to see the write ONLY IF the two sets overlap.
function overlaps(writeSet: number[], readSet: number[]): boolean {
return writeSet.some((replica) => readSet.includes(replica));
}
interface Config { n: number; writeSet: number[]; readSet: number[]; }
const configs: Config[] = [
{ n: 3, writeSet: [0, 1], readSet: [1, 2] }, // N=3 W=2 R=2 -> 4 > 3
{ n: 3, writeSet: [0], readSet: [2] }, // N=3 W=1 R=1 -> 2 <= 3, disjoint
{ n: 5, writeSet: [0, 1, 2], readSet: [2, 3, 4] }, // N=5 W=3 R=3 -> 6 > 5
{ n: 5, writeSet: [0, 1], readSet: [3, 4] }, // N=5 W=2 R=2 -> 4 <= 5, disjoint
];
console.log("N W R W+R>N? sets overlap? result");
console.log("-------------------------------------------------------");
for (const c of configs) {
const w = c.writeSet.length;
const r = c.readSet.length;
const arithmeticGuarantee = w + r > c.n; // the W + R > N test
const reallyOverlaps = overlaps(c.writeSet, c.readSet);
const verdict = reallyOverlaps
? "fresh read guaranteed"
: "STALE READ POSSIBLE";
console.log(
c.n + " " + w + " " + r + " " +
(arithmeticGuarantee ? "yes " : "no ") + " " +
(reallyOverlaps ? "yes " : "no ") + " " +
verdict,
);
}
console.log("");
console.log("Whenever W + R > N is 'yes', the sets MUST overlap -> always fresh.");
console.log("When it is 'no', overlap is not guaranteed -> a stale read can slip through.");
Notice the pattern in the output: every row where W + R > N holds also
overlaps — that is the guarantee, not a coincidence. The rows where it fails are exactly the ones
where a stale read can sneak in.