Picture five computers in five different data centres, wired together over a flaky network. Their one job sounds almost insultingly simple: agree on a single value — say, "the next command to run is withdraw \$50." Not a hard sum, not a clever algorithm. Just agree. And yet this is one of the deepest problems in all of computing, because the machines are unreliable (any of them can crash at the worst possible moment) and the network is unreliable (messages arrive late, out of order, duplicated, or never at all). No node can even tell the difference between "that server crashed" and "that server is fine but its reply is stuck in traffic."
This is the consensus problem, and it is the beating heart of every serious distributed system — the thing that lets a cluster of ordinary, failure-prone machines behave, from the outside, like one flawless machine that never forgets and never contradicts itself. Databases that stay consistent across continents, systems that elect a coordinator, locks that work across a whole cluster — all of them run a consensus protocol underneath. This page is about what consensus demands, why a simple majority is the trick that makes it possible, and how the two famous protocols — the venerable, brain-bending Paxos and its deliberately friendlier cousin Raft — actually reach agreement.
Before we can solve consensus we have to pin down exactly what a correct solution owes us. A group of nodes, each starting with some proposed value, must all decide on one value, and the protocol must guarantee three properties:
The tension between the first and the last of these is the whole story. Agreement says "never, ever be inconsistent," and termination says "always make progress." In a world where machines crash and messages vanish, keeping both at once — and keeping them while nodes are dropping — is exactly what makes consensus a genuinely hard problem, not a homework exercise.
Here is the single most important idea on this page. Whenever the cluster needs to commit to something, it does not wait for everyone to agree — that would grind to a halt the instant one machine crashed. Instead a decision is made by a majority: strictly more than half of the nodes. Such a group is called a quorum.
Why a majority, of all things? Because of one small, beautiful fact of arithmetic:
The proof is a one-liner. Take
That overlapping node is the hero of the whole story. Because any decision requires a majority, and any two majorities share a member, that shared member remembers the earlier decision and refuses to help ratify a conflicting one. No two majorities can ever independently commit to contradictory values — which is precisely the Agreement property, handed to us for free by counting. Every consensus protocol, Paxos and Raft included, is at its core an elaborate way of making sure each committed step passes through a majority.
A happy consequence: with
Astonishingly, yes — and it's a theorem. The FLP impossibility result (Fischer, Lynch and Paterson, 1985) proves that in a fully asynchronous system — one with no bound at all on how long a message can take — no deterministic protocol can guarantee it will always reach consensus if even a single node may crash. The reason is that shared-timeline argument again: with no clock you can trust, a node genuinely cannot tell a crashed peer from a merely slow one, and an adversarial scheduler can keep any protocol dithering forever at exactly the wrong moment.
So how do real systems exist at all? They quietly weaken the assumption. Paxos and Raft never sacrifice safety (they will never decide two different values, ever), but they only guarantee progress when the network behaves reasonably for a while — using timeouts and a leader to break symmetry and keep moving. FLP doesn't say consensus is hopeless; it says you can't get it for free, and the price is a dash of timing you have to assume goes right often enough.
The first protocol to solve this properly — and prove it — was Paxos, described by Leslie Lamport in 1998. Paxos is correct, elegant, and notoriously hard to understand; Lamport's own follow-up paper was cheekily titled "Paxos Made Simple," and it still leaves most readers dizzy. The mechanism, stripped to its bones, hands out three roles (one node often plays all three): proposers suggest values, acceptors vote on them, and learners find out the result.
A value gets chosen in two phases, and every proposal carries a unique, ever-increasing proposal number that acts like a ticket for who goes next:
It works, and the majority-overlap argument is why. But the two-phase dance, the rule about adopting the highest-numbered previously-accepted value, and the ways proposals can duel and stall are subtle enough that engineers kept getting real implementations wrong. That pain is exactly what motivated the next protocol.
In 2014, Diego Ongaro and John Ousterhout published Raft with an unusual explicit goal baked into its design: understandability. Their thesis was that Paxos was so hard to grasp that it was hard to build correctly, so they engineered a protocol that solves the same problem but that a student can actually hold in their head. The core simplification: instead of many proposers duelling, Raft elects one leader, and all agreement flows through that single leader for as long as it lives.
Raft slices time into numbered terms, and a node is always in one of three states:
Every follower runs an election timeout — a countdown that resets each time it hears from the leader (a periodic heartbeat). If a follower's timer ever fires without a peep from the leader, it assumes the leader is gone and springs into action: it becomes a candidate, bumps the term number, votes for itself, and sends RequestVote messages to everyone else. Each node grants at most one vote per term, so if the candidate collects votes from a majority, it wins and becomes leader — then immediately starts sending heartbeats to shut down everyone else's timers. Step through the whole sequence below:
Two details make elections actually terminate. First, each election timeout is chosen randomly from a range, so it's unlikely that two followers time out at the exact same instant and split the vote; whoever fires first usually wins uncontested. Second, if a vote does split (nobody gets a majority), that term simply ends with no leader, everyone waits out another random timeout, and a fresh election runs — and thanks to the randomness, it almost always resolves quickly.
Once elected, the leader is the sole point of contact for client commands. Each command is appended to the leader's log and shipped to the followers via AppendEntries messages. Here is where the majority returns, one last time: the leader considers an entry committed — safe, permanent, applied to the state machine — only once a majority of nodes have stored it. Overlapping quorums guarantee that any future leader must already have every committed entry, so committed history can never be lost or rewritten. A crashed leader just triggers a new election; the term number rises, and the cluster carries on.
Two tempting misconceptions, both wrong, both about the same core idea.
"A single leader is a single point of failure." It feels like one, but it isn't. The leader holds no unique, unrecoverable state — every committed entry lives on a majority of followers. If the leader crashes, the followers' timers fire, an election runs, and a new leader is in charge within a fraction of a second, carrying the full committed log with it. The leadership role is a single point of coordination, but it is replaceable, which is the whole point.
"Consensus needs all the nodes to agree." No — and requiring unanimity would be a disaster. If every node had to vote yes, a single crashed or unreachable machine would freeze the entire cluster, destroying availability. Consensus deliberately needs only a majority, and that is not a compromise — it is the essential feature. The majority is what lets the system keep deciding while nodes are down, and quorum overlap is what keeps it safe anyway. "Everyone agrees" is not stronger than "a majority agrees"; it is strictly more fragile.
A leader election comes down to one bit of arithmetic: did the candidate collect strictly
more than half the votes? The little simulation below tallies votes across a cluster and
declares the candidate elected only when its count clears the majority line
Notice how a tie or a minority yields no leader — the term ends leaderless and Raft simply runs another election after a fresh random timeout. Safety is never in danger: at worst the cluster pauses. It never decides two different things.