In 1985, three researchers—Michael Fischer, Nancy Lynch, and Michael Paterson—published a result that would fundamentally reshape how we think about distributed systems. Their theorem, now known simply as FLP, demonstrated something unsettling: in an asynchronous distributed system where even a single process can fail, there exists no deterministic algorithm that is guaranteed to solve consensus.
This wasn’t a limitation of current technology or a gap in our knowledge. It was a mathematical impossibility—a fundamental boundary that no amount of engineering cleverness can overcome. Yet today, distributed databases coordinate across continents, consensus algorithms power everything from cloud infrastructure to blockchain networks, and systems achieve agreement millions of times per second. How do we reconcile this apparent contradiction?
The Consensus Problem: Simpler Than It Sounds
Before understanding why consensus is impossible, we need to understand what consensus actually means. The problem is deceptively simple: a set of processes need to agree on a single value. Each process proposes a value, and eventually all non-faulty processes must decide on the same value.
Formally, any consensus algorithm must satisfy three properties:
- Agreement: No two correct processes decide on different values
- Validity: The decided value must have been proposed by some process
- Termination: All correct processes eventually decide
The first two properties are safety properties—bad things must never happen. The third is a liveness property—good things must eventually happen. FLP shows that in an asynchronous model, you cannot have all three simultaneously if even one process can crash.
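Termination cannot be verified on any finite trace, but the two safety properties can be audited after a run completes. A minimal sketch in Python (the process names and dictionaries are illustrative, not part of any standard API):

```python
# Sketch: auditing the two safety properties on a finished run.
# `proposed` maps process id -> proposed value; `decided` maps each
# correct process id -> its decided value. All names are illustrative.

def check_agreement(decided):
    """Agreement: no two correct processes decide different values."""
    return len(set(decided.values())) <= 1

def check_validity(proposed, decided):
    """Validity: every decided value was proposed by some process."""
    return all(v in proposed.values() for v in decided.values())

run_proposed = {"p1": 0, "p2": 1, "p3": 1}
run_decided = {"p1": 1, "p2": 1, "p3": 1}

assert check_agreement(run_decided)
assert check_validity(run_proposed, run_decided)
```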
What Makes a System “Asynchronous”?
The asynchronous model is not about lack of coordination—it’s about lack of timing assumptions. In an asynchronous system:
- Processes execute at arbitrary speeds
- Messages take arbitrary time to deliver
- There is no shared clock or synchronized time source
This isn’t just theoretical pessimism. Real networks exhibit these properties. A garbage collection pause can freeze a process for seconds. Network congestion can delay messages unpredictably. The synchronous model—where we know exactly how long operations take—is an idealization that rarely matches reality.
The key insight of FLP is that without timing bounds, a crashed process is indistinguishable from a slow one. If you send a message and receive no response, you cannot determine whether the recipient has crashed or is simply delayed. This ambiguity is the root of the impossibility.
The Proof: Bivalent Configurations
The FLP proof relies on the concept of bivalent configurations. A configuration represents the complete state of the system—process states, message queues, and all pending messages. A configuration is:
- 0-valent: Every possible execution from this configuration leads to deciding 0
- 1-valent: Every possible execution from this configuration leads to deciding 1
- Bivalent: Some executions lead to 0, others to 1
The proof proceeds in three lemmas:
Lemma 1: There exists at least one bivalent initial configuration. Consider the chain of initial configurations in which inputs flip from all-0 to all-1 one process at a time. By validity, the endpoints are 0-valent and 1-valent respectively, so somewhere along the chain two adjacent configurations differ in a single process's input yet have different valencies. If that one process crashes before taking a step, the two configurations are indistinguishable to everyone else and must yield the same decision, so at least one of them is in fact bivalent.
Lemma 2: From any bivalent configuration, there exists a step (a message delivery) that leads to another bivalent configuration. The adversary can always choose which message to deliver to keep the system in a bivalent state.
Lemma 3: Starting from a bivalent configuration, an adversary can construct an infinite execution that never reaches a decision.
Initial Config (Bivalent)
│
▼
Message e1 delivered → Still Bivalent
│
▼
Message e2 delivered → Still Bivalent
│
▼
... (continues indefinitely)
Diagram adapted from Paper Trail, “A Brief Tour of FLP Impossibility”.
The adversary’s strategy is elegant: always deliver the message that keeps the configuration bivalent. Since the system is asynchronous, there’s no way to force progress—the adversary can always claim the “right” message hasn’t arrived yet.
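Valency is just reachability over decisions. The toy sketch below classifies configurations of a finite transition system by exploring which decision values are reachable; the protocol graph (`steps`, `decision`) is invented purely for illustration:

```python
# Sketch: classifying configurations of a toy (finite) protocol by valency.
# `steps` maps each configuration to its possible successors; `decision`
# marks terminal configurations with the value they decide.

def reachable_decisions(config, steps, decision, seen=None):
    """Collect every decision value reachable from `config`."""
    if seen is None:
        seen = set()
    if config in decision:
        return {decision[config]}
    if config in seen:
        return set()
    seen.add(config)
    out = set()
    for nxt in steps.get(config, []):
        out |= reachable_decisions(nxt, steps, decision, seen)
    return out

def valency(config, steps, decision):
    vals = reachable_decisions(config, steps, decision)
    if vals == {0}:
        return "0-valent"
    if vals == {1}:
        return "1-valent"
    return "bivalent"

steps = {"C0": ["C1", "C2"], "C1": ["D0"], "C2": ["D1"]}
decision = {"D0": 0, "D1": 1}

assert valency("C0", steps, decision) == "bivalent"  # both decisions reachable
assert valency("C1", steps, decision) == "0-valent"
```

The adversary's job in the proof is to steer execution so that the current configuration always classifies as "bivalent".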
Circumventing FLP: Three Strategies
If FLP proves consensus is impossible, how do real systems achieve it? The answer lies in weakening assumptions or accepting probabilistic guarantees.
1. Partial Synchrony
In 1988, Dwork, Lynch, and Stockmeyer introduced the partial synchrony model. The key insight: while we may not know the exact timing bounds, they do exist. Messages eventually get delivered, processes eventually make progress.
In this model, consensus is achievable. The system alternates between:
- Good periods: Messages delivered within known bounds
- Bad periods: Unbounded delays (which eventually end)
Algorithms for partial synchrony achieve safety during bad periods and both safety and liveness during good periods. This matches real-world behavior remarkably well—networks may experience outages, but they eventually recover.
2. Failure Detectors
Chandra and Toueg’s 1996 paper introduced unreliable failure detectors as a way to circumvent FLP. A failure detector is a distributed oracle that provides hints about which processes have crashed.
Two properties define a failure detector:
- Completeness: Every crashed process is eventually suspected
- Accuracy: Correct processes are not suspected (or eventually not suspected)
Interestingly, an unreliable failure detector (one that can make mistakes) suffices for consensus. The $\diamondsuit\mathcal{S}$ (eventually strong) detector only guarantees that after some unknown time, some correct process is never suspected by any correct process. This weak guarantee is enough to break the FLP impossibility.
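A common way to approximate such a detector in practice is a heartbeat monitor that doubles a peer's timeout after every false suspicion; under partial synchrony the timeout eventually exceeds the unknown delay bound, so correct processes stop being suspected. A sketch (class and method names, and the timing values, are illustrative):

```python
# Sketch: a heartbeat-based failure detector that doubles a peer's
# timeout after every false suspicion. Under partial synchrony the
# timeout eventually exceeds the unknown delay bound, so correct
# processes stop being suspected forever.
import time

class FailureDetector:
    def __init__(self, peers, initial_timeout=0.5):
        self.timeout = {p: initial_timeout for p in peers}
        self.last_heartbeat = {p: time.monotonic() for p in peers}
        self.suspected = set()

    def on_heartbeat(self, peer):
        """A heartbeat arrived; clear any suspicion and back off."""
        self.last_heartbeat[peer] = time.monotonic()
        if peer in self.suspected:
            self.suspected.discard(peer)
            self.timeout[peer] *= 2  # we suspected wrongly: be more patient

    def check(self):
        """Suspect every peer whose heartbeat is overdue."""
        now = time.monotonic()
        for peer, last in self.last_heartbeat.items():
            if now - last > self.timeout[peer]:
                self.suspected.add(peer)
        return self.suspected
```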
3. Randomization
Ben-Or’s 1983 randomized consensus algorithm takes a different approach. Instead of deterministic termination, it achieves termination with probability 1: the probability that the algorithm is still running after $k$ rounds shrinks toward zero as $k$ grows, so infinite executions occur with probability zero.
Randomization works because the adversary cannot predict random choices. If a process flips a coin, the adversary cannot pre-compute a strategy to keep the system bivalent forever. The expected number of rounds for Ben-Or’s algorithm is $O(2^n)$, though improvements have reduced this significantly.
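The coin-flip mechanism shows up even in a deliberately simplified, failure-free, lock-step simulation of binary Ben-Or. The real algorithm exchanges explicit report and proposal messages per process; this sketch collapses them into shared round logic, which is valid only because every process sees the same messages when nothing fails ($f = 0$, names illustrative):

```python
# Sketch: a failure-free, lock-step simulation of Ben-Or's binary
# randomized consensus (f = 0). With no failures, every process sees
# identical reports, so per-process message exchange collapses into
# one shared computation per round.
import random

def ben_or_simulation(values, max_rounds=1000, rng=None):
    """Return (decided_value, rounds_used) for initial binary `values`."""
    rng = rng or random.Random()
    n = len(values)
    for rnd in range(1, max_rounds + 1):
        # Phase 1: everyone reports its value. A value held by a strict
        # majority becomes everyone's proposal.
        majority = [v for v in (0, 1) if values.count(v) > n // 2]
        if majority:
            # Phase 2: all processes propose the same value; with f = 0 a
            # single matching proposal lets every process decide it.
            return majority[0], rnd
        # No majority: all proposals are None, so each process flips a
        # coin for the next round. Termination follows with probability 1.
        values = [rng.randint(0, 1) for _ in range(n)]
    raise RuntimeError("no decision within max_rounds")

assert ben_or_simulation([1, 1, 0]) == (1, 1)  # majority decides in round 1
```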
Paxos: The First Practical Algorithm
Leslie Lamport’s Paxos algorithm, published in 1998 after a decade of development, was the first practical solution to distributed consensus. Lamport famously explained the algorithm through a fictional parliament on the Greek island of Paxos—a story that made the protocol memorable but not necessarily clearer.
Paxos operates in two phases:
Phase 1 (Prepare)
- A proposer chooses a proposal number $n$ and sends `Prepare(n)` to a majority of acceptors
- An acceptor responds with `Promise(n, v)`, where $v$ is the highest-numbered proposal it has accepted (or null if none)

Phase 2 (Accept)
- If the proposer receives promises from a majority, it sends `Accept(n, v')`, where $v'$ is either the value from the highest-numbered promise or its own proposed value
- Acceptors accept unless they have already promised a higher number
Proposer Acceptor 1 Acceptor 2 Acceptor 3
│ │ │ │
│──Prepare(n)────▶│ │ │
│──Prepare(n)─────│────────────▶│ │
│──Prepare(n)─────│─────────────│────────────▶│
│ │ │ │
│◀──Promise(n)────│ │ │
│◀──Promise(n)────│◀────────────│ │
│◀──Promise(n)────│◀────────────│◀────────────│
│ │ │ │
│──Accept(n,v)───▶│ │ │
│──Accept(n,v)────│────────────▶│ │
│──Accept(n,v)────│─────────────│────────────▶│
│ │ │ │
│◀──Accepted──────│ │ │
│◀──Accepted──────│◀────────────│ │
│◀──Accepted──────│◀────────────│◀────────────│
Diagram adapted from ResearchGate, “Phases in a round of the Paxos algorithm”.
The key insight is the use of proposal numbers to order competing proposals. If two proposers conflict, the higher proposal number wins. The quorum requirement (majority) ensures that any two quorums intersect, so a new proposer will learn about previously accepted values.
Paxos achieves safety under all conditions—partially synchronous, asynchronous, even during network partitions. Liveness is only guaranteed during good periods when messages are delivered in bounded time.
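The acceptor's rules from the two phases fit in a few lines of state. A sketch of the acceptor side of single-decree Paxos (networking and the proposer's quorum-counting are omitted; class, method, and message names are illustrative):

```python
# Sketch: the acceptor side of single-decree Paxos. Networking and the
# proposer's quorum-counting are omitted.

class Acceptor:
    def __init__(self):
        self.promised_n = -1    # highest proposal number promised
        self.accepted_n = -1    # number of the last accepted proposal
        self.accepted_v = None  # value of the last accepted proposal

    def on_prepare(self, n):
        """Phase 1: promise to ignore proposals numbered below n, and
        report any previously accepted proposal."""
        if n > self.promised_n:
            self.promised_n = n
            return ("promise", n, self.accepted_n, self.accepted_v)
        return ("reject", n)

    def on_accept(self, n, v):
        """Phase 2: accept unless a higher-numbered promise was made."""
        if n >= self.promised_n:
            self.promised_n = n
            self.accepted_n = n
            self.accepted_v = v
            return ("accepted", n, v)
        return ("reject", n)

a = Acceptor()
assert a.on_prepare(1) == ("promise", 1, -1, None)
assert a.on_accept(1, "x") == ("accepted", 1, "x")
assert a.on_prepare(2) == ("promise", 2, 1, "x")  # reports accepted value
assert a.on_accept(1, "y") == ("reject", 1)       # stale proposal number
```

Reporting the accepted value in the promise is what lets a new proposer adopt previously chosen values, preserving safety across competing proposals.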
Raft: Understandability as a Feature
In 2014, Diego Ongaro and John Ousterhout published “In Search of an Understandable Consensus Algorithm.” Their goal was explicit: create a consensus algorithm equivalent to Paxos in fault tolerance and performance, but designed for understandability.
Raft decomposes consensus into three independent subproblems:
- Leader Election: At most one leader exists per term
- Log Replication: Leader accepts commands, replicates to followers
- Safety: Committed entries are never overwritten
Each server exists in one of three states: follower, candidate, or leader. Followers respond to leaders and candidates. Candidates request votes during elections. Leaders handle all client requests and coordinate log replication.
          ┌─────────────┐
          │  Follower   │
          └──────┬──────┘
                 │ heartbeat timeout
                 ▼
          ┌─────────────┐    election timeout /
          │  Candidate  │◀── no majority:
          └──────┬──────┘    new term, retry
                 │ receives majority of votes
                 ▼
          ┌─────────────┐
          │   Leader    │
          └──────┬──────┘
                 │ discovers higher term
                 ▼
          ┌─────────────┐
          │  Follower   │
          └─────────────┘
The term concept is central to Raft. Each term has at most one leader, and terms are monotonically increasing. When a leader crashes, a new term begins with a new election.
Raft’s leader-centric approach simplifies log replication significantly. Only the leader accepts client requests, and all logs flow from leader to followers. This contrasts with Paxos, where any node can propose values.
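The election-related transitions can be sketched as a small state machine (log replication and RPC plumbing are omitted; class and method names are illustrative):

```python
# Sketch: Raft's election-related state transitions, stripped of log
# replication and RPC handling.

class RaftNode:
    def __init__(self, node_id, cluster_size):
        self.node_id = node_id
        self.cluster_size = cluster_size
        self.state = "follower"
        self.term = 0
        self.votes = 0

    def on_election_timeout(self):
        """Follower or candidate times out: start an election in a new term."""
        self.state = "candidate"
        self.term += 1
        self.votes = 1  # votes for itself

    def on_vote_granted(self):
        """Candidate wins once it holds a majority of votes."""
        if self.state != "candidate":
            return
        self.votes += 1
        if self.votes > self.cluster_size // 2:
            self.state = "leader"

    def on_higher_term(self, term):
        """Any message carrying a higher term forces a step down."""
        if term > self.term:
            self.term = term
            self.state = "follower"
            self.votes = 0

node = RaftNode("n1", cluster_size=3)
node.on_election_timeout()
node.on_vote_granted()        # 2 of 3 votes: majority
assert node.state == "leader"
node.on_higher_term(5)
assert node.state == "follower" and node.term == 5
```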
When Theory Meets Practice: The 2f+1 Question
A common question in distributed systems: how many nodes do you need to tolerate $f$ failures?
For crash failures (nodes stop responding), the answer is $2f + 1$. A quorum requires $f + 1$ nodes, ensuring that any two quorums intersect. If $f$ nodes can fail, you need enough nodes that a quorum can exist without them.
For Byzantine failures (nodes can behave arbitrarily, including lying), the answer is $3f + 1$. The proof is more complex: a Byzantine node can pretend to send different messages to different nodes. You need enough honest nodes to outvote the Byzantine ones regardless of their behavior.
| Failure Type | Minimum Nodes | Quorum Size |
|---|---|---|
| Crash failures | $2f + 1$ | $f + 1$ |
| Byzantine failures | $3f + 1$ | $2f + 1$ |
Practical Byzantine Fault Tolerance (PBFT), introduced by Castro and Liskov in 1999, was the first practical algorithm for Byzantine consensus. It operates in three phases: pre-prepare, prepare, and commit. The algorithm requires $3f + 1$ replicas and can tolerate $f$ Byzantine faults.
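The arithmetic behind the table is easy to encode; the comments record why the quorum sizes work. A sketch (function names are illustrative):

```python
# Sketch: minimum cluster and quorum sizes for tolerating f failures.

def crash_tolerance(f):
    """Crash faults: 2f+1 nodes; any two quorums of f+1 intersect in at
    least one node, and a quorum survives f crashes."""
    return {"nodes": 2 * f + 1, "quorum": f + 1}

def byzantine_tolerance(f):
    """Byzantine faults: 3f+1 nodes; two quorums of 2f+1 intersect in at
    least f+1 nodes, so their intersection contains an honest node."""
    return {"nodes": 3 * f + 1, "quorum": 2 * f + 1}

assert crash_tolerance(1) == {"nodes": 3, "quorum": 2}
assert byzantine_tolerance(1) == {"nodes": 4, "quorum": 3}
```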
Real-World Implementations
The theoretical foundations translate directly into production systems:
etcd uses Raft to coordinate a distributed key-value store. It powers Kubernetes service discovery and configuration management. Every change to the cluster state must achieve consensus across a majority of nodes.
Apache ZooKeeper implements ZAB (ZooKeeper Atomic Broadcast), a consensus protocol similar to Paxos. ZooKeeper coordinates distributed applications including Kafka, HBase, and Hadoop.
Google’s Chubby service was among the first production systems to use Paxos. Chubby provides distributed locking and low-volume storage, serving as the foundation for Google’s infrastructure.
These systems demonstrate that while FLP proves consensus impossible under strict assumptions, practical systems operate in a world where:
- Timing is eventually bounded
- Failure detectors, while imperfect, provide useful hints
- Randomization, when needed, breaks deadlocks with high probability
The Trade-offs We Accept
Understanding FLP and its workarounds reveals the fundamental trade-offs in distributed systems:
Safety vs. Liveness: During network partitions, you must choose. Systems that prioritize safety (CP in CAP terms) may become unavailable. Systems that prioritize availability (AP) may return inconsistent results.
Complexity vs. Understandability: Paxos achieves consensus but is notoriously difficult to implement correctly. Raft sacrifices some flexibility for clarity, and the trade-off has proven worthwhile—Raft has far more production implementations.
Performance vs. Fault Tolerance: Higher fault tolerance requires larger quorums, which increases latency. A system tolerating $f$ failures needs at least $2f+1$ nodes, and each operation must reach $f+1$ of them.
FLP is not a bug in our mathematics—it’s a feature of the universe we inhabit. In a world where messages can be delayed arbitrarily and processes can pause without warning, perfect consensus is impossible. But by understanding the boundaries, we can build systems that approach perfection closely enough to be useful.
The next time your distributed database commits a transaction across three continents, remember: it’s performing a small miracle, navigating around an impossibility theorem that has stood for forty years.
References
- Fischer, M. J., Lynch, N. A., & Paterson, M. S. (1985). Impossibility of distributed consensus with one faulty process. Journal of the ACM, 32(2), 374-382.
- Dwork, C., Lynch, N., & Stockmeyer, L. (1988). Consensus in the presence of partial synchrony. Journal of the ACM, 35(2), 288-323.
- Lamport, L. (2001). Paxos made simple. ACM SIGACT News, 32(4), 18-25.
- Ongaro, D., & Ousterhout, J. (2014). In search of an understandable consensus algorithm. USENIX Annual Technical Conference, 305-319.
- Chandra, T. D., & Toueg, S. (1996). Unreliable failure detectors for reliable distributed systems. Journal of the ACM, 43(2), 225-267.
- Ben-Or, M. (1983). Another advantage of free choice: Completely asynchronous agreement protocols. PODC, 27-30.
- Castro, M., & Liskov, B. (1999). Practical Byzantine fault tolerance. OSDI, 173-186.
- Howard, H., Malkhi, D., & Spiegelman, A. (2016). Flexible Paxos: Quorum intersection revisited. PODC, 485-494.