Read write quorums in Paxos

In the Vertical Paxos and The ABCD's of Paxos papers it has been observed that Paxos can be generalized by utilizing different read and write quorums. This idea can be demonstrated with Single Decree Paxos.

How Paxos works

First we'll refresh our knowledge on how Paxos works (for the complete explanation see the How Paxos Works post) and then we'll adapt it to use the idea behind read and write quorums. If we have 4 nodes paxos cluster and want to change a value then we should follow the classic algorithm:

  1. request acceptors to provide the current value
  2. await a quorum (at least 3 nodes) of the acceptor's responses
  3. select the value with max ballot number
  4. perform a transformation of the value
  5. send the new value back to the acceptors
  6. await a quorum (at least 3 nodes) of the acceptor's responses
  7. inform a client that the change was applied

Read write quorums

As you can see, we should achieve a quorum of the acceptor's responses twice. Let us call the first quorum as read quorum and the second as write quorum. In the original Paxos paper the size of those quorums is fixed and equal to $n/2+1$ (where n is the size of the cluster). It turns out that we can have other configurations of the read/write quorums if quorums overlap. For example if read quorum's size is 2 then write quorum's size should be at least 3.

With this property the classic algorithm can be adjusted to:

  1. request acceptors to provide the current value
  2. await a read quorum (at least 2 nodes) of the acceptor's responses
  3. select the value with max ballot number
  4. perform a transformation of the value
  5. send the new value back to the acceptors
  6. await a write quorum (at least 3 nodes) of the acceptor's responses
  7. inform a client that the change was applied
This modification is very important and it will help us to prove the correctness of membership change.

Some might think that this property is hard to prove but actually the proof that I used in the "How Paxos Works" remains the same. Since we don't use the size of read & write quorums and only require them to overlap:

$$(\mathrm{N} := \bar{b}^1.\mathrm{reads}.[\mathrm{node\_id}] \cap \bar{a}^2.\mathrm{writes}.[\mathrm{node\_id}]) \neq \emptyset$$

it works perfectly then we vary the size of the read/write quorums.

Q.E.D.