Many distributed key value storages don’t support atomic multiple updates e.g. Project Voldemort or older versions of Zookeeper. Bud real world applications have a complex model and a lot of concurrent users so non atomic writes lead to inconsistency. It would be a problem…
Compare-and-set is the mother of many lock-free algorithms (including optimistic transactions), so any distributed storage that claims it supports CAS does atomic multiple updates. How?
Let’s have a look at a classic “real-world” example. We’re working in a bank big enough to use a distributed datastorage to keep its user accounts. The current version of a software transferring money between accounts was written by people who didn’t care about atomicity. Sometimes a money transferring procedure is interrupted during the process and money is withdrawn from one account but never reaches the other. So bank got a lot of lawsuits. We were hired to fix the problem. Hopefully the distributed storage supports CAS, so we’ll use it to fix the problem.
We inherit a very simple model consisting of an account type an id and a balance fields and a straight forward MongoDB-like procedure to change the account-typed objects.
First we add a version
field and guard any change of an account with CAS on that field. It protects from changing an unseen value aka ABA-problem. Since the update operator returns the number of affected records we always know if our operation was successful.
Hereinafter all of our objects have version and any modification to the objects is processed with respect to it. To simplify the description I omit that any modification to any object may be refused but you should keep it in mind.
To resolve the atomicity issue we need to introduce a transaction type to the model…
…and enrich our account type with two field updated
and tx
updated
has the same structure as the value
field, may be null and holds a future value of value
during a transaction
tx
is null or an id of the object representing the current transaction
It is easy to describe the algorithm, but it is harder to describe the algorithm in a way that its correctness is obvious. First I’ll make some statements and definitions about the algorithm. I expect you to return to them after I introduce the algorithm and to say something like “Oh, now I see why it is true” and “Oh, I understand its correctness”.
value
field always contains a state that is or was actualc
is for clean, d
is for dirty uncommitted and dc
is for dirty committedc
stateAn object has the clean state when the object has just been created or when a transaction affecting it successfully passed. A clean object’s fields updated
and tx
are null.
An object has the dirty uncommitted state during a transaction: updated
field contains a new value, tx
refers to the transaction object and the transaction object is in a created
state.
The third state is the dirty committed state. It describes a case when the transaction was committed but hasn’t yet cleaned its utility data: updated
field contains new version of an object, tx
refers to the transaction object and its state is committed
created
stateupdated
field to a new value and tx
to the transaction object’s idcommitted
value
field to the new value, updated
and tx
fields to nullAfter the 4th step the transaction is committed, the rest steps just clean the mess
value
field to updated
value, set updated
and tx
fields to null, save the object to the storage and return the objectfailed
state and saveupdated
and tx
field to null and saveI think it is pretty easy to prove transaction properties - just check that all the statements I made above are true and use them to prove atomicity.
We have just added transactions to a distributed storage with CAS support and saved the bank from going bankrupt.
Back to the reality. I want to warn you to be very carefull about statements of companies behind the distributed storages since the companies may lie, deceive or err. It is better to take a look on the independent studies like Call me maybe series or do it yourself.
List of distributed storages that claims they support CAS: MongoDB, Project Voldemort, Cassandra and HBase.